Database Design Review
We learned that databases are useful because they store a lot of data for applications, give desirable guarantees for transactions, and are efficient and reliable. Before any of this is possible, we need to carefully design a database.
💾 Data Storage
Databases provide centralized, organized storage for application data
🔒 ACID Guarantees
Transactions ensure atomicity, consistency, isolation, and durability
⚡ Efficiency
Optimized query processing and data retrieval mechanisms
📐 Careful Design
Proper conceptual modeling is the foundation of a successful database
Data Abstraction and Architecture
Preliminaries
Database systems provide an abstract view of the data. The three levels of abstraction are as follows:
Schemas and Instances
📋 Logical Schema
Describes the database design at the logical level (e.g., Student takes Course)
💾 Physical Schema
Describes the database design at the physical storage level
👁️ External Schema (Subschema)
Describes different views of the database for various external users
Schema: Student(RollNo, Name, Address, PhoneNo, DOB, email, CourseNo, DNo)
Instance: The actual rows of data in the Student table at any given time
| Name | Student_number | Class | Major |
|---|---|---|---|
| Smith | 17 | 1 | CS |
| Brown | 8 | 2 | CS |
Data Independence
Types of Data Independence
📊 Logical Data Independence
The capacity to change the conceptual schema without having to change external schemas and their associated application programs.
Example: If user B adds a new column (e.g., salary) to their table, it will not affect the external view for user A, though the internal schema has been changed for both users.
💾 Physical Data Independence
The capacity to change the internal schema without having to change the conceptual schema.
Example: Changing file organization, storage structures, storage devices, or indexing strategy without affecting the conceptual or external schemas.
Logical data independence is more difficult to achieve than physical data independence, since application programs are heavily dependent on the logical structure of the data they access.
Conceptual Design Process
We have our enterprise we intend to model. What are the entities and relationships in this enterprise?
- What information should we store about those entities and relationships?
- What rules do we need to make sure our data is sound? (Integrity constraints)
The encoding in the relational database realm is referred to as an Entity-Relationship (ER) diagram.
ER Components: Entities and Attributes
🔑 Key Attribute
Each entity set has a key attribute that uniquely identifies each entity
📦 Domain
Each attribute has a domain (e.g., float, int, date, string)
🎯 Distinguishable
Each entity can be uniquely identified by its attributes
Types of Attributes
Each entity has a single atomic value for the attribute.
Examples: SSN, Sex, Age
The attribute may be divided into smaller parts.
Examples:
Address(Number, Street, City, Country)
Name(FirstName, MiddleName, LastName)
A set of values for the same entity.
Examples:
{Color} 1...3 for a car
{CollegeDegree} for a person
Value stored directly in the database.
Example: BirthDate
Value can be obtained from other attributes.
Example: Age (derived from BirthDate and current date)
Composite and multi-valued attributes nested arbitrarily.
Example: {PreviousDegrees(College, Year, Degree, Field)}
Complex attributes are very rare and not a good design technique. They should be avoided in most cases.
Keys in ER Diagrams
- EMPLOYEE: SSN (Social Security Number)
- STUDENT: Student ID
- CAR: VehicleIdentificationNumber (VIN)
Key Characteristics
🔗 Composite Keys
A key attribute may be composite
VehicleTagNumber(Number, State) - license plate
🎯 Multiple Keys
An entity type may have more than one key
CAR: VIN or VehicleTagNumber
📝 Notation
Each key is underlined in ER diagrams
All candidate keys are underlined
ER Diagram Notation
In ER diagrams, we use specific symbols to represent different components:
Represented by a rectangle. Contains the entity type name.
Represented by a double rectangle. Does not have a key attribute.
Represented by a diamond. Contains the relationship name.
Represented by an oval connected to its entity or relationship.
Represented by an oval with the attribute name underlined.
Represented by a double oval.
Represented by a dashed oval.
Shown as an oval connected to other ovals representing component attributes.
Relationships in ER Diagrams
- EMPLOYEE John Smith works on the ProductX PROJECT
- EMPLOYEE Franklin Wong manages the Research DEPARTMENT
Relationship Degree (Arity)
A relationship type between the same entity type in distinct roles.
Example: EMPLOYEE supervises EMPLOYEE
A relationship between two different entity types.
Examples: EMPLOYEE works for DEPARTMENT, STUDENT takes COURSE
A relationship among three entity types.
Example: SUPPLIER supplies PART to PROJECT
A relationship among N entity types.
N > 3 relationships are rare in practice
Relationship Cardinality Constraints
Types of Cardinality
An entity in A is associated with at most one entity in B, and vice versa.
Example: An employee manages at most one department. A department is managed by at most one employee.
An entity in A is associated with any number of entities in B. An entity in B is associated with at most one entity in A.
Example: A department has many employees. An employee works for one department.
An entity in A is associated with at most one entity in B. An entity in B can be associated with any number of entities in A.
Example: Many employees work for one department.
An entity in A is associated with any number of entities in B, and vice versa.
Example: An employee works on many projects. A project has many employees working on it.
Participation Constraints
Not all entities in the entity set participate in the relationship.
Represented by a single line
Example: Not all employees manage departments
Every entity in the entity set must participate in at least one relationship instance.
Represented by a double line
Example: Every employee must work for a department
(Min, Max) Notation
📏 Default Values
Default: min=0, max=n (no limit)
Must have min ≤ max, min ≥ 0, max ≥ 1
🎯 Reading (min, max)
Read the numbers next to an entity type while looking AWAY from that entity
EMPLOYEE (1,1) - WORKS_FOR - (1,N) DEPARTMENT
Each employee works for exactly one department; each department has 1 to N employees
📊 Common Patterns
- (0,1): Optional, at most one
- (1,1): Mandatory, exactly one
- (0,N): Optional, any number
- (1,N): Mandatory, at least one
Weak Entity Types
🔗 Identifying Relationship
A weak entity must participate in an identifying relationship type with an owner entity
🔑 Partial Key
Weak entities have a partial key that is unique only within entities related to the same owner
👥 Identification
Weak entities are identified by combining their partial key with the key of the owner entity
Scenario: DEPENDENT entities and EMPLOYEE entities
- Owner Entity: EMPLOYEE (has SSN as key)
- Weak Entity: DEPENDENT (no key attribute)
- Partial Key: Dependent_name
- Identifying Relationship: DEPENDENT_OF
- Identification: A DEPENDENT is identified by (Employee_SSN, Dependent_name)
Example: "John Smith's dependent Mary" is different from "Alice Brown's dependent Mary"
A weak entity is shown with a double rectangle
The identifying relationship is shown with a double diamond
The partial key is underlined with a dashed line
Attributes of Relationships
Relationship: WORKS_ON (between EMPLOYEE and PROJECT)
Attribute: HoursPerWeek
The value of HoursPerWeek depends on a particular (employee, project) combination. It describes how many hours per week that specific employee works on that specific project.
Most relationship attributes are used with M:N relationships. In 1:N relationships, they can often be transferred to the entity type on the N-side of the relationship.
Example: COMPANY Database
Let's design a database schema for a company based on the following requirements:
- The company is organized into DEPARTMENTs. Each department has a name, number, and an employee who manages it. We keep track of the start date of the department manager. A department may have several locations.
- Each department controls a number of PROJECTs. Each project has a unique name, unique number, and is located at a single location.
- The database stores each EMPLOYEE's social security number, address, salary, sex, and birthdate.
- Each employee works for one department but may work on several projects. The database keeps track of the number of hours per week that an employee works on each project.
- We keep track of the direct supervisor of each employee.
- Each employee may have a number of DEPENDENTs. For each dependent, we keep a record of name, sex, birthdate, and relationship to the employee.
Identified Entity Types
🏢 DEPARTMENT
Attributes: Name, Number, Locations, Manager, Manager_start_date
📋 PROJECT
Attributes: Name, Number, Location, Controlling_department
👤 EMPLOYEE
Attributes: SSN, Name(Fname, Minit, Lname), Address, Salary, Sex, Birth_date
👨👩👧 DEPENDENT
Attributes: Name, Sex, Birth_date, Relationship, Employee
Identified Relationships
Between EMPLOYEE and DEPARTMENT (N:1)
Between EMPLOYEE and DEPARTMENT (1:1)
Attribute: Start_date
Between DEPARTMENT and PROJECT (1:N)
Between EMPLOYEE and PROJECT (M:N)
Attribute: Hours
Between EMPLOYEE and EMPLOYEE (1:N)
Recursive relationship
Between EMPLOYEE and DEPENDENT (1:N)
Identifying relationship
Practice Activity: SBS Bank
Scenario: We have a bank called SBS with the following business requirements:
- The bank has multiple branches. Each branch is located in a specific city and has its own unique name. Every year the branch is assigned a yearly budget and a rating (out of 10).
- A customer is identified by their name and phone number. The bank also keeps track of their address.
- The bank offers two services: accounts and loans to customers. Each account and loan has a unique number and is created and maintained by a single branch.
- Each account is assigned to one or more customers and the balance cannot be negative.
- A loan is always assigned to a single customer, has an interest rate, and a balance that cannot be negative.
Sample Solution
🏦 BRANCH
Attributes: Name (key), City, Budget, Rating
👥 CUSTOMER
Attributes: Name, Phone (key), Address
💰 ACCOUNT
Attributes: ID (key), Balance
💵 LOAN
Attributes: ID (key), Balance, Interest
BRANCH (1) - ACCOUNT (N)
BRANCH (1) - LOAN (N)
CUSTOMER (M) - ACCOUNT (N)
CUSTOMER (1) - LOAN (N)
ER Diagram Design Conventions
📝 Naming
Choose names that convey meanings: nouns for entities and attributes, verbs for relationships
📖 Readability
Design diagrams to be readable from left to right and top to bottom
🔤 Capitalization
Entity and relationship type names in UPPERCASE, attributes Capitalized, role names in lowercase
🎨 Consistency
Use consistent notation and styling throughout the diagram
Chapter Summary
In this chapter, we explored the Entity-Relationship (ER) model for conceptual database design:
- Understood the three levels of data abstraction: physical, logical, and external (views)
- Learned about schemas (database structure) versus instances (actual data)
- Explored data independence: logical and physical
- Defined entities, entity sets, attributes, and their various types
- Understood relationships, relationship types, and relationship degrees
- Learned about cardinality constraints (1:1, 1:N, N:1, M:N)
- Explored participation constraints (total vs. partial)
- Studied weak entities and identifying relationships
- Applied concepts to real-world examples (COMPANY database, SBS Bank)
Key Takeaway:
The ER model provides a high-level conceptual view of the database structure. It is the first step in database design, helping us understand and model the real-world domain before implementing it in a specific DBMS.
Next Steps:
In the following chapters, we will learn how to convert ER diagrams into relational schemas, understand normalization, and write SQL queries to interact with our designed databases.