Optimizing Database Design Through Normal Forms
Question: Is redundancy always bad? Sometimes redundancy is acceptable for performance reasons!
Semantics of relation attributes must be clear
Design schemas that don't suffer from insertion, deletion, and update anomalies
If anomalies exist, document them so applications can handle them properly
Design relations with as few NULL values as possible
Design relations that can be joined without generating spurious (false) tuples
Example: Consider EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
Problem: Changing project P1's name from "Billing" to "Customer-Accounting" requires updating ALL 100 employees working on that project!
Impact: Data inconsistency if some updates are missed
Example: EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
Problems:
Impact: Loss of information about projects/employees
Example: EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
Problems:
Impact: Unintended loss of important information
A formal tool for analyzing relational schemas that enables detection and description of design problems in precise form.
Definition: A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y
X → Y holds if: Whenever two tuples have the same value for X, they must have the same value for Y
For any two tuples t₁ and t₂: If t₁[X] = t₂[X], then t₁[Y] = t₂[Y]
Key Points:
Social security number determines employee name
Project number determines project name and location
Employee SSN and project number together determine hours worked
If X is a candidate key of R, then X → R
This means a candidate key functionally determines all attributes in the relation
A superkey is a set of attributes S ⊆ R where no two tuples will have the same values for S
Example: {ID}, {ID, First_name, Last_name}, {ID, Order_date}
A key is a minimal superkey - removing any attribute causes it to no longer be a superkey
Example: {ID} or {First_name, Last_name} (if unique)
A relation is in First Normal Form if it does NOT allow:
Each attribute must contain only atomic (indivisible) values
For multivalued attributes:
Alternative approach:
To normalize nested relations:
A relation schema R is in Second Normal Form (2NF) if:
Full FD: Removal of any attribute from the left side breaks the dependency
Full FD - neither SSN → HOURS nor PNUMBER → HOURS holds
Partial dependency - SSN → ENAME holds
A relation violates 2NF when:
Steps:
A relation is in 2NF if every non-prime attribute is NOT partially dependent on any key of R
This accounts for relations with multiple candidate keys
A relation schema R is in Third Normal Form (3NF) if:
Transitive FD: X → Z can be derived from X → Y and Y → Z
Therefore: SSN → DMGR_SSN is transitive
Non-transitive - no intermediate attribute
In X → Y → Z with X as primary key:
A relation R is in 3NF if whenever X → A holds, then either:
A relation is in 3NF if every non-prime attribute:
Note: This definition shows that 3NF implies 2NF
BCNF is a stronger form of 3NF that handles relations with overlapping candidate keys
When a relation has more than one candidate key, anomalies may result even in 3NF
A relation R is in Boyce-Codd Normal Form (BCNF) if:
Whenever X → A holds in R, then X is a superkey of R
In other words:
Every determinant must be a candidate key
Each normal form is strictly stronger than the previous:
Key Point: Relations can be in 3NF but NOT in BCNF!
Recall the general 3NF definition: X → A is okay if either:
BCNF only allows condition (a)
BCNF removes condition (b), making it stricter than 3NF
To decompose R into BCNF:
Important: BCNF decomposition may not preserve all functional dependencies!
A decomposition D = {R₁, R₂} has the lossless join property if either:
Translation: The common attributes must functionally determine one of the relation's unique attributes
Armstrong's Axioms are proven and complete inference rules for deriving functional dependencies
1. Reflexivity: If Y ⊆ X, then X → Y
2. Augmentation: If X → Y, then XZ → YZ
3. Transitivity: If X → Y and Y → Z, then X → Z
Union: If X → Y and X → Z, then X → YZ
Decomposition: If X → YZ, then X → Y and X → Z
These can be proven using Armstrong's Axioms
Prove: If X → Y and X → Z, then X → YZ
Closure of X (denoted X⁺): The set of all attributes functionally determined by X based on F
Key property: A key contains a full set of attributes in its closure
Input: Set of FDs F, and set of attributes X
Output: X⁺ (closure of X)
Relation: SupplierPart(sname, city, status, p#, pname, qty)
Functional Dependencies:
Prove {sname, p#} is a key:
Step 1: Show it's a superkey
Step 2: Show it's minimal
Conclusion: {sname, p#} is a key!