📐 Functional Dependencies & Normalization

Optimizing Database Design Through Normal Forms

📚 Chapter 9
🎯 CS 340
👩‍🎓 Shoug Alomran
⏱️ Complete Guide

📑 Table of Contents

📋

Informal Design Guidelines

🎯 Goals of Database Design
  1. Information Preservation: Maintain all concepts including attributes, entities, relationships, and specializations from conceptual design
  2. Minimum Redundancy: Minimize redundant storage and reduce need for multiple updates

Question: Is redundancy always bad? Sometimes redundancy is acceptable for performance reasons!

Four Key Guidelines

📌 Guideline 1: Clear Semantics

Semantics of relation attributes must be clear

  • Each tuple should represent one entity or relationship instance
  • Don't mix attributes from different entities (EMPLOYEE, DEPARTMENT, PROJECT)
  • Use foreign keys to refer to other entities
  • Keep entity and relationship attributes separate
  • Bottom line: Design schemas that are easy to explain
📌 Guideline 2: Avoid Anomalies

Design schemas that don't suffer from insertion, deletion, and update anomalies

If anomalies exist, document them so applications can handle them properly

📌 Guideline 3: Minimize NULL Values

Design relations with as few NULL values as possible

  • Frequently NULL attributes should be in separate relations
  • NULLs waste storage space
  • NULLs have multiple interpretations (not applicable, unknown, unavailable)
  • Can cause problems with JOINs and understanding attribute meaning
📌 Guideline 4: Avoid Spurious Tuples

Design relations that can be joined without generating spurious (false) tuples

  • Join on attributes that are primary keys and foreign keys
  • Bad designs can generate erroneous results in JOINs
  • Always use lossless joins
💾

Redundancy & Update Anomalies

⚠️ Problems with Redundant Information
  • Wastes storage space
  • Causes update anomalies:
    • Insertion anomalies
    • Deletion anomalies
    • Modification anomalies

The Three Types of Anomalies

🔴 Update (Modification) Anomaly

Example: Consider EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)

Problem: Changing project P1's name from "Billing" to "Customer-Accounting" requires updating ALL 100 employees working on that project!

Impact: Data inconsistency if some updates are missed

🟠 Insertion Anomaly

Example: EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)

Problems:

  • Cannot insert a project unless an employee is assigned to it
  • Cannot insert an employee unless they're assigned to a project

Impact: Loss of information about projects/employees

🟡 Deletion Anomaly

Example: EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)

Problems:

  • Deleting a project deletes ALL employees working on it
  • Deleting the sole employee on a project deletes the project

Impact: Unintended loss of important information

🔗

Functional Dependencies (FDs)

📘 What are Functional Dependencies?

A formal tool for analyzing relational schemas that enables detection and description of design problems in precise form.

Definition: A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y

🔑 Formal Definition

X → Y holds if: Whenever two tuples have the same value for X, they must have the same value for Y

For any two tuples t₁ and t₂: If t₁[X] = t₂[X], then t₁[Y] = t₂[Y]

X Y

Key Points:

  • FDs are a property of the relation schema, not individual instances
  • FDs are derived from real-world constraints
  • Used to define normal forms

Examples of Functional Dependencies

💡 Common FD Examples
SSN ENAME

Social security number determines employee name

PNUMBER {PNAME, PLOCATION}

Project number determines project name and location

{SSN, PNUMBER} HOURS

Employee SSN and project number together determine hours worked

Types of Functional Dependencies

Type Description Example Full Dependency Removal of any attribute breaks the FD {SSN, PNUMBER} → HOURS Partial Dependency A subset of the left side can determine the right side SSN → ENAME (when key is {SSN, PNUMBER}) Transitive Dependency X → Y and Y → Z, therefore X → Z SSN → DNUMBER → DMGR_SSN

Determining FDs from Instances

⚠️ Important Rules
  • From a relation instance, we can only conclude that an FD may exist
  • We can definitively conclude that certain FDs do NOT exist when tuples violate them
  • Understanding attribute meaning and relationships is essential
  • If X → Y in R, this does NOT mean Y → X in R
🔑 Key Property

If X is a candidate key of R, then X → R

This means a candidate key functionally determines all attributes in the relation

🔑

Keys & Attribute Classifications

📘 Superkey

A superkey is a set of attributes S ⊆ R where no two tuples will have the same values for S

Example: {ID}, {ID, First_name, Last_name}, {ID, Order_date}

🔑 Key (Candidate Key)

A key is a minimal superkey - removing any attribute causes it to no longer be a superkey

Example: {ID} or {First_name, Last_name} (if unique)

🎯 Key Terminology
  • Candidate Key: Any valid key for the relation
  • Primary Key: One candidate key chosen as the main identifier
  • Secondary Keys: Other candidate keys not chosen as primary
  • Prime Attribute: Member of at least one candidate key
  • Non-prime Attribute: NOT a member of any candidate key
1️⃣

First Normal Form (1NF)

📘 Definition of 1NF

A relation is in First Normal Form if it does NOT allow:

  • Composite attributes (attributes with sub-parts)
  • Multivalued attributes (attributes with multiple values)
  • Nested relations (relations within relations)
  • Repeated groups
✅ 1NF Requirements

Each attribute must contain only atomic (indivisible) values

  • Considered part of the definition of a relation
  • Most RDBMSs only allow 1NF relations
  • All values must be from the same domain

Normalizing to 1NF

💡 Method 1: Separate Relation

For multivalued attributes:

  1. Remove attributes that violate 1NF
  2. Place them in a separate relation with the primary key
  3. Primary key of new relation = combination of original PK + multivalued attribute
💡 Method 2: Expand Key (with redundancy)

Alternative approach:

  1. Expand the key to include the multivalued attribute
  2. Create separate tuple for each value
  3. Warning: Introduces redundancy!
💡 Nested Relations

To normalize nested relations:

  1. Remove the nested relation attributes
  2. Create new relation with original PK + nested relation's candidate key
2️⃣

Second Normal Form (2NF)

📘 Definition of 2NF

A relation schema R is in Second Normal Form (2NF) if:

  1. It is in 1NF, AND
  2. Every non-prime attribute is fully functionally dependent on the primary key
🔑 Key Concept: Full Functional Dependency

Full FD: Removal of any attribute from the left side breaks the dependency

✅ {SSN, PNUMBER} → HOURS

Full FD - neither SSN → HOURS nor PNUMBER → HOURS holds

❌ {SSN, PNUMBER} → ENAME

Partial dependency - SSN → ENAME holds

⚠️ 2NF Violation

A relation violates 2NF when:

  • A non-prime attribute depends on only part of a composite primary key
  • This is called a partial dependency

Normalizing to 2NF

💡 2NF Normalization Process

Steps:

  1. Identify all partial dependencies
  2. For each partial dependency X → Y:
    • Create new relation with X and Y
    • Remove Y from original relation
    • Keep X in original relation as foreign key
📋 General Definition (Multiple Keys)

A relation is in 2NF if every non-prime attribute is NOT partially dependent on any key of R

This accounts for relations with multiple candidate keys

3️⃣

Third Normal Form (3NF)

📘 Definition of 3NF (Simple)

A relation schema R is in Third Normal Form (3NF) if:

  1. It is in 2NF, AND
  2. No non-prime attribute is transitively dependent on the primary key
🔑 Transitive Dependency

Transitive FD: X → Z can be derived from X → Y and Y → Z

SSN → DNUMBER → DMGR_SSN

Therefore: SSN → DMGR_SSN is transitive

✅ SSN → ENAME

Non-transitive - no intermediate attribute

⚠️ Important Note

In X → Y → Z with X as primary key:

  • Only a problem if Y is NOT a candidate key
  • If Y IS a candidate key, no problem exists

General 3NF Definition (Multiple Keys)

🎯 Formal 3NF Definition

A relation R is in 3NF if whenever X → A holds, then either:

  1. X is a superkey of R, OR
  2. A is a prime attribute of R
📋 Alternative 3NF Definition

A relation is in 3NF if every non-prime attribute:

  1. Is fully functionally dependent on every key, AND
  2. Is non-transitively dependent on every key

Note: This definition shows that 3NF implies 2NF

The Mnemonic Rule

💡 Remember the Normal Forms
  • 1NF: All attributes depend on the key
  • 2NF: All attributes depend on the whole key
  • 3NF: All attributes depend on nothing but the key

Boyce-Codd Normal Form (BCNF)

📘 What is BCNF?

BCNF is a stronger form of 3NF that handles relations with overlapping candidate keys

When a relation has more than one candidate key, anomalies may result even in 3NF

🔑 BCNF Definition

A relation R is in Boyce-Codd Normal Form (BCNF) if:

Whenever X → A holds in R, then X is a superkey of R

In other words:

Every determinant must be a candidate key

📊 Normal Form Hierarchy

Each normal form is strictly stronger than the previous:

  • Every 2NF relation is in 1NF
  • Every 3NF relation is in 2NF
  • Every BCNF relation is in 3NF

Key Point: Relations can be in 3NF but NOT in BCNF!

3NF vs BCNF

⚠️ When 3NF ≠ BCNF

Recall the general 3NF definition: X → A is okay if either:

  1. X is a superkey, OR
  2. A is a prime attribute

BCNF only allows condition (a)

BCNF removes condition (b), making it stricter than 3NF

BCNF Decomposition

💡 BCNF Normalization Algorithm

To decompose R into BCNF:

  1. Let X → A be the FD that violates BCNF
  2. Decompose R into:
    • R₁ = R - A
    • R₂ = XA
  3. If either R₁ or R₂ is not in BCNF, repeat the process
⚠️ Trade-off with BCNF

Important: BCNF decomposition may not preserve all functional dependencies!

  • Must ensure lossless (non-additive) join property
  • Some FDs may be lost in decomposition
  • This is sometimes acceptable to achieve BCNF

Non-Additive Join Test

📋 Lossless Join Property

A decomposition D = {R₁, R₂} has the lossless join property if either:

  1. (R₁ ∩ R₂) → (R₁ - R₂) is in F⁺, OR
  2. (R₁ ∩ R₂) → (R₂ - R₁) is in F⁺

Translation: The common attributes must functionally determine one of the relation's unique attributes

🎓

Armstrong's Axioms

📘 Inference Rules for FDs

Armstrong's Axioms are proven and complete inference rules for deriving functional dependencies

🔑 The Three Axioms

1. Reflexivity: If Y ⊆ X, then X → Y

Example: AB → B

2. Augmentation: If X → Y, then XZ → YZ

Example: If A → B, then AC → BC

3. Transitivity: If X → Y and Y → Z, then X → Z

Example: If A → B and B → C, then A → C

Derived Rules

📐 Additional Inference Rules

Union: If X → Y and X → Z, then X → YZ

Decomposition: If X → YZ, then X → Y and X → Z

These can be proven using Armstrong's Axioms

💡 Proof Example: Union Rule

Prove: If X → Y and X → Z, then X → YZ

  1. Given: X → Y, X → Z
  2. X → Y implies XX → XY (augmentation)
  3. Augment X → Z with Y: XY → ZY
  4. X → XY → ZY means X → ZY (transitivity)
🔄

Closure Algorithm

📘 What is Closure?

Closure of X (denoted X⁺): The set of all attributes functionally determined by X based on F

Key property: A key contains a full set of attributes in its closure

🔧 Closure Algorithm

Input: Set of FDs F, and set of attributes X

Output: X⁺ (closure of X)

X⁺ := X
repeat
  old_X⁺ := X⁺
  for each FD Y → Z in F do
    if X⁺ ⊇ Y then
      X⁺ := X⁺ ∪ Z
until (X⁺ = old_X⁺)

Finding Keys Using Closure

💡 Complete Example

Relation: SupplierPart(sname, city, status, p#, pname, qty)

Functional Dependencies:

  • fd1: sname → city
  • fd2: city → status
  • fd3: p# → pname
  • fd4: {sname, p#} → qty

Prove {sname, p#} is a key:

Step 1: Show it's a superkey

  1. {sname, p#} → {sname, p#} (trivial)
  2. sname → city (fd1)
  3. sname → status (transitive via fd2)
  4. {sname, p#} → qty (fd4)
  5. p# → pname (fd3)
  6. Therefore: {sname, p#}⁺ = {sname, p#, city, status, qty, pname} ✓

Step 2: Show it's minimal

  • p# is not determined by anything → sname alone is not a key
  • sname is not on right side of any FD with p# alone → p# alone is not a key
  • Therefore {sname, p#} is minimal ✓

Conclusion: {sname, p#} is a key!

📊

Summary of Normal Forms

Normal Form Requirements Eliminates 1NF No multivalued or composite attributes Repeating groups, nested relations 2NF 1NF + No partial dependencies Partial dependencies on candidate keys 3NF 2NF + No transitive dependencies Transitive dependencies through non-keys BCNF Every determinant is a candidate key All dependencies where determinant isn't a superkey
🎯 Practical Guidelines
  • Normalize to at least 3NF in practice
  • BCNF is ideal but may sacrifice dependency preservation
  • 4NF and 5NF rarely used in practice
  • Denormalization: Sometimes acceptable for performance (but document it!)
✨ Key Takeaways
  • Normalization eliminates redundancy and anomalies
  • Each normal form builds on the previous one
  • Higher normal forms mean better data integrity
  • Trade-offs exist between normalization and performance
  • Understanding FDs is essential for good database design