📐 Functional Dependencies & Normalization

Information Preservation: Maintain all concepts including attributes, entities, relationships, and specializations from conceptual design
Minimum Redundancy: Minimize redundant storage and reduce need for multiple updates

Question: Is redundancy always bad? Sometimes redundancy is acceptable for performance reasons!

Four Key Guidelines

📌 Guideline 1: Clear Semantics

Semantics of relation attributes must be clear

Each tuple should represent one entity or relationship instance
Don't mix attributes from different entities (EMPLOYEE, DEPARTMENT, PROJECT)
Use foreign keys to refer to other entities
Keep entity and relationship attributes separate
Bottom line: Design schemas that are easy to explain

📌 Guideline 2: Avoid Anomalies

Design schemas that don't suffer from insertion, deletion, and update anomalies

If anomalies exist, document them so applications can handle them properly

📌 Guideline 3: Minimize NULL Values

Design relations with as few NULL values as possible

Frequently NULL attributes should be in separate relations
NULLs waste storage space
NULLs have multiple interpretations (not applicable, unknown, unavailable)
Can cause problems with JOINs and understanding attribute meaning

📌 Guideline 4: Avoid Spurious Tuples

Design relations that can be joined without generating spurious (false) tuples

Join on attributes that are primary keys and foreign keys
Bad designs can generate erroneous results in JOINs
Always use lossless joins

💾

Redundancy & Update Anomalies

⚠️ Problems with Redundant Information

Wastes storage space
Causes update anomalies:
- Insertion anomalies
- Deletion anomalies
- Modification anomalies

The Three Types of Anomalies

🔴 Update (Modification) Anomaly

Example: Consider EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)

Problem: Changing project P1's name from "Billing" to "Customer-Accounting" requires updating ALL 100 employees working on that project!

Impact: Data inconsistency if some updates are missed

🟠 Insertion Anomaly

Example: EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)

Problems:

Cannot insert a project unless an employee is assigned to it
Cannot insert an employee unless they're assigned to a project

Impact: Loss of information about projects/employees

🟡 Deletion Anomaly

Example: EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)

Problems:

Deleting a project deletes ALL employees working on it
Deleting the sole employee on a project deletes the project

Impact: Unintended loss of important information

🔗

Functional Dependencies (FDs)

📘 What are Functional Dependencies?

A formal tool for analyzing relational schemas that enables detection and description of design problems in precise form.

Definition: A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y

🔑 Formal Definition

X → Y holds if: Whenever two tuples have the same value for X, they must have the same value for Y

For any two tuples t₁ and t₂: If t₁[X] = t₂[X], then t₁[Y] = t₂[Y]

X → Y

Key Points:

FDs are a property of the relation schema, not individual instances
FDs are derived from real-world constraints
Used to define normal forms

Examples of Functional Dependencies

💡 Common FD Examples

SSN → ENAME

Social security number determines employee name

PNUMBER → {PNAME, PLOCATION}

Project number determines project name and location

{SSN, PNUMBER} → HOURS

Employee SSN and project number together determine hours worked

Types of Functional Dependencies

Type Description Example Full Dependency Removal of any attribute breaks the FD {SSN, PNUMBER} → HOURS Partial Dependency A subset of the left side can determine the right side SSN → ENAME (when key is {SSN, PNUMBER}) Transitive Dependency X → Y and Y → Z, therefore X → Z SSN → DNUMBER → DMGR_SSN

Determining FDs from Instances

⚠️ Important Rules

From a relation instance, we can only conclude that an FD may exist
We can definitively conclude that certain FDs do NOT exist when tuples violate them
Understanding attribute meaning and relationships is essential
If X → Y in R, this does NOT mean Y → X in R

🔑 Key Property

If X is a candidate key of R, then X → R

This means a candidate key functionally determines all attributes in the relation

🔑

Keys & Attribute Classifications

📘 Superkey

A superkey is a set of attributes S ⊆ R where no two tuples will have the same values for S

Example: {ID}, {ID, First_name, Last_name}, {ID, Order_date}

🔑 Key (Candidate Key)

A key is a minimal superkey - removing any attribute causes it to no longer be a superkey

Example: {ID} or {First_name, Last_name} (if unique)

🎯 Key Terminology

Candidate Key: Any valid key for the relation
Primary Key: One candidate key chosen as the main identifier
Secondary Keys: Other candidate keys not chosen as primary
Prime Attribute: Member of at least one candidate key
Non-prime Attribute: NOT a member of any candidate key

1️⃣

First Normal Form (1NF)

📘 Definition of 1NF

A relation is in First Normal Form if it does NOT allow:

Composite attributes (attributes with sub-parts)
Multivalued attributes (attributes with multiple values)
Nested relations (relations within relations)
Repeated groups

✅ 1NF Requirements

Each attribute must contain only atomic (indivisible) values

Considered part of the definition of a relation
Most RDBMSs only allow 1NF relations
All values must be from the same domain

Normalizing to 1NF

💡 Method 1: Separate Relation

For multivalued attributes:

Remove attributes that violate 1NF
Place them in a separate relation with the primary key
Primary key of new relation = combination of original PK + multivalued attribute

💡 Method 2: Expand Key (with redundancy)

Alternative approach:

Expand the key to include the multivalued attribute
Create separate tuple for each value
Warning: Introduces redundancy!

💡 Nested Relations

To normalize nested relations:

Remove the nested relation attributes
Create new relation with original PK + nested relation's candidate key

2️⃣

Second Normal Form (2NF)

📘 Definition of 2NF

A relation schema R is in Second Normal Form (2NF) if:

It is in 1NF, AND
Every non-prime attribute is fully functionally dependent on the primary key

🔑 Key Concept: Full Functional Dependency

Full FD: Removal of any attribute from the left side breaks the dependency

✅ {SSN, PNUMBER} → HOURS

Full FD - neither SSN → HOURS nor PNUMBER → HOURS holds

❌ {SSN, PNUMBER} → ENAME

Partial dependency - SSN → ENAME holds

⚠️ 2NF Violation

A relation violates 2NF when:

A non-prime attribute depends on only part of a composite primary key
This is called a partial dependency

Normalizing to 2NF

💡 2NF Normalization Process

Steps:

Identify all partial dependencies
For each partial dependency X → Y:
- Create new relation with X and Y
- Remove Y from original relation
- Keep X in original relation as foreign key

📋 General Definition (Multiple Keys)

A relation is in 2NF if every non-prime attribute is NOT partially dependent on any key of R

This accounts for relations with multiple candidate keys

3️⃣

Third Normal Form (3NF)

📘 Definition of 3NF (Simple)

A relation schema R is in Third Normal Form (3NF) if:

It is in 2NF, AND
No non-prime attribute is transitively dependent on the primary key

🔑 Transitive Dependency

Transitive FD: X → Z can be derived from X → Y and Y → Z

SSN → DNUMBER → DMGR_SSN

Therefore: SSN → DMGR_SSN is transitive

✅ SSN → ENAME

Non-transitive - no intermediate attribute

⚠️ Important Note

In X → Y → Z with X as primary key:

Only a problem if Y is NOT a candidate key
If Y IS a candidate key, no problem exists

General 3NF Definition (Multiple Keys)

🎯 Formal 3NF Definition

A relation R is in 3NF if whenever X → A holds, then either:

X is a superkey of R, OR
A is a prime attribute of R

📋 Alternative 3NF Definition

A relation is in 3NF if every non-prime attribute:

Is fully functionally dependent on every key, AND
Is non-transitively dependent on every key

Note: This definition shows that 3NF implies 2NF

The Mnemonic Rule

💡 Remember the Normal Forms

1NF: All attributes depend on the key
2NF: All attributes depend on the whole key
3NF: All attributes depend on nothing but the key

⭐

Boyce-Codd Normal Form (BCNF)

📘 What is BCNF?

BCNF is a stronger form of 3NF that handles relations with overlapping candidate keys

When a relation has more than one candidate key, anomalies may result even in 3NF

🔑 BCNF Definition

A relation R is in Boyce-Codd Normal Form (BCNF) if:

Whenever X → A holds in R, then X is a superkey of R

In other words:

Every determinant must be a candidate key

📊 Normal Form Hierarchy

Each normal form is strictly stronger than the previous:

Every 2NF relation is in 1NF
Every 3NF relation is in 2NF
Every BCNF relation is in 3NF

Key Point: Relations can be in 3NF but NOT in BCNF!

3NF vs BCNF

⚠️ When 3NF ≠ BCNF

Recall the general 3NF definition: X → A is okay if either:

X is a superkey, OR
A is a prime attribute

BCNF only allows condition (a)

BCNF removes condition (b), making it stricter than 3NF

BCNF Decomposition

💡 BCNF Normalization Algorithm

To decompose R into BCNF:

Let X → A be the FD that violates BCNF
Decompose R into:
- R₁ = R - A
- R₂ = XA
If either R₁ or R₂ is not in BCNF, repeat the process

⚠️ Trade-off with BCNF

Important: BCNF decomposition may not preserve all functional dependencies!

Must ensure lossless (non-additive) join property
Some FDs may be lost in decomposition
This is sometimes acceptable to achieve BCNF

Non-Additive Join Test

📋 Lossless Join Property

A decomposition D = {R₁, R₂} has the lossless join property if either:

(R₁ ∩ R₂) → (R₁ - R₂) is in F⁺, OR
(R₁ ∩ R₂) → (R₂ - R₁) is in F⁺

Translation: The common attributes must functionally determine one of the relation's unique attributes

🎓

Armstrong's Axioms

📘 Inference Rules for FDs

Armstrong's Axioms are proven and complete inference rules for deriving functional dependencies

🔑 The Three Axioms

1. Reflexivity: If Y ⊆ X, then X → Y

Example: AB → B

2. Augmentation: If X → Y, then XZ → YZ

Example: If A → B, then AC → BC

3. Transitivity: If X → Y and Y → Z, then X → Z

Example: If A → B and B → C, then A → C

Derived Rules

📐 Additional Inference Rules

Union: If X → Y and X → Z, then X → YZ

Decomposition: If X → YZ, then X → Y and X → Z

These can be proven using Armstrong's Axioms

💡 Proof Example: Union Rule

Prove: If X → Y and X → Z, then X → YZ

Given: X → Y, X → Z
X → Y implies XX → XY (augmentation)
Augment X → Z with Y: XY → ZY
X → XY → ZY means X → ZY (transitivity)

🔄

Closure Algorithm

📘 What is Closure?

Closure of X (denoted X⁺): The set of all attributes functionally determined by X based on F

Key property: A key contains a full set of attributes in its closure

🔧 Closure Algorithm

Input: Set of FDs F, and set of attributes X

Output: X⁺ (closure of X)

X⁺ := X

repeat

  old_X⁺ := X⁺

  for each FD Y → Z in F do

    if X⁺ ⊇ Y then

      X⁺ := X⁺ ∪ Z

until (X⁺ = old_X⁺)

Finding Keys Using Closure

💡 Complete Example

Relation: SupplierPart(sname, city, status, p#, pname, qty)

Functional Dependencies:

fd1: sname → city
fd2: city → status
fd3: p# → pname
fd4: {sname, p#} → qty

Prove {sname, p#} is a key:

Step 1: Show it's a superkey

{sname, p#} → {sname, p#} (trivial)
sname → city (fd1)
sname → status (transitive via fd2)
{sname, p#} → qty (fd4)
p# → pname (fd3)
Therefore: {sname, p#}⁺ = {sname, p#, city, status, qty, pname} ✓

Step 2: Show it's minimal

p# is not determined by anything → sname alone is not a key
sname is not on right side of any FD with p# alone → p# alone is not a key
Therefore {sname, p#} is minimal ✓

Conclusion: {sname, p#} is a key!

📊

Summary of Normal Forms

Normal Form Requirements Eliminates 1NF No multivalued or composite attributes Repeating groups, nested relations 2NF 1NF + No partial dependencies Partial dependencies on candidate keys 3NF 2NF + No transitive dependencies Transitive dependencies through non-keys BCNF Every determinant is a candidate key All dependencies where determinant isn't a superkey

🎯 Practical Guidelines

Normalize to at least 3NF in practice
BCNF is ideal but may sacrifice dependency preservation
4NF and 5NF rarely used in practice
Denormalization: Sometimes acceptable for performance (but document it!)

✨ Key Takeaways

Normalization eliminates redundancy and anomalies
Each normal form builds on the previous one
Higher normal forms mean better data integrity
Trade-offs exist between normalization and performance
Understanding FDs is essential for good database design