CH · 12

File-System Implementation

CS330 · Operating Systems · Prince Sultan University

Chapter 12:
File-System Implementation

This chapter is really about four exam ideas: what is stored on disk, what is kept in memory, how logical file blocks map to physical disk blocks, and how the OS keeps track of free space.

🗂 On-Disk Structures 🧠 In-Memory Structures 📦 Contiguous Allocation 🔗 Linked Allocation 📑 Indexed Allocation 🟩 Bit Vector ⛓ Free-Space Linked List 🗃 Grouping & Counting
🗺
Chapter Overview
🎯 THE BIG QUESTION Ch 11 asked "what does the file system let the user do?" Ch 12 asks "what structures make that possible?" Focus on four buckets: on-disk structures, in-memory structures, allocation methods, and free-space management.

What You'll Master

  • On-disk structures: boot block, superblock, inode, FCB
  • In-memory structures: mount table, open-file tables, buffers
  • Three allocation methods: Contiguous, Linked, Indexed
  • Four free-space methods: Bit vector, Linked list, Grouping, Counting
  • Calculating logical → physical addresses for each method

Exam Focus

  • Know the names and jobs of the on-disk and in-memory structures
  • Be able to translate a logical address for contiguous, linked, and indexed allocation
  • Compare methods clearly: fragmentation, random access, growth, and overhead
  • Recognize free-space methods quickly: bitmap, linked list, grouping, counting
💾
On-Disk File-System Structures

The Disk Layout - Layers from Outer to Inner

Think of a disk as a book. The on-disk structures are the table of contents, chapters, and index - all physically stored on the disk, surviving power-off.

Disk Layout (Conceptual)

Boot Block
Block 0
Superblock / Volume Control Block
Partition metadata
Directory Structure
Root + subdirs + inode #s
Per-File FCB / inode
Metadata + block pointers
Data Blocks
Actual file content

← Low address on disk                                                          High address →

① Boot Control Block

  • First block of the volume (if it's bootable)
  • Contains info needed to boot the OS from that partition
  • Called Boot Sector (NTFS) or Boot Block (UFS)
  • Empty on non-boot volumes
💡 LINK TO CH 1 This is what the bootstrap program in ROM/EPROM reads first! It loads the OS kernel into memory to start the system.

② Volume Control Block (Superblock)

  • Stores partition-level metadata:
  • Total number of blocks
  • Number of free blocks
  • Block size
  • Free block pointers / free inode or FCB information
  • Called Superblock in UNIX-like systems

③ Directory Structure

  • Organises all files on the partition
  • In UNIX: maps file names → inode numbers
  • Starts at the root directory
  • Stored on disk - survives reboot

④ Per-File FCB (inode)

  • File Control Block - one per file
  • Contains: unique ID, type, size, protection, timestamps, owner, and pointers to data blocks
  • Called inode in most UNIX systems
  • In UNIX, the file name is stored in the directory entry, not inside the inode itself
  • Created at file creation; deleted when file is deleted
🧠 Memory Aid - On-Disk Structures
B · S · D · F · Data

Boot block → Superblock → Directory → FCB/inode → Data blocks
Think: "Big Systems Directly Find Data"

⚠ EXAM TRAP - Superblock vs FCB The superblock stores partition-wide info (total blocks, free blocks, block size). The FCB/inode stores per-file metadata and block pointers. The directory entry stores the file name and points to the inode. Don't mix these three roles.
🧠
In-Memory File-System Structures

Why In-Memory Structures?

Disk access is slow. The OS therefore keeps frequently used file-system metadata in RAM so each file operation does not have to re-read the disk. Think of memory structures as the working copies of the permanent on-disk structures.

Mount Table

  • Lists all currently mounted file system volumes
  • Each entry: device, mount point, FS type, options
  • When you run mount /dev/sdb1 /mnt/usb, an entry is added here

Directory Structure Cache

  • Caches recently accessed directory entries in memory
  • Avoids re-reading directories from disk on every path lookup
  • Called the dentry cache (dcache) in Linux

System-Wide Open-File Table

  • One entry per open file in the entire system (not per process)
  • Contains: a memory copy/reference of the FCB or inode plus status information such as open count
  • When open count hits 0 → entry is removed

Per-Process Open-File Table

  • One table per running process
  • Contains: pointer into the system-wide table plus process-specific access information
  • This is the table indexed by the file descriptor returned to the process

📖 How open() Works - Step by Step

1
Search the directory structure to map the file name to its inode/FCB entry
2
Load the inode/FCB metadata into memory (if not already cached) and connect it to the system-wide open-file table
3
Create an entry in the process's open-file table that points to the system-wide entry
4
Return a file descriptor (fd) - the index the process will use in later read/write/close calls

📕 How close() Works

1
Remove entry from the per-process open-file table
2
Decrement open count in system-wide table
3
If open count = 0, flush FCB back to disk and remove from system-wide table
💡 WHY TWO TABLES? The system-wide table avoids repeating the same metadata work for every process. The per-process table is the process's handle view of that file. In exam answers, the key distinction is shared system entry versus process-specific reference.
📦
Allocation Methods

The Core Problem

A file is a sequence of logical blocks. The disk is a sequence of physical blocks. How do we map one to the other? Three strategies - each with distinct trade-offs on fragmentation, speed, and random access.

📌 KEY FORMULAS YOU MUST KNOW Every allocation method gives you the same two outputs: which physical block to read and which byte offset inside that block. The method changes only how you find that physical block.

① Contiguous Allocation

Each file occupies a set of consecutive (contiguous) blocks on disk. The directory entry stores only the starting block address and the length (number of blocks).

Example: File A starts at block 6, size = 4 blocks
0
1
2
3
4
5
6
7
8
9
10
← File A (blocks 6 - 9)
Logical Block Number (LBN) = LA ÷ BlockSize
Offset = LA mod BlockSize
Physical Block = StartAddress + LBN  |  Displacement = Offset
✅ EXAMPLE (from OLD_MAJOR_SIWAR / fe_Solution_211) File A: start = 6, block size = 1024 bytes. LA = 2500.
Q = 2500 ÷ 1024 = 2  |  R = 2500 mod 1024 = 452
Physical Block = 6 + 2 = 8  |  Displacement = 452
  • ✅ Simple - only start + length needed
  • ✅ Excellent for sequential AND random access
  • ❌ External fragmentation - holes appear as files are deleted
  • ❌ Files cannot grow - no adjacent space may exist
  • ❌ Must know file size in advance
🧠 Think

Contiguous = parking lot with one long reserved spot. Easy to find your car (start + length). Problem: can't grow, leaves gaps when cars leave.

② Linked Allocation

Each file is a linked list of disk blocks. Each block contains a pointer to the next block. Blocks can be anywhere on the disk. The directory stores the first block pointer only.

Example: File A → Block 8 → Block 3 → Block 10 → NULL
0
1
2
3
→10
4
5
6
7
8
→3
9
10
→nil
If each block stores both data and a pointer:
Logical Block Number = LA ÷ (BlockSize − PointerSize)
Offset = LA mod (BlockSize − PointerSize)
Then follow the chain from the first block for that many links
⚠ KEY INSIGHT - FAT (File Allocation Table) FAT is a variation of linked allocation where pointers are stored in a separate table at the start of the disk rather than inside each block. This allows the data section of each block to be a full power-of-2 size, and the FAT itself can be cached in memory for speed. Used by MS-DOS and early Windows.
  • ✅ No external fragmentation
  • ✅ Files can grow easily - just add any free block
  • ✅ No need to declare file size upfront
  • Sequential access only - to reach block i, must traverse from block 0
  • ❌ Each block wastes space on a pointer
  • ❌ Poor reliability - one broken pointer loses the rest of the file
🧠 Think

Linked = a treasure hunt - each clue leads to the next. Can't skip to clue 7 without following clues 1 - 6 first. Great for exploring in order, terrible for jumping around.

③ Indexed Allocation

Each file has its own dedicated index block - an array of pointers to all its data blocks. The directory stores the address of the index block. Supports both sequential and direct (random) access.

Index block contains: [6, 11, 3, 9, ...] → each is a data block address
idx
block
6
11
3
9
← data blocks (can be anywhere)
Logical Block Number = LA ÷ BlockSize
Offset = LA mod BlockSize
Physical Block = IndexBlock[Logical Block Number]  |  Displacement = Offset
✅ EXAMPLE File B: index block at disk address 25. LA = 5000, block size = 2048.
Q = 5000 ÷ 2048 = 2 (truncated)  |  R = 5000 mod 2048 = 904
Physical Block = IndexBlock[2] (look up the 3rd pointer)  |  Displacement = 904
  • ✅ No external fragmentation
  • ✅ Supports random (direct) access efficiently
  • ✅ Files can grow (add entries to index block)
  • ❌ Index block takes up space even for tiny files
  • ❌ Large files may need multi-level index blocks
🧠 Think

Indexed = a book's table of contents. The TOC (index block) lists exactly where each chapter (data block) starts. You can jump directly to any chapter without reading the others.

UNIX UFS Combined Scheme (Inode)

UNIX handles large files by using a combined indexing scheme inside each inode:

Pointer Type How It Works Covers
Direct (12 pointers) Points directly to a data block Small files (up to 12 × BlockSize)
Single Indirect Points to a block of pointers → each points to data Medium files
Double Indirect Points to block of pointers → each points to block of pointers → data Large files
Triple Indirect 3 levels of indirection → data Very large files
🧠 Mnemonic - Inode Levels

"Direct, Single, Double, Triple" - like floors in a building: ground floor = direct access, floor 1 = one elevator ride, floor 2 = two rides, floor 3 = three rides. Higher floor = more overhead but reaches higher "addresses".

📊 Allocation Methods - Master Comparison
Feature Contiguous Linked Indexed
External Fragmentation Yes ❌ No ✅ No ✅
Main Space Cost Possible unused bytes in last block Pointer field inside each allocated block Separate index block(s) + possible last-block waste
Sequential Access Excellent ✅ Good ✅ Good ✅
Random (Direct) Access Excellent ✅ Poor ❌ (must traverse) Excellent ✅
File Can Grow? Hard ❌ Easy ✅ Easy ✅
Disk I/O for access 1 access N accesses (chain walk) 2 accesses (index + data)
Pointer Overhead None Per block Per file (index block)
Classic Example CD-ROM, old systems Linked-list style files or FAT-style variants UNIX UFS, ext3/4
⚠ EXAM TRAP - Linked vs Indexed for random access Linked allocation cannot efficiently support random access (direct access). To reach block N, you must follow N pointers from the start. This is why file systems evolved to indexed allocation. Exam questions love asking: "Which allocation method supports direct access?" → Answer: Contiguous and Indexed (NOT Linked).
🟩
Free-Space Management

The Problem

When a file is deleted, its blocks become free. The OS must track which blocks are available so it can allocate them to new files. Four strategies - each with different speed/space trade-offs.

① Bit Vector (Bitmap)

One bit per disk block. 1 = free, 0 = allocated.

Bit vector example - 0=allocated, 1=free:
0
0
1
1
1
1
0
0
1
1
1
1
1
1
0
0
Blocks 0 - 1: allocated  |  Blocks 2 - 5: free  |  Blocks 6 - 7: allocated  |  Blocks 8 - 13: free
Block number = (bits per word) × (number of 0-value words) + offset of first 1 bit

A "0-value word" = all bits = 0 (all blocks in that word are allocated)
✅ WORKED EXAMPLE (from Chapter 12 slides) Each word = 8 bits. Bit vector:
Word 0: 00000000 (0-value)  |  Word 1: 00000000 (0-value)  |  Word 2: 00010000 (has a free block at position 3)

Block number = 8 × 2 + 3 = 19 → first free block is block 19.
✅ SIZE CALCULATION EXAMPLE (from Chapter 12) Disk = 1 TB, Block size = 4 KB.
Number of blocks = 1 TB ÷ 4 KB = 268,435,456 blocks.
Bitmap size = 268,435,456 bits ÷ 8 = 33,554,432 bytes = 32 MB
  • ✅ Simple, fast to find first free block using CPU bit operations
  • ✅ Easy to find contiguous free blocks
  • ❌ Inefficient unless entire bitmap kept in memory
  • ❌ Requires extra space proportional to disk size

② Linked List (Free List)

All free blocks are linked together. A special pointer (in the superblock) points to the first free block. Each free block contains a pointer to the next free block.

Example: First free = block 4, chain: 4 → 7 → 5 → 9 → 10 → nil
0
used
1
2
3
4
→7
5
→9
6
7
→5
8
9
→10
10
→nil
  • ✅ No wasted space - pointers stored inside free blocks themselves
  • ❌ Cannot efficiently find contiguous free space
  • ❌ Traversing entire list requires many disk reads (I/O intensive)
  • ❌ Poor performance for bulk allocation
🧠 Think

Linked free list = chain of empty rooms in a hotel where each room's key tells you which room is free next. Finding 5 rooms in a row requires walking the whole chain.

③ Grouping

A modification of the linked list. The first free block stores the addresses of n free blocks. The last of these n blocks stores the addresses of the next n free blocks, and so on.

✅ KEY ADVANTAGE over simple linked list Instead of reading one block to find one free block address, you read one block to get n−1 free block addresses at once (the last entry in that block points to the next group). Much faster bulk allocation.
  • ✅ Large number of free block addresses found quickly
  • ✅ Better than simple linked list for bulk operations
  • ❌ More complex than bitmap or linked list

④ Counting

Instead of listing every free block, store the address of the first free block and a count of how many consecutive free blocks follow it.

Free-space list entries: (start, count)
0
1
2
3
start
4
free
5
6
7
start
8
free
9
free
10
11
start
Free list = [ (3, 2), (7, 3), (11, 1) ] → 3 entries instead of 6 individual blocks
  • ✅ Compact list - especially useful when free space is largely contiguous
  • ✅ Each entry covers multiple blocks with (address, count)
  • ❌ Each entry larger than a simple address
  • ❌ Less efficient when free blocks are scattered
🧠 Think

Counting = "I have 3 parking spots starting at row 7." Much more compact than listing spots 7, 8, 9 separately - especially in a mostly-empty lot.

Method Storage Structure Find Free Block Find Contiguous? Best For
Bit Vector 1 bit per block array Scan bitmap - fast with CPU ops Yes ✅ General purpose - most common
Linked List Chain of free blocks Get head pointer - O(1) No ❌ Simple systems, low memory
Grouping First free block holds n addresses Read one block → n addresses Partial When many free blocks needed fast
Counting (address, count) pairs Scan short list Yes ✅ When free space is mostly contiguous
Study Tips & Tricks
🧠 The 3 Allocation Methods
C · L · I

Contiguous - fast, fragmented
Linked - flexible, no random access
Indexed - best of both, overhead

🧠 The 4 Free-Space Methods
B · L · G · C

Bitmap - 1 bit per block
Linked - chain of free blocks
Grouping - chunks of n addresses
Counting - (start, count) pairs

🎯 How to Solve Address Translation Problems

  1. Identify the allocation method - contiguous, linked, or indexed?
  2. Write the correct formula for that method (Q and R)
  3. For Contiguous: Q = LA ÷ BlockSize, Physical = Start + Q, Offset = R
  4. For Linked: Q = LA ÷ (BlockSize − PtrSize), walk Q steps in chain
  5. For Indexed: Q = LA ÷ BlockSize, lookup IndexBlock[Q], Offset = R
  6. Check validity: Is the resulting block within the file's bounds?

🎯 How to Solve Bitmap Problems

  1. Identify word size (e.g. 8 bits, 16 bits, 32 bits)
  2. Count 0-value words (all bits = 0) from the start
  3. Find the first word that is NOT all zeros
  4. Within that word, find the offset (position) of the first 1 bit - count from the left (bit 0 = leftmost)
  5. Block number = (bits per word × number of 0-value words) + offset
💡 CONTIGUOUS vs LINKED for exam MCQ If a question says "supports direct/random access" → Contiguous or Indexed.
If a question says "no external fragmentation" → Linked or Indexed.
Contiguous = external fragmentation. Indexed = index block overhead.
💡 INODE in UNIX When asked about UNIX file system - the inode IS the FCB. It uses a combined scheme: 12 direct + single indirect + double indirect + triple indirect. This is indexed allocation at its finest.
⚠ COMMON MISTAKE - Bit Vector 0 vs 1 The book uses: bit = 1 → free, bit = 0 → allocated. Some students flip this! Always state your convention when answering. In the worked example: "Word 2: 00010000 → bit at position 3 is 1 → free → block 19."
📌 LINKED TO PAST EXAMS The fe_Solution_211 final exam asked about: (a) contiguous allocation I/O counts for inserting a block, (b) disadvantages of contiguous allocation, and (c) how the OS views a disk. These exact topics come from Ch 12. Also, Quiz 4 (Dr Helene) tested file operations, allocation, and protection - all tying into Ch 11 + 12.
📝
Past Exam Questions & Answers
💡 HOW TO USE Cover the answer, think it through, then click to reveal. These come directly from PSU CS330 past exams and final exam samples.
Q1. A file consists of 30 disk blocks (numbered 0 - 29). A block is inserted in the middle (as the 15th block). How many disk read and write I/O operations are needed if linked allocation is used? Final Exam Sample
✅ ANSWER: 15 reads + 2 writes In linked allocation, to reach the 15th position you must follow the chain from block 0 to block 14. That's 15 reads (blocks 0 through 14, reading each pointer to find the next). Then: 1 write to update block 14's pointer to point to the new block, and 1 write to write the new block with its pointer to the old 15th block = 2 writes total.

This is a classic example of why linked allocation has poor performance for insertions - sequential traversal is mandatory.
Q2. What are the disadvantages of contiguous disk allocation? Give three. Final Exam Sample
✅ ANSWER:
  • External fragmentation: Free space breaks into scattered small holes. Requires compaction/defragmentation which is expensive.
  • File size must be known in advance: Hard to estimate for output files. If you underestimate, the file can't grow.
  • Files cannot grow: If there's no adjacent free space, the entire file must be relocated to a larger hole.
Q3. File A has start address = 6, block size = 1024 bytes. What disk block contains the byte at logical address (file offset) 2500? What is the displacement within that block? Chapter 12 / OLD_MAJOR
✅ ANSWER (Contiguous Allocation):
Q = 2500 ÷ 1024 = 2 (integer division)
R = 2500 mod 1024 = 2500 − (2 × 1024) = 452
Physical disk block = Start + Q = 6 + 2 = Block 8
Displacement within block = 452 bytes
Q4. What are the on-disk data structures used to implement a file system? Name and briefly describe each. Chapter 12 Core
✅ ANSWER:
  • Boot Control Block (Boot Sector): Contains info needed to boot the OS from that partition. Located at the start of the volume.
  • Volume Control Block (Superblock/MFT): Stores partition metadata - total blocks, free blocks, block size, free block pointers.
  • Directory Structure: Organises file names and their associated inode/FCB numbers, starting at the root.
  • Per-file FCB (inode): Stores file metadata such as type, size, timestamps, owner, protection, and pointers to data blocks.
Q5. What is the difference between the system-wide open-file table and the per-process open-file table? Chapter 12 Core
✅ ANSWER: The system-wide open-file table has one entry per open file in the entire system. It contains the FCB/inode copy, the open count (how many processes have the file open), and other metadata. When the open count hits 0, the entry is removed.

The per-process open-file table has one entry per file per process. It contains a pointer into the system-wide table, the current file offset for that process, and the access mode (read/write). This is why two processes reading the same file have independent file positions.
Q6. Consider a free-space bitmap. Each word = 8 bits. The bitmap is:
Word 0: 00000000 | Word 1: 00000000 | Word 2: 00010000
What is the first free block number?
Chapter 12 Slides
✅ ANSWER: Block 19
Step 1: Words 0 and 1 are 0-value (all bits = 0) → 2 zero-value words.
Step 2: Word 2 = 00010000 → first 1 bit is at position 3 (counting from left, index 3).
Step 3: Block = (bits per word × zero-value words) + offset = (8 × 2) + 3 = 19
Q7. A disk has block size = 4 KB and size = 1 TB. How many bytes does the free-space bitmap require? Chapter 12 Slides
✅ ANSWER: 32 MB
Number of blocks = 1 TB ÷ 4 KB = 1,099,511,627,776 ÷ 4,096 = 268,435,456 blocks
Bitmap size = 268,435,456 bits ÷ 8 = 33,554,432 bytes = 32 MB

This shows a practical limitation of the bitmap approach - for large disks, the bitmap itself requires significant memory to keep it loaded in RAM.
Q8. What is the FAT (File Allocation Table) and how does it relate to linked allocation? Chapter 12 Core
✅ ANSWER: FAT is a variation of linked allocation. In standard linked allocation, each disk block stores a pointer to the next block - this wastes space and means each block's data area is not a power of 2.

FAT solves this by moving all the pointers out of the data blocks and into a separate table at the start of the disk. The FAT has one entry per disk block; each entry is the block number of the next block in the file chain (or a special EOF marker). The data blocks themselves contain only file data.

Advantages over standard linked: The entire FAT can be cached in memory for speed; data blocks are full-sized powers of 2; supports faster random access since you can walk the FAT in memory rather than reading disk blocks to find pointers.
Q9. Which allocation method does the UNIX file system (UFS) use? Explain its inode structure briefly. Chapter 12 Core
✅ ANSWER: Indexed allocation (combined scheme) UNIX UFS uses an inode-based indexed allocation. Each inode contains:
  • 12 direct pointers: Each points directly to a data block - efficient for small files
  • 1 single indirect pointer: Points to a block of pointers (one level of indirection)
  • 1 double indirect pointer: Points to a block of pointers, each pointing to another block of pointers
  • 1 triple indirect pointer: Three levels of indirection - for very large files
This allows small files to be accessed quickly (direct) while supporting very large files through indirect blocks.
Q10. What is the grouping method of free-space management? How does it improve upon the linked list method? Chapter 12 Core
✅ ANSWER: In the simple linked list method, reading one free block gives you one free block address (plus a pointer to the next). To get N free block addresses, you must read N blocks - N disk accesses.

In grouping, the first free block stores the addresses of n free blocks. The last of those n entries points to another group of n free blocks. Reading one disk block gives you n−1 usable free block addresses immediately - far fewer disk accesses for bulk allocation.
Q11. For indexed allocation: File B has its index block at disk block 25. Block size = 512 bytes. What disk block contains logical address 1200? What is the displacement? Chapter 12 / OLD_MAJOR
✅ ANSWER: Q = 1200 ÷ 512 = 2 (integer division; 2 × 512 = 1024 ≤ 1200 < 1536=3 × 512)
R = 1200 mod 512 = 1200 − 1024 = 176
Physical Block = IndexBlock[2] (read the 3rd entry, index 2, from the index block at disk block 25)
Displacement = 176 bytes into that block.

Note: You must first read disk block 25 to get the index table, then use the value at position [2] to get the actual data block address. This is why indexed allocation requires 2 disk accesses (1 for index + 1 for data).
Q12. Compare contiguous, linked, and indexed allocation for: (a) external fragmentation, (b) internal fragmentation, (c) ability to support random/direct access. Chapter 12 / OLD_MAJOR_SIWAR
✅ ANSWER:
  • (a) External fragmentation: Contiguous = YES (big problem). Linked = No. Indexed = No.
  • (b) Main space overhead: Contiguous = possible unused bytes in the last block. Linked = pointer field stored in each allocated block. Indexed = index block overhead (and possible unused bytes in the last data block).
  • (c) Random/direct access: Contiguous = Excellent (O(1), start + Q). Linked = Poor (must traverse chain from start, O(n)). Indexed = Excellent (O(1), look up IndexBlock[Q]).
Q13. What are the in-memory structures used to implement a file system? Name four. Chapter 12 Core
✅ ANSWER:
  • Mount table: Info about each mounted volume/device currently attached to the system.
  • Directory structure cache (dentry cache): Caches recently accessed directory entries to avoid repeated disk reads for path lookups.
  • System-wide open-file table: One entry per open file in the system; contains FCB copy and open count.
  • Per-process open-file table: One per process; contains pointer to system-wide entry, current file offset, and access mode.
  • (Also: Buffers that hold file-system blocks being read/written to disk.)

📚 CHAPTER SUMMARY Chapter 12 reveals the machinery behind files. On disk, the OS maintains boot blocks, superblocks, directories, and inodes. In memory, it caches these structures for speed. Three allocation strategies (Contiguous, Linked, Indexed) each solve the mapping problem differently - contiguous is fast but rigid, linked is flexible but sequential-only, indexed gives the best of both. Free space is tracked via bitmaps, linked lists, grouping, or counting. The bitmap block-number formula and the address translation formulas for all three methods are essential exam skills.