What is High-Level Design (HLD)?

Hands-on practice for this lecture. Work through the exercises and quizzes to reinforce what you've learned.

Exercise 1 of 2

Sort 50 Petabytes: Why a Single Machine Fails

sorted() loads everything into RAM. 50 PB is 3,125,000× larger than the RAM of a 16 GB machine. Find the number of servers that splits the data into chunks that actually fit.

Sort 50 PB on a single 16 GB machine

File size vs available RAM

File to sort: 50 PB

Available RAM: 16 GB (invisible at this scale)

3,125,000×

more data than available RAM

50,000,000 GB cannot fit in 16 GB, not even close
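The arithmetic behind these figures can be checked with a short sketch. It assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes), which is what the 50,000,000 GB figure implies; the function names are illustrative and not part of the exercise.

```python
PB = 10**15  # decimal petabyte in bytes (assumption: decimal, not binary, units)
GB = 10**9   # decimal gigabyte in bytes


def ram_ratio(data_bytes, ram_bytes):
    """How many times larger the data is than the available RAM (whole number)."""
    return data_bytes // ram_bytes


def chunk_size_gb(data_bytes, servers):
    """Size in GB of each chunk when the data is split evenly across servers."""
    return data_bytes / servers / GB


def min_servers_to_fit_in_ram(data_bytes, ram_bytes):
    """Smallest server count whose even chunks fit entirely in one machine's RAM."""
    return -(-data_bytes // ram_bytes)  # integer ceiling division


if __name__ == "__main__":
    data, ram = 50 * PB, 16 * GB
    print(ram_ratio(data, ram))                  # ratio of data size to RAM
    print(min_servers_to_fit_in_ram(data, ram))  # servers needed for RAM-sized chunks
    print(chunk_size_gb(data, 50_000))           # GB per server with 50,000 servers
```

Under these assumptions, 50,000 servers would each hold 1,000 GB, which is still more than 16 GB of RAM, so each server would have to sort its chunk from disk rather than in a single in-memory pass.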

sort.py

with open("data.txt") as f:
    lines = f.readlines()  # tries to load ALL 50 PB into RAM
print(sorted(lines))  # never reached

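❌ readlines() tries to hold all 50,000,000 GB in memory at once. A 16 GB machine runs out of RAM almost immediately, so the process fails (typically with a MemoryError, or by being killed by the OS) long before the file is fully read, and sorted() is never reached.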
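The idea of splitting data into chunks that fit can be illustrated on one machine first. The sketch below is not the lecture's solution and does not make a single machine sufficient for 50 PB (the disk would not hold it either); it only shows the pattern of sorting memory-sized chunks and then merging them. The function name `sort_in_chunks` and the `lines_per_chunk` parameter are hypothetical.

```python
import heapq
import itertools
import os
import tempfile


def sort_in_chunks(src_path, dst_path, lines_per_chunk):
    """Sort a text file while holding at most `lines_per_chunk` lines in memory."""
    chunk_paths = []
    with open(src_path) as src:
        while True:
            # Read only the next chunk of lines, never the whole file.
            chunk = list(itertools.islice(src, lines_per_chunk))
            if not chunk:
                break
            # Normalise line endings so a final line without "\n" merges cleanly.
            chunk = [line.rstrip("\n") + "\n" for line in chunk]
            chunk.sort()
            fd, path = tempfile.mkstemp(suffix=".chunk")
            with os.fdopen(fd, "w") as out:
                out.writelines(chunk)
            chunk_paths.append(path)

    # Merge the sorted chunk files lazily; heapq.merge reads one line per file at a time.
    files = [open(path) for path in chunk_paths]
    try:
        with open(dst_path, "w") as dst:
            dst.writelines(heapq.merge(*files))
    finally:
        for f in files:
            f.close()
        for path in chunk_paths:
            os.remove(path)
```

Each temporary file plays the role that a server's locally sorted slice plays in the distributed version.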

Exercise 2 of 2

MapReduce: How 50 PB Gets Sorted Across Thousands of Servers

Step through the four phases (raw data, local sort, shuffle by key range, and k-way merge) with 3 servers and 9 words standing in for 50,000 servers and 50 PB.
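The four phases can be simulated in a few lines, with Python lists standing in for servers. This is a minimal sketch rather than a real MapReduce job: the split points in `BOUNDARIES` are hypothetical and chosen by hand for these nine words, and the function names are illustrative.

```python
import heapq
from bisect import bisect_right

# Phase 1: raw data, one unsorted slice per server.
SERVERS = [
    ["mango", "apple", "zebra"],
    ["grape", "banana", "lemon"],
    ["peach", "cherry", "melon"],
]

# Hypothetical key ranges: words < "h" go to range 0, "h" up to "n" to range 1, the rest to range 2.
BOUNDARIES = ["h", "n"]


def local_sort(servers):
    """Phase 2: every server sorts its own slice independently."""
    return [sorted(chunk) for chunk in servers]


def shuffle(sorted_chunks, boundaries):
    """Phase 3: each server sends each word to the server that owns its key range."""
    inbox = [[] for _ in range(len(boundaries) + 1)]
    for chunk in sorted_chunks:
        runs = [[] for _ in range(len(boundaries) + 1)]
        for word in chunk:  # chunk is sorted, so every run stays sorted
            runs[bisect_right(boundaries, word)].append(word)
        for target, run in enumerate(runs):
            if run:
                inbox[target].append(run)
    return inbox


def k_way_merge(inbox):
    """Phase 4: each range owner merges the sorted runs it received."""
    return [list(heapq.merge(*runs)) for runs in inbox]


def distributed_sort(servers, boundaries):
    ranges = k_way_merge(shuffle(local_sort(servers), boundaries))
    # The ranges are already in key order, so concatenating them is now safe.
    return [word for key_range in ranges for word in key_range]


if __name__ == "__main__":
    print(distributed_sort(SERVERS, BOUNDARIES))
```

Because every word in range 0 sorts before every word in range 1, and so on, no single machine ever needs to see the full dataset.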

Merge data from 3 servers: naive approach

Three servers each hold a slice of the dataset. The naive approach: concatenate all three lists and call sort() on the combined result.

Server 1

mango, apple, zebra

Server 2

grape, banana, lemon

Server 3

peach, cherry, melon

Result: concatenate all three lists

mango, apple, zebra, grape, banana, lemon, peach, cherry, melon

❌ The output is unsorted: it's just the three lists stuck together. You'd need a full sort() pass over all 9 items on a single machine. At petabyte scale, that single machine doesn't exist.
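The naive approach is easy to reproduce. In this small sketch the three lists mirror the servers above, and the variable names are illustrative.

```python
server_1 = ["mango", "apple", "zebra"]
server_2 = ["grape", "banana", "lemon"]
server_3 = ["peach", "cherry", "melon"]

# Naive approach: stick the three lists together.
combined = server_1 + server_2 + server_3

# Concatenation preserves arrival order; it does not order the words.
is_sorted = combined == sorted(combined)

# Fixing it this way means one machine must hold every item at once.
fixed_on_one_machine = sorted(combined)

if __name__ == "__main__":
    print(combined)
    print(is_sorted)
    print(fixed_on_one_machine)
```

With 9 words the final `sorted()` call is trivial; the point is that it needs all the data in one place, which is exactly what does not scale.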
