Apache isn't that great at serving static content under a large number of requests, compared to the NetScaler load balancers sitting in front
Python is fast enough
There are many other bottlenecks, such as waiting on calls to the DB, cache, etc. Python web speed is "fast enough" and usually not the biggest bottleneck
Development speed can be faster with Python and that is more critical
psyco for Python -> C compiling
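A minimal sketch of how psyco was typically enabled; `psyco.full()` and `psyco.bind()` are its documented calls, but whether YouTube compiled everything or only selected hot functions isn't stated in these notes.

```python
# Sketch of enabling psyco (an old specializing JIT for 32-bit CPython 2.x).
# psyco.full() compiles everything it can; psyco.bind() targets chosen functions.
try:
    import psyco
    psyco.full()                 # JIT-compile all functions, or...
    # psyco.bind(hot_function)   # ...compile only selected hot code paths
except ImportError:
    pass                         # fall back to plain interpretation
```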
Each video is hosted by a mini-cluster
A cluster of machines that serve the exact same video
Offers availability: if one machine in the cluster goes down, the others act as backups and keep serving
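A hypothetical sketch of the mini-cluster idea, assuming a simple hash from video id to cluster; the hostnames, hashing scheme, and health handling are invented for illustration, not YouTube's actual code.

```python
import hashlib
import random

# Hypothetical: each "mini-cluster" is a small group of machines that all store
# the same videos, so any member can serve a request and the rest are backups.
MINI_CLUSTERS = [
    ["video-a1", "video-a2", "video-a3"],   # cluster 0
    ["video-b1", "video-b2", "video-b3"],   # cluster 1
]

def cluster_for(video_id: str) -> list[str]:
    """Deterministically map a video id to one mini-cluster."""
    h = int(hashlib.md5(video_id.encode()).hexdigest(), 16)
    return MINI_CLUSTERS[h % len(MINI_CLUSTERS)]

def pick_server(video_id: str, down: frozenset[str] = frozenset()) -> str:
    """Serve from any healthy machine in the video's mini-cluster."""
    healthy = [m for m in cluster_for(video_id) if m not in down]
    return random.choice(healthy)
```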
Served with Apache at first. Lasted about 3 months. High load and high context switching (the OS constantly stopping one thread on the CPU and putting another in its place).
Moved to lighttpd (open source) to go from a process-per-request model to a single-threaded, event-driven architecture. Allows polling a large number of file descriptors efficiently (epoll).
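Not lighttpd itself, but a minimal sketch of the single-threaded, event-driven model it uses: one thread watches many sockets via epoll (Python's selectors module picks epoll on Linux), so there is no per-connection process or thread to context-switch.

```python
import selectors
import socket

sel = selectors.DefaultSelector()          # epoll on Linux

def accept(server_sock):
    conn, _ = server_sock.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, handle)

def handle(conn):
    data = conn.recv(4096)                 # read whatever the client sent
    if data:
        conn.send(b"HTTP/1.0 200 OK\r\n\r\nhello")  # toy response
    sel.unregister(conn)
    conn.close()

server = socket.create_server(("0.0.0.0", 8080))
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

while True:                                # one thread polls all descriptors
    for key, _ in sel.select():
        key.data(key.fileobj)              # dispatch to accept() or handle()
```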
CDN for most popular content
Less popular content goes to regular YouTube servers
YouTube servers have to be optimized very differently vs. the CDNs. The CDN data mostly comes from memory. For the YouTube servers, commodity hardware was used, with neither too little nor too much memory; expensive hardware wasn't worth it. Memory had to be tuned for the random-access workload and to avoid cache thrashing.
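A hypothetical sketch of the popular-vs-long-tail split; the hostnames and the view-count threshold are made up, since the actual routing logic isn't described in these notes.

```python
# Hypothetical routing: the most-watched ("head") videos are served via a CDN,
# while the long tail is served from YouTube's own commodity video servers.
CDN_HOST = "cdn.example.com"              # made-up hostname
ORIGIN_HOST = "videos.example.com"        # made-up hostname
POPULARITY_THRESHOLD = 100_000            # views; arbitrary cutoff for illustration

def video_url(video_id: str, view_count: int) -> str:
    host = CDN_HOST if view_count >= POPULARITY_THRESHOLD else ORIGIN_HOST
    return f"https://{host}/get_video?id={video_id}"
```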
Thumbnails
Served them with Apache at first from a unified pool (the same hosts that served site traffic also served static thumbnails). Eventually moved to separate pools. High load, low performance.
EXT3 partition => too many thumbnail files in one directory caused an issue where no more video uploads could be made
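A sketch of the standard workaround for the huge-directory problem, assuming hash-prefixed subdirectories; the path layout is hypothetical and not necessarily what YouTube used.

```python
import hashlib
import os

THUMB_ROOT = "/var/thumbs"                # made-up path

def thumb_path(video_id: str) -> str:
    """Fan thumbnails out so no single EXT3 directory holds millions of files."""
    h = hashlib.md5(video_id.encode()).hexdigest()
    # two levels of 2-hex-char directories => 65,536 buckets
    return os.path.join(THUMB_ROOT, h[:2], h[2:4], f"{video_id}.jpg")

def save_thumbnail(video_id: str, data: bytes) -> None:
    path = thumb_path(video_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
```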
Switched from Apache to Squid (reverse proxy), far more efficient with CPU. However, performance would degrade over time (starting at 300 TPS, eventually dropping to 20 TPS).
However, there were still too many images. It would take 24-48 hours just to copy the images from one machine to another; rsync would never finish because it would exhaust all the memory on the machine.
Moved to BigTable (i.e. Google's BTFE) to fix the issue.
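BigTable and BTFE are internal to Google, so this is only a conceptual sketch of the idea: thumbnails become small values keyed by video id in a distributed key-value store, which sidesteps per-directory filesystem limits and makes replication the store's problem. The `KVStore` class below is a stand-in interface, not a real client API.

```python
class KVStore:
    """Stand-in for a distributed key-value / wide-column store client."""
    def __init__(self):
        self._data = {}                    # in-memory dict for illustration only

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> bytes | None:
        return self._data.get(key)

# Thumbnails keyed by video id instead of living as millions of tiny files.
thumbs = KVStore()
thumbs.put("thumb/dQw4w9WgXcQ/default", b"<jpeg bytes>")
jpeg = thumbs.get("thumb/dQw4w9WgXcQ/default")
```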
Databases
Used MySQL because it's fast if you know how to tune it correctly
Stored metadata: usernames, passwords, video descriptions, titles, messages
Started with one main DB, one backup
Did OK until more people started using YouTube
The MySQL DB was swapping, bringing the site down to a crawl. It turned out that the Linux kernel at the time treated the page cache as more important than application memory, so MySQL's pages got swapped out.
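The kernel knob usually involved here is vm.swappiness (how eagerly Linux evicts application pages in favor of page cache). The sketch below only reads the current value; lowering it is one common mitigation on modern kernels, not necessarily what YouTube did at the time.

```python
# Read the current swap eagerness setting from procfs (Linux only).
def read_swappiness(path: str = "/proc/sys/vm/swappiness") -> int:
    with open(path) as f:
        return int(f.read().strip())

if __name__ == "__main__":
    print("vm.swappiness =", read_swappiness())
```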
As YouTube started scaling up, they used MySQL replication to spread reads across multiple machines.
Replication lag with too many read replicas: users started complaining about weird, unreproducible bugs (stale reads from lagging replicas).
Split the read replicas into a video-watching pool and a general pool. A stopgap measure, not a real fix.
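A hypothetical sketch of that stopgap: route watch-path reads to one replica pool, other reads to a general pool, and keep all writes on the primary. The hostnames and routing function are invented for illustration.

```python
import random

PRIMARY = "db-primary"                          # made-up hostnames
WATCH_REPLICAS = ["db-watch-1", "db-watch-2"]
GENERAL_REPLICAS = ["db-gen-1", "db-gen-2", "db-gen-3"]

def pick_db(operation: str, purpose: str = "general") -> str:
    """Writes go to the primary; reads go to the pool matching their purpose."""
    if operation == "write":
        return PRIMARY
    pool = WATCH_REPLICAS if purpose == "watch" else GENERAL_REPLICAS
    return random.choice(pool)

# e.g. pick_db("read", purpose="watch") -> one of the watch-pool replicas
```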
DB RAID tweaking increased throughput by 20%: Linux sees 5 volumes instead of 1 logical volume, letting it schedule disk I/O more aggressively. Same hardware.
Eventually did database partitioning to spread both writes and reads, partitioned by user. Much better cache locality => less I/O. Got rid of the read replicas.
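A hypothetical sketch of partitioning by user: every row for a given user lives on exactly one shard, so both reads and writes hit a single smaller database whose working set fits in cache. The modulo mapping is the simplest possible illustration; real systems typically use a lookup or directory service so shards can be rebalanced.

```python
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]  # made-up names

def shard_for_user(user_id: int) -> str:
    """Map a user to the single shard that holds all of their rows."""
    return SHARDS[user_id % len(SHARDS)]

# All queries for user 42 (their videos, messages, metadata) go to one shard:
# shard_for_user(42) -> "db-shard-2"
```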