Case Study: Social Media & Real-time

System Design: Twitter

Scaling a global real-time message delivery system.

Twitter's primary challenge isn't just storing tweets; it's distributing them instantly to millions of followers. Like Netflix, Twitter is read-heavy: viewing timelines generates far more traffic than posting tweets. Unlike Netflix, it also carries a massive write load, and every write may have to reach millions of readers.

The Core Problem: Fan-out

How do you deliver one tweet to 100 million followers in under a second?

1. Baseline Requirements

Low Latency: Users expect to see tweets almost immediately after they are posted.
Availability: The feed must always be browseable, even if some features are degraded.
Persistence: Tweets must be stored reliably.

2. The Fan-out Architecture

Twitter uses two primary methods for generating timelines: Fan-out on Write and Fan-out on Read.

Push Model (Fan-out on Write)

When a user posts a tweet, the system immediately "pushes" that tweet into the pre-computed timelines of all their followers. These timelines are stored in Redis.

✅ Read is ultra-fast: Just fetch the Redis key.
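❌ Write is slow for celebrities: Updating 100M keys takes too long. Even at a hypothetical 1 million timeline writes per second, a single tweet would need 100 seconds to reach every follower.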
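
To make the trade-off concrete, here is a minimal sketch of fan-out on write. It is not Twitter's implementation: the in-memory dictionaries stand in for the follower graph and the Redis timeline cache, and the names `follow`, `post_tweet_push`, and `read_timeline` are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical in-memory stand-ins for the follower graph and the Redis cache.
followers = defaultdict(set)    # author_id -> set of follower ids
timelines = defaultdict(deque)  # user_id -> tweet ids, newest first


def follow(follower_id, author_id):
    followers[author_id].add(follower_id)


def post_tweet_push(author_id, tweet_id):
    """Fan-out on write: one timeline write per follower."""
    for follower_id in followers[author_id]:
        timelines[follower_id].appendleft(tweet_id)
    # The cost of a post grows linearly with the author's follower count.
    return len(followers[author_id])


def read_timeline(user_id, limit=20):
    """Reading is a single lookup of the precomputed timeline."""
    return list(timelines[user_id])[:limit]
```

The return value of `post_tweet_push` is the number of timeline writes one post triggered, which is exactly the quantity that explodes for an account with 100M followers.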

Pull Model (Fan-out on Read)

Instead of pushing, the system only stores the tweet in a database. When a follower refreshes their feed, the system "pulls" tweets from all the people they follow and merges them on the fly.

✅ Write is ultra-fast: Just one DB insert.
❌ Read is slow: Heavy computation on every refresh.
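
A rough sketch of fan-out on read, under the same caveats: the dictionaries are hypothetical stand-ins for the tweet database and the follow graph, and the sketch assumes each author's tweets are posted in increasing timestamp order so that every per-author list stays sorted newest first.

```python
import heapq
from itertools import islice

# Hypothetical in-memory stand-ins for the tweet store and the follow graph.
tweets_by_author = {}  # author_id -> list of (timestamp, text), newest first
following = {}         # user_id -> set of author ids


def post_tweet_pull(author_id, timestamp, text):
    """Fan-out on read: posting is a single insert into the author's own list."""
    tweets_by_author.setdefault(author_id, []).insert(0, (timestamp, text))


def read_timeline_pull(user_id, limit=20):
    """Merge every followed author's tweets at read time, newest first."""
    streams = [tweets_by_author.get(a, []) for a in following.get(user_id, ())]
    # heapq.merge lazily merges already-sorted inputs; reverse=True expects
    # each input to be sorted from largest to smallest timestamp.
    merged = heapq.merge(*streams, key=lambda t: t[0], reverse=True)
    return list(islice(merged, limit))
```

The write touches one list, but every read opens one stream per followed account, which is why refreshes become expensive for users who follow many people.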

3. The Hybrid Strategy

Twitter famously uses a Hybrid Approach to handle the "Celebrity Problem":

Normal Users: Use the Push Model. Their tweets are pushed to followers' caches.
Celebrities (e.g., Elon Musk): Use the Pull Model. Their tweets are not pushed. Instead, they are merged into the follower's feed only when the follower requests it.

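// Logic: Celebrity Check (pseudocode; 100000 is an illustrative threshold)

// Every tweet is persisted and indexed, whoever the author is.
store_tweet_in_db(tweet);
notify_search_index(tweet);

if (user.follower_count <= 100000) {
    // Normal user: fan out to followers' precomputed timelines.
    push_to_follower_timelines_in_redis(tweet);
}
// Celebrity: no fan-out; followers pull the tweet when they read their feed.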
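
The check above covers only the write path. The read path might look something like the following sketch, which merges the precomputed timeline with tweets pulled from followed celebrities. The function names, the data shapes, and the reuse of the 100,000 cutoff are all assumptions made for illustration.

```python
import heapq
from itertools import islice

CELEBRITY_THRESHOLD = 100_000  # same illustrative cutoff as the check above


def is_celebrity(follower_count):
    return follower_count > CELEBRITY_THRESHOLD


def read_timeline_hybrid(cached, followed_counts, recent_by_author, limit=20):
    """Merge the pushed timeline with tweets pulled from followed celebrities.

    cached:           list of (timestamp, tweet_id), newest first (push side)
    followed_counts:  followed author -> that author's follower count
    recent_by_author: author -> list of (timestamp, tweet_id), newest first
    """
    pulled = [
        recent_by_author.get(author, [])
        for author, count in followed_counts.items()
        if is_celebrity(count)
    ]
    merged = heapq.merge(cached, *pulled, key=lambda t: t[0], reverse=True)
    return list(islice(merged, limit))
```

Because most users follow only a handful of celebrity accounts, the read-time merge stays small while the expensive fan-out for those accounts is avoided entirely.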

4. Data Storage & Caching

Twitter relies heavily on Redis for timeline storage. Because the "Now" is more important than the "Past", only the most recent entries of each user's timeline are kept in memory; Twitter engineers have publicly cited a cap of roughly 800 entries per home timeline. Older tweets are pulled from disk-based storage when needed.
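
The "keep only the recent past in memory" idea can be sketched with a bounded queue. This is an illustration rather than Twitter's code: `BoundedTimeline` and its `max_entries` parameter are hypothetical. With Redis lists, the equivalent would typically be an `LPUSH` followed by an `LTRIM`.

```python
from collections import deque


class BoundedTimeline:
    """Keeps only the newest `max_entries` tweet ids, newest first."""

    def __init__(self, max_entries):
        self._entries = deque(maxlen=max_entries)

    def push(self, tweet_id):
        # appendleft on a full deque silently drops the oldest entry
        # from the right-hand end.
        self._entries.appendleft(tweet_id)

    def recent(self, limit=None):
        items = list(self._entries)
        return items if limit is None else items[:limit]
```

Anything that falls off the end of the queue is not lost, provided the tweet itself was persisted; a reader who scrolls far enough back simply falls through to the slower disk-based store.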

Key Tech Stack:

FlockDB: A specialized graph database for storing social graphs (who follows whom).
Gizzard: A sharding framework for distributed databases.
Redis (Timeline Cache): The backbone of the read-heavy timeline generation.
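
To illustrate what a social-graph store has to answer, here is a toy follow graph that indexes each edge in both directions, so that "who follows X?" and "whom does X follow?" are both direct lookups. It is a sketch of the data model only; the class and method names are hypothetical and are not FlockDB's API.

```python
from collections import defaultdict


class FollowGraph:
    """Toy social graph that indexes each follow edge in both directions."""

    def __init__(self):
        self._following = defaultdict(set)  # user -> accounts they follow
        self._followers = defaultdict(set)  # user -> accounts following them

    def follow(self, follower, followee):
        self._following[follower].add(followee)
        self._followers[followee].add(follower)

    def unfollow(self, follower, followee):
        self._following[follower].discard(followee)
        self._followers[followee].discard(follower)

    def followers_of(self, user):
        return set(self._followers.get(user, ()))

    def following_of(self, user):
        return set(self._following.get(user, ()))

    def follower_count(self, user):
        return len(self._followers.get(user, ()))
```

The fan-out path needs `followers_of` (who receives this tweet?), the pull path needs `following_of` (whose tweets do I merge?), and the celebrity check needs `follower_count`.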
