Architecture Decisions: Stateless Servers and Distributed Databases
Why app servers are stateless, how the database tier uses consistent hashing, SQL vs NoSQL sharding support, sessions, Cassandra's multi-master model, and where consistent hashing appears beyond databases.
Why Consistent Hashing Is Used Less Than You'd Expect
After the entire discussion of consistent hashing, here is a surprising fact: in modern production systems, consistent hashing is used primarily for databases and caches, not for application servers.
The reason is a fundamental architectural issue. So far, the servers described have been doing two things simultaneously: running application code (handling requests, processing logic) and storing data. These are called stateful servers.
The problem is that application servers and database servers have completely different hardware requirements:
| Component | Needs |
|---|---|
| Application server | Strong CPU, good RAM, fast network |
| Database server | Moderate CPU and RAM, large fast disk, fast network |
When you put both responsibilities on the same machine, you over-provision in every direction. The machine needs a powerful CPU for the application code AND large disk for the data. This is expensive and inflexible.
What "Stateful" Actually Means
A server is stateful whenever it stores data on the machine itself -- in any form. This includes:
- An embedded database running on the same machine as the application code
- An in-memory cache (data stored in process memory, lost on restart)
- Files written to local disk
- Session context held in application memory (a user's logged-in state, a partially completed workflow)
The defining question is: if this machine is replaced with a fresh one, does any data disappear? If yes, the server is stateful.
Application servers that hold nothing on the machine -- they receive a request, compute a response, and write any persistent data to an external database -- are stateless. A stateless server can be replaced, restarted, or multiplied with no coordination. Any server in the pool can handle any request.
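The statelessness test above can be sketched in a few lines. This is a minimal illustration, not a real server: a plain dict stands in for the external database, and `handle_request` is a hypothetical handler. The point is that the handler keeps nothing between calls, so any process running the same function against the shared store behaves identically.

```python
# Stand-in for the external database tier; the "app server"
# (the function below) keeps nothing between requests.
DATABASE = {}

def handle_request(user_id: str, payload: str) -> str:
    # Read whatever context is needed from the external store...
    history = DATABASE.get(user_id, [])
    history.append(payload)
    # ...and write it back. If this process were replaced with a
    # fresh one, no data would disappear: the server is stateless.
    DATABASE[user_id] = history
    return f"{user_id} has {len(history)} item(s)"

# "Another server" is just another call against the shared store.
assert handle_request("u1", "a") == "u1 has 1 item(s)"
assert handle_request("u1", "b") == "u1 has 2 item(s)"
```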
Decoupled Architecture: App Servers + Database Servers
The standard modern architecture separates these concerns:
- Application servers run only code. They hold no persistent data. They are completely stateless.
- Database servers store only data. They run no business logic.
Once separated, the full architecture looks like this:
Client
↓
Application Load Balancer (round-robin / random)
↓
App Servers (stateless — any server can handle any request)
↓
Database Load Balancer (consistent hashing)
↓
Database Shards (each shard owns a portion of users)

The application load balancer uses simple round-robin or random assignment. It does not care about data placement because the app servers hold no data.
The database load balancer uses consistent hashing. It does care about data placement -- a specific user's data must always go to the same shard.
From the application server's perspective, there is no visible complexity. It sends a query and gets a response. The database load balancer, operating transparently in between, routes that query to the correct shard. The app server sees one giant unified database.
This separation allows each tier to be scaled and optimised independently. You need more compute? Add app servers. You need more storage? Add database shards. The two concerns do not constrain each other.
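The database load balancer's routing can be sketched as a small consistent-hash ring. This is a toy implementation under simple assumptions (MD5 as the ring hash, 100 virtual nodes per shard, made-up shard names), not production code, but it shows the property the text relies on: a given user ID always resolves to the same shard.

```python
import bisect
import hashlib

def _point(key: str) -> int:
    # Map a string to a point on the ring (0 .. 2**32 - 1).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

class ConsistentHashRing:
    def __init__(self, shards, vnodes=100):
        # Place each physical shard at many virtual points so keys
        # spread evenly across the ring.
        self._points = sorted(
            (_point(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._keys = [p for p, _ in self._points]

    def shard_for(self, user_id: str) -> str:
        # Walk clockwise to the first virtual node at or after the
        # key's point, wrapping around the ring at the end.
        idx = bisect.bisect(self._keys, _point(user_id)) % len(self._keys)
        return self._points[idx][1]

ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
# A specific user's data always routes to the same shard:
assert ring.shard_for("user-42") == ring.shard_for("user-42")
```

The app server never sees any of this; it is the logic the database load balancer runs internally.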
SQL vs NoSQL: Who Handles Sharding?
SQL databases (PostgreSQL, MySQL, Oracle DB, IBM DB2, SQLite) do not natively support horizontal sharding. They are designed around a single-node model. If your data needs to be split across servers, the database itself will not manage that for you. Your application, or a separate proxy layer, has to do it.
Options for sharding a SQL workload:
- Third-party extensions (e.g., Citus for PostgreSQL): community-maintained, not officially supported by the database vendor. They work, but they introduce operational complexity, potential bugs, and version compatibility concerns.
- Application-level sharding: the application itself determines which database instance to connect to based on the sharding key. You build and maintain the routing logic.
- Managed cloud databases (Amazon RDS, Google Cloud SQL, Azure Database): the cloud provider can wrap the SQL engine with its own scale-out infrastructure. You configure a shard or partition key; the provider handles the routing. This costs more but removes the engineering burden.
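Application-level sharding, the second option above, can be sketched as follows. The connection strings and the `dsn_for_user` helper are hypothetical; the point is that the routing logic lives in your codebase, not in the database.

```python
import hashlib

# Hypothetical connection strings for three PostgreSQL instances.
SHARD_DSNS = [
    "postgresql://db-0.internal/app",
    "postgresql://db-1.internal/app",
    "postgresql://db-2.internal/app",
]

def dsn_for_user(user_id: str) -> str:
    # The application owns the routing: hash the sharding key and
    # pick an instance. A real system would prefer consistent
    # hashing, so that resizing the fleet remaps fewer users.
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return SHARD_DSNS[int(digest, 16) % len(SHARD_DSNS)]

# Every part of the codebase resolves a user to the same instance.
assert dsn_for_user("user-42") == dsn_for_user("user-42")
```

You build and maintain this layer yourself, including what happens when the shard list changes.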
NoSQL databases (MongoDB, Redis, Cassandra) were built from the start to operate in distributed environments. MongoDB has built-in sharding -- you configure a shard key, and MongoDB handles the ring, the virtual nodes, and the routing internally. Redis Cluster partitions keys across a fixed space of 16,384 hash slots. These systems expose a single connection endpoint to the application, and the sharding is entirely invisible.
PostgreSQL Sharding in Practice
Self-managed on EC2: You provision EC2 instances, install PostgreSQL manually, and handle the load balancer and routing layer yourself. Third-party extensions add sharding but are not officially supported -- they may have bugs, performance edge cases, or fall behind on major version upgrades.
Managed service (Amazon RDS): AWS layers its own Amazon-maintained scaling and sharding infrastructure on top of the engine. You configure a sharding key through the management interface and interact with the result through a standard PostgreSQL connection string. You do not manage the sharding logic yourself.
The trade-off is cost. A managed RDS instance costs more per unit of compute and storage. What you are paying for is the elimination of DevOps and infrastructure engineering work. At most companies, engineer time costs more than server time -- which is why managed databases are the default choice unless you are operating at a scale where the cost difference becomes significant.
Sessions in a Stateless Architecture
When app servers are stateless, they hold no per-user context. Where does session data live?
It goes into the database (or a dedicated cache such as Redis), not into the app server.
The flow:
- User logs in → app server validates credentials → writes session token and context to the database
- Every subsequent request carries the session token → app server reads session data from the database → processes the request → returns a response
- The app server itself retains nothing between requests
Because session data is stored in the database and routed via consistent hashing, the correct shard always holds the right session. Any app server in the pool can handle any request from any user.
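The login flow above can be sketched minimally. The dict stands in for Redis or a database shard, and `login`/`handle` are hypothetical handlers with credential validation elided; what matters is that session context lives in the shared store, never in app-server memory.

```python
import secrets

SESSION_STORE = {}  # stand-in for Redis or a database shard

def login(username: str, password: str) -> str:
    # (Credential validation elided.) Issue a token and persist the
    # session context externally.
    token = secrets.token_hex(16)
    SESSION_STORE[token] = {"user": username}
    return token

def handle(token: str) -> str:
    # Any app server can serve this request: the token alone is
    # enough to look the session up in the shared store.
    session = SESSION_STORE.get(token)
    if session is None:
        return "401 not logged in"
    return f"200 hello {session['user']}"

token = login("alice", "secret")
assert handle(token) == "200 hello alice"
assert handle("bogus-token") == "401 not logged in"
```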
Every Cluster Has a Load Balancer
Any time you operate a cluster of servers -- database servers, cache servers, application servers -- something must sit in front of them to route requests to the right node. Without it, the application server would need to know the internal topology of the cluster: which shard holds which data, what the current server list is, how to handle a crashed node. The application would be doing the job of a load balancer, which means cluster topology changes require application code changes.
The solution is to always place a load balancer in front of every cluster. The application server sees a single endpoint. The load balancer knows the cluster topology and handles routing transparently.
Exception: Cassandra. Cassandra is a distributed database that does not use an explicit, separate load balancer. Instead, every Cassandra node acts simultaneously as a database node and a query coordinator. When a request arrives at any node, that node uses the cluster topology it has learned via the gossip protocol to determine which nodes hold the relevant data, forwards the query there, collects the result, and returns it to the caller. Every node in the cluster can accept requests.
This is called a multi-master (masterless) model: there is no single authoritative node; all nodes are peers. This differs from MongoDB, whose replica sets follow a primary-secondary model (historically called master-slave): one primary node handles writes, and secondary nodes replicate it and can serve reads.
Stateless Architecture in Practice: ChatGPT as an Example
ChatGPT is a useful example of a stateless server architecture at scale.
When you type a message, the entire conversation history is bundled into the API request and sent to the model. The server that processes it receives the full context in the request itself -- it does not need to look up any previous conversation from local storage. The response is returned and the server forgets everything. The next message in the same conversation can go to a completely different server in the pool. No coordination is needed.
The conversation history you see in the UI is stored in a database, not on any application server. When you open a past chat, the front end fetches the history from the database and includes it in the next request. The model server sees a fresh context on every call.
This is what makes it possible to run thousands of model servers behind a load balancer with simple round-robin routing. Because no server holds any state, any server can handle any request.
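The request shape this implies can be sketched as a toy. The `chat_turn` function and the echo "model" are invented stand-ins, not any real API; the point is that each call carries the full history in, so no server needs memory of earlier turns.

```python
def chat_turn(history: list, user_message: str) -> list:
    # The request carries the entire conversation; the serving
    # process needs no local memory of previous turns.
    prompt = history + [{"role": "user", "content": user_message}]
    # Stand-in for the model's forward pass.
    reply = {"role": "assistant", "content": f"echo: {user_message}"}
    return prompt + [reply]

h = chat_turn([], "hi")
# The next turn could run on a completely different server: it only
# needs the history the client sends back.
h = chat_turn(h, "again")
assert len(h) == 4
```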
The context window limit in LLMs is closely tied to this architecture. Because the entire conversation is sent with every request, larger contexts mean larger requests and more compute per call, so there is a practical ceiling on how much context can be processed in a single forward pass.
Transformer-based models differ architecturally from recurrent models like LSTMs. An LSTM maintains a hidden state across time steps -- it is stateful. A transformer reads the entire input sequence in one shot and predicts the output. It has no internal memory between calls. Each call is independent. Transformers are stateless by design, which is what makes them horizontally scalable.
Why Consistent Hashing Is Worth Knowing
If NoSQL handles consistent hashing internally, why study it?
Because consistent hashing appears in more places than just databases:
Caching systems. Redis Cluster, Memcached, and similar distributed caches use consistent hashing to decide which cache node stores which key. Understanding how this works tells you why adding or removing a cache node does not cause a total cache flush -- it only invalidates a fraction of the keys.
Notification systems. A system that maintains a long-lived WebSocket connection per user (for push notifications) needs a way to guarantee that all messages for user X always go to the same server, because that server holds X's open socket. Consistent hashing solves this: hash the user ID, route to the correct server. Every load balancer independently resolves the same mapping without coordination.
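The coordination-free property can be shown in a few lines. The server names are hypothetical, and a simple modulo hash is used here for brevity (a production system would use consistent hashing so that changing the server list remaps few users); the point is that the mapping is a pure function of the user ID and the server list, so every balancer instance computes it locally and all of them agree.

```python
import hashlib

SERVERS = ["ws-0", "ws-1", "ws-2", "ws-3"]  # hypothetical socket servers

def server_for(user_id: str, servers=SERVERS) -> str:
    # Pure function of (user id, server list): no shared state, no
    # coordination, yet every caller resolves the same server.
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# Two independent load balancers route user X's messages to the
# same server -- the one holding X's open WebSocket.
assert server_for("user-x") == server_for("user-x")
```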
Working on databases. If you end up building infrastructure, working at a database company, or contributing to distributed systems internals, consistent hashing is a foundational primitive. It is not an advanced topic at that level; it is assumed knowledge.
The larger point is that consistent hashing solves a whole class of problems: any situation where a specific identity (user ID, cache key, connection ID) must reliably map to a specific server, without centralised coordination, and with minimal disruption when the server list changes.