Data consistency in distributed systems

In this post, we give a short tutorial on consistency levels: what they mean and how they work. Much of the existing literature and online discussion of consistency levels is framed in the context of multi-processor or distributed systems that operate on a single data item at a time, without the concept of a “transaction”. Here we also give some direction for how to think about consistency levels in the context of ACID-compliant database systems.

Data Consistency Levels – Linearizable

Consider a single-node database holding an account balance of 50. A first SELECT statement reads the value 50, while a second SELECT reads 10, because a write operation was executed in between the two reads.

Linearizability means that modifications appear to happen instantaneously: once a value is written to the register, any subsequent read operation will return that same value until the register is modified again.
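
To make this concrete, here is a minimal sketch in Python (the class name and API are purely illustrative, not from any particular system): a single register protected by a lock, so every read returns the value of the latest completed write.

```python
import threading

class LinearizableRegister:
    """A single register; every read sees the latest completed write."""

    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = value

    def write(self, value):
        with self._lock:          # writes take effect atomically
            self._value = value

    def read(self):
        with self._lock:          # reads always observe the last completed write
            return self._value

balance = LinearizableRegister(50)
print(balance.read())   # 50
balance.write(10)       # the change is visible immediately afterwards
print(balance.read())   # 10
```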

What happens if you don’t have Linearizability?


This time, we don’t have a single register or a single source of truth. Our system uses asynchronous database replication: a Primary node takes both reads and writes, and a Follower node serves read operations only.

Because replication happens asynchronously, there’s a lag between the Primary node row modification and the time when the Follower applies the same change.

One database connection changes the account balance from 50 to 10 and commits the transaction. Right after, a second transaction reads from the Follower node, but since replication has not yet applied the balance modification, the value 50 is read.

Therefore, this system is not linearizable, since changes don’t appear to happen instantaneously. To make it linearizable, we need synchronous replication, so that the Primary node’s UPDATE operation does not complete until the Follower node has also applied the same modification.
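
Here is a toy sketch of the difference, assuming made-up Primary and Follower classes rather than any real database: with asynchronous replication the follower can serve a stale value, while the synchronous variant only acknowledges the write once the follower has applied it.

```python
class Follower:
    def __init__(self, value):
        self.value = value

    def apply(self, value):
        self.value = value

    def read(self):
        return self.value

class Primary:
    def __init__(self, value, follower):
        self.value = value
        self.follower = follower
        self._pending = []           # replication backlog (async mode)

    def write_async(self, value):
        self.value = value
        self._pending.append(value)  # follower is updated later, after a lag

    def replicate(self):
        for v in self._pending:
            self.follower.apply(v)
        self._pending.clear()

    def write_sync(self, value):
        self.value = value
        self.follower.apply(value)   # write completes only after the follower has it

follower = Follower(50)
primary = Primary(50, follower)

primary.write_async(10)
print(follower.read())   # 50 -> stale read, not linearizable
primary.replicate()
print(follower.read())   # 10 -> the follower eventually catches up

primary.write_sync(20)
print(follower.read())   # 20 -> linearizable, at the cost of write latency
```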

Eventual Consistency

Eventual consistency typically shows up in the following scenario:

  1. Your data is replicated on multiple servers
  2. Your clients can access any of the servers to retrieve the data
  3. Someone writes a piece of data to one of the servers, but it hasn’t yet been copied to the rest
  4. A client accesses the server with the data, and gets the most up-to-date copy
  5. A different client (or even the same client) accesses a different server (one which didn’t get the new copy yet), and gets the old copy

Basically, because it takes time to replicate the data across multiple servers, requests to read the data might go to a server with a new copy, and then go to a server with an old copy. The term “eventual” means that eventually the data will be replicated to all the servers, and thus they will all have the up-to-date copy.
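
As a rough sketch of that scenario (the server names and the propagation step are purely illustrative), a write lands on one replica and spreads lazily, so reads routed to different replicas may return different values until propagation completes:

```python
import random

class Replica:
    def __init__(self, name, value):
        self.name = name
        self.value = value

replicas = [Replica(f"server-{i}", "old") for i in range(3)]

def write(value):
    # The write is accepted by one replica only; the others lag behind.
    replicas[0].value = value

def propagate():
    # "Eventually" the new value reaches every replica.
    for r in replicas[1:]:
        r.value = replicas[0].value

def read():
    # Clients may be routed to any replica.
    r = random.choice(replicas)
    return r.name, r.value

write("new")
print(read())   # may return "new" or "old", depending on which replica is hit
propagate()
print(read())   # now every replica returns "new"
```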

Eventual consistency is a must if you want low latency reads, since the responding server must return its own copy of the data, and doesn’t have time to consult other servers and reach a mutual agreement on the content of the data.

When you design your system to be eventually consistent, you need to be aware of problems that can arise from different nodes having different views of the world, and take steps to compensate. However, this compensation can lead you down a rat hole if you’re not careful; it’s easy to spend exorbitant amounts of effort to prepare for the case where the airliner carrying the not-yet-cleared checks crashes and burns.

On the other hand, with a transactionally-consistent system, you have to be ready for the failure of any node (or communications channel) to mean the failure of the entire system (or at least of every attempted transaction).

Eventual consistency guarantees consistency throughout the system, but not at all times. There is an inconsistency window, during which a node might not have the latest value but will still return a valid response when queried, even though that response may not be up to date. Cassandra, for example, has a ring architecture where your data is split across different nodes:

Figure: a Cassandra ring of nodes

Any of those nodes can act as the primary interface point for your application, so there is no single point of failure. But there is a trade-off: because any node can accept a request, the data needs to be replicated among all of these nodes to stay up to date, and every node needs to learn about changes made elsewhere. The price of this architecture is eventual consistency. It takes time for data to propagate through every node in the ring, so right after a write there may be a short period during which you cannot read back the value you just wrote: the data may have been written to one node while you are reading from a different node that the write has not reached yet.

Let’s say you back up the photos on your phone to the cloud every Sunday. If you check the cloud copy on Friday, you won’t see the photos taken between Monday and Friday; you still get a response, just not an up-to-date one. If you check the cloud on Sunday night, you will see all of your photos. The data on your phone and in the cloud eventually becomes consistent.

To sum up: in any system where more than one machine processes read requests, and where a machine cannot contact the other machines while serving a read, the best we can hope for is eventual consistency. In other words, if high scale and very low latency are both critical, you’ll have to build your system so that it handles eventual consistency properly.

Causal Consistency

Typical use cases where causal (or bounded-staleness) consistency is a good fit include:

  • Stock ticker applications.
  • Weather tracking apps.
  • More generally, any status-tracking app whose staleness should be bounded.
  • Possibly online gaming apps.

Causal consistency enforces ordering only of related writes across units of execution.
What does a related write mean?
If a unit of execution reads a variable x and, depending on its value, updates another variable y, we say the write of y happens after (is causally dependent on) the read of x. Causal consistency guarantees that all units of execution observe the new value of y only after observing the related value of x (its dependency).
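
As a minimal sketch of that rule (the class and the dependency format are made up for illustration), a replica can hold back the write of y until the write of x that it depends on has been applied:

```python
class CausalReplica:
    """Applies a write only after the writes it depends on have been applied."""

    def __init__(self):
        self.data = {}       # key -> (version, value)
        self.waiting = []    # writes whose dependencies are not yet satisfied

    def _ready(self, deps):
        return all(self.data.get(k, (0, None))[0] >= v for k, v in deps.items())

    def receive(self, key, version, value, deps):
        self.waiting.append((key, version, value, deps))
        self._drain()

    def _drain(self):
        progress = True
        while progress:
            progress = False
            for w in list(self.waiting):
                key, version, value, deps = w
                if self._ready(deps):
                    self.data[key] = (version, value)
                    self.waiting.remove(w)
                    progress = True

replica = CausalReplica()
# The write to y was issued after reading version 1 of x, so it depends on it.
replica.receive("y", 1, 20, deps={"x": 1})
print(replica.data)   # {} -- y is held back until x@1 arrives
replica.receive("x", 1, 5, deps={})
print(replica.data)   # {'x': (1, 5), 'y': (1, 20)} -- causal order preserved
```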

The following example is causally consistent:

Figure 12: Causal Consistency

Here, P2 does not read x before writing; it simply writes y = 10 without depending on x. Essentially, y is not causally related to x, so other units of execution can read x and y in any order, and all such orderings are causally consistent.

  • Only related writes are ordered according to their order of occurrence across units of execution. Unrelated writes can be placed in any order, so there is no notion of a global ordering in a causally consistent system.
  • No real-time constraints are imposed.
  • As mentioned, what matters is the order in which values are observed, not the actual time at which each operation occurs.
  • Different units of execution might observe different (yet causally consistent) sequences at the same time.
  • Causal order is transitive: if A happens before B and B happens before C, then A happens before C.

Quorum

A quorum is needed in a distributed environment where you are running a cluster of machines and any one of them can accept a write/modify request and update the data. In such scenarios, a quorum is used to identify the leader that will accept the writes, or to determine which node can accept write/modify requests for a given range of keys.

Let’s consider a scenario where you have three master servers accepting writes. If you want to update a piece of data, can you just match the version on one of the masters and assume it is safe to update?

No, because at the same moment another write request hitting a different master can make the same assumption, and you will end up with different states of the data on different machines.

In this scenario, you need a quorum to identify the leader that will accept writes for a given range of data; you can then use versioning (optimistic locking) to ensure the data stays consistent across all machines and that updates are serialized.

Versioning on its own, however, is helpful when you have one master accepting the writes and multiple users may try to update the data concurrently; versioning then gives you optimistic locking. This is generally useful when contention is low.
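
A minimal sketch of optimistic locking with a version number (the record class and method names are illustrative): an update succeeds only if the version the client originally read is still the current one; otherwise the client must re-read and retry.

```python
class VersionedRecord:
    def __init__(self, value):
        self.value = value
        self.version = 1

    def read(self):
        return self.value, self.version

    def update(self, new_value, expected_version):
        # Optimistic locking: reject the write if someone else got there first.
        if self.version != expected_version:
            return False                      # stale version; re-read and retry
        self.value = new_value
        self.version += 1
        return True

record = VersionedRecord(value=50)

value, version = record.read()        # two clients both read version 1
print(record.update(10, version))     # True  -- the first writer wins
print(record.update(30, version))     # False -- stale version, must retry
```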

The idea behind implementing consistency with a quorum is to maintain consistency within one group of replicas that contains a majority of them, and to force, by construction, that reads and writes cannot violate the consistency within that group.

This is done by only allowing updates within a group of replicas that contains a majority of the replicas, so that at least one node witnesses every update. In more detail, consider a write scenario:

  • To complete a write, a client must successfully place the updated data item on more than N/2 (i.e. a majority of the) replicas. The updated data item is assigned a new version number, obtained by incrementing its current version number.
  • Since a write quorum engages a majority of the replicas, two distinct write operations cannot succeed at the same time: there cannot be two disjoint groups that both contain a majority of the replicas.
  • Therefore, all write operations are serialized and consistency is guaranteed: at least one node is included in any two consecutive write operations, and that node holds the highest current version number.

Note that consistency, overall, has to be ensured with respect to both writing and reading. (As an extreme example, a system that never performs reads is trivially consistent.) To that end, a mechanism for distinguishing newer from older values is needed (e.g. a simple version number, as mentioned before) so that read operations are also consistent. Reading is done against a read quorum of Nr replicas, chosen so that Nr + Nw > N, where Nw is the size of the write quorum. The intersection of the read quorum (of Nr nodes) and the write quorum (of Nw nodes) therefore cannot be empty, so each read must consult at least one replica holding the current data item (identified by its version number).
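
The following toy sketch (with made-up helper functions, not any real store) shows quorum reads and writes over N replicas with Nr + Nw > N: a read returns the value with the highest version among the replicas it contacts, and since every read quorum overlaps every write quorum, that is always the latest written value.

```python
import random

N, NW, NR = 5, 3, 3          # NR + NW > N, so read and write quorums intersect
replicas = [{"version": 0, "value": None} for _ in range(N)]

def quorum_write(value):
    targets = random.sample(range(N), NW)    # any write quorum
    # NW is a majority here, so this quorum overlaps every previous write quorum
    # and therefore sees the highest version assigned so far.
    new_version = max(replicas[i]["version"] for i in targets) + 1
    for i in targets:
        replicas[i] = {"version": new_version, "value": value}
    return new_version

def quorum_read():
    targets = random.sample(range(N), NR)    # any read quorum
    newest = max((replicas[i] for i in targets), key=lambda r: r["version"])
    return newest["value"], newest["version"]

quorum_write("balance=50")
quorum_write("balance=10")
print(quorum_read())   # always ('balance=10', 2): the quorums must overlap
```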

Note that to control concurrency, mechanisms such as two-phase locking can be used: for the initial query against a set of replicas, a reader takes read locks and a writer takes write locks, and the locks are released once the operation commits or aborts.
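
As a rough sketch of that idea (illustrative only, not how any particular engine implements it), a transaction acquires a lock on each item it touches while it runs, and releases all of them only when it finishes:

```python
import threading
from collections import defaultdict

class Store:
    def __init__(self, data):
        self.data = dict(data)
        self.locks = defaultdict(threading.RLock)   # one lock per item

class Transaction:
    """Strict two-phase locking: acquire locks while running, release at the end."""

    def __init__(self, store):
        self.store = store
        self.held = []

    def _lock(self, key):
        lock = self.store.locks[key]
        lock.acquire()
        self.held.append(lock)

    def read(self, key):
        self._lock(key)                    # a real engine would take a shared lock
        return self.store.data[key]

    def write(self, key, value):
        self._lock(key)                    # exclusive lock for the write
        self.store.data[key] = value

    def finish(self):                      # commit or abort: release everything
        for lock in reversed(self.held):
            lock.release()
        self.held.clear()

store = Store({"balance": 50})
tx = Transaction(store)
print(tx.read("balance"))   # 50, lock held until finish()
tx.write("balance", 10)
tx.finish()                 # other transactions can now acquire the lock
```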

Data Consistency Levels Tradeoffs

Levels:

| Level | Consistency | Efficiency |
| --- | --- | --- |
| Linearizable | Highest | Lowest |
| Eventual Consistency | Lowest | Highest |
| Causal Consistency | Higher than eventual consistency but lower than linearizable | Higher than linearizable but lower than eventual consistency |
| Quorum | Configurable | Configurable |

Transaction Isolation Levels

Read Uncommitted Data

Read Uncommitted is the lowest isolation level. Even at this level, the database makes sure that no transaction can update a row that another transaction has already updated but not yet committed. This protects against lost updates, but it does not stand in the way of dirty reads.

Read Committed

This isolation level does not allow any other transaction to write or read a row to which another transaction has written but not yet committed, so it does not allow dirty reads. The transaction holds a read or write lock on the current row, and thus prevents other transactions from reading, updating, or deleting it.
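
A toy sketch of the difference between these two levels (a made-up in-memory row, not a real engine): under Read Uncommitted a reader can see another transaction’s in-flight value, while under Read Committed it only sees the last committed value.

```python
class Row:
    def __init__(self, value):
        self.committed = value
        self.uncommitted = None      # value written by an in-flight transaction

    def write(self, value):
        self.uncommitted = value     # not yet visible as committed data

    def commit(self):
        if self.uncommitted is not None:
            self.committed = self.uncommitted
            self.uncommitted = None

    def read(self, isolation):
        if isolation == "READ UNCOMMITTED" and self.uncommitted is not None:
            return self.uncommitted  # dirty read: this value may be rolled back
        return self.committed        # only committed data is visible

row = Row(50)
row.write(10)                        # transaction 1 updates, no commit yet
print(row.read("READ UNCOMMITTED"))  # 10 -- dirty read
print(row.read("READ COMMITTED"))    # 50 -- sees only the committed value
row.commit()
print(row.read("READ COMMITTED"))    # 10
```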

Conclusion

To wrap up, here are some resources for further reading:
Practical Guide to SQL Transaction Isolation
Transaction Isolation Levels in DBMS
Understanding isolation levels
