DDIA: Chapter 5
Part II Interlude
Distributed Data
- scalability: spread data volume and/or read/write load to more than a single machine
- fault tolerance/high availability: multiple machines provide redundancy
- latency: servers at various locations for geographically distributed access
Scaling to Higher Load
- vertical scaling (scaling up): buy a more powerful machine
- shared-memory architecture: fast interconnect allows any CPU to access any part of memory/disk
- cons: cost grows faster than linearly; limited fault tolerance
- shared-disk architecture: independent CPUs/RAM but data is shared between machines via fast network
- cons: contention and overhead of locking limit scalability
- shared-nothing architecture (horizontal scaling, scaling out): each machine is an independent node; coordination between nodes is done in software over a conventional network, no special hardware required
- replication: a copy of the same data is on several nodes - provides redundancy, can increase performance
- partitioning (sharding): splitting data across nodes
Chapter 5
Replication
- replication: keeping a copy of the same data on multiple machines
- to keep data geographically close
- to increase availability in case of failure
- to scale out serving read-based requests
Leaders and Followers
- replica: each node that stores a copy of the database
- leader-based replication (active/passive, master-slave replication):
- most common
- one replica is designated the leader (master, primary): clients must send write requests to the leader, which writes locally
- other replicas are followers (read replicas, slaves, secondaries): data changes are sent from the leader as part of a replication log (change stream) and applied in the same order
- built-in feature of many DBs (PostgreSQL, MySQL, MongoDB, etc.) and some message brokers (Kafka, RabbitMQ)
Synchronous vs. Asynchronous Replication
- synchronous: leader waits until each follower reports the write before making the write visible to client
- follower is guaranteed to have up-to-date data
- huge disadvantage if a follower doesn't respond or the write cannot be processed, as it is blocking
- semi-synchronous: if the synchronous follower is unavailable, an asynchronous follower is made synchronous (see the sketch after this list)
- asynchronous: leader sends message but doesn't wait for response
- weakens durability but is widely used
- if leader fails, non-replicated writes are lost
- can continue processing writes even if followers have fallen behind
- research into performant synchronous replication (e.g. chain replication)
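A minimal sketch of the semi-synchronous pattern described above, with hypothetical Follower objects (the names and replicate() helper are illustrative, not a real API): the leader blocks on one synchronous follower and ships the write to the rest in the background, so the write is durable on at least two nodes without blocking on every follower.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical follower handle; replicate() blocks until the follower
# has applied the write and acknowledges it.
class Follower:
    def __init__(self, name):
        self.name = name
        self.log = []

    def replicate(self, entry):
        self.log.append(entry)  # apply the write locally
        return True             # acknowledgement

pool = ThreadPoolExecutor()

def leader_write(entry, sync_follower, async_followers):
    # The leader has already applied the write to its own storage (elided).
    # Block on the single synchronous follower: the write is now durable
    # on at least two nodes before the client sees success.
    if not sync_follower.replicate(entry):
        raise RuntimeError("sync follower failed; promote an async one")
    # Remaining followers get the write in the background; the client
    # does not wait for them.
    for f in async_followers:
        pool.submit(f.replicate, entry)
    return "ok"

followers = [Follower("A"), Follower("B"), Follower("C")]
print(leader_write({"key": "x", "value": 1}, followers[0], followers[1:]))
```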
Setting Up New Followers
- simply copying data files gives an inconsistent snapshot; locking the database for a consistent copy sacrifices high availability
- instead, take a consistent snapshot of the leader, copy it to the follower, then apply the replication log from the snapshot's position onward
Handling Node Outages
- follower failure - if a follower crashes, it can catch up from the leader based on its log
- leader failure - failover: a follower must be promoted, clients need to be rerouted for writes, followers need to be rerouted for data changes
- can happen manually
- problematic if an old leader rejoins (what to do with out-of-sync writes? discarding them may violate clients' durability expectations)
- split brain: two nodes both believe they are the leader
- timeout should be chosen with care
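A sketch of the timeout-based failure detection behind failover, with made-up names: nodes exchange heartbeats, and a node silent for longer than the timeout is suspected dead. The timeout is the care point: too short triggers unnecessary failovers under load spikes, too long delays recovery.

```python
import time

HEARTBEAT_TIMEOUT = 30.0  # seconds; tune with care (see note above)

last_heartbeat = {}  # node name -> monotonic time of last heartbeat

def on_heartbeat(node):
    last_heartbeat[node] = time.monotonic()

def suspect_dead(node):
    seen = last_heartbeat.get(node)
    return seen is None or time.monotonic() - seen > HEARTBEAT_TIMEOUT

def pick_new_leader(followers, log_position):
    # Promote the surviving follower with the most up-to-date replication
    # log position, to minimize lost writes.
    alive = [f for f in followers if not suspect_dead(f)]
    return max(alive, key=log_position)
```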
Implementation of Replication Logs
- statement-based replication: leader can log every write request (statement) => need care around nondeterministic functions (now()), dependencies on existing data/ordering, statements that have side effects
- write-ahead log (WAL) shipping: log can be used to build replicas => closely coupled to storage engine/format, making changes difficult
- logical (row-based) log replication: use different log formats for replication and storage engine (logical log, rather than physical data representation)
- better for backwards compatibility, parsing in external applications
- trigger-based replication: can register custom application code that is automatically executed (triggers, stored procedures) - has greater overhead
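A sketch of what logical (row-based) log records might look like, with invented field names: each record describes a change to one row, identified by primary key, independent of the storage engine's on-disk layout, which is why external consumers can parse it.

```python
# One record per modified row, identified by primary key; the format is
# independent of the storage engine's internal page layout.
logical_log = [
    {"op": "insert", "table": "users", "row": {"id": 42, "name": "Ada"}},
    {"op": "update", "table": "users", "key": 42,
     "changes": {"name": "Ada Lovelace"}},
    {"op": "delete", "table": "users", "key": 42},
]

# A follower (or an external change-data-capture consumer) replays the
# records in order against its own storage.
def apply(store, rec):
    table = store.setdefault(rec["table"], {})
    if rec["op"] == "insert":
        table[rec["row"]["id"]] = dict(rec["row"])
    elif rec["op"] == "update":
        table[rec["key"]].update(rec["changes"])
    elif rec["op"] == "delete":
        del table[rec["key"]]

store = {}
for rec in logical_log:
    apply(store, rec)
print(store)  # {'users': {}} - the row was inserted, updated, then deleted
```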
Problems with Replication Lag
- read-scaling architecture: add followers to serve read-only requests (only realistic with asynchronous replication)
- eventual consistency: followers eventually catch up to consistency with the leader
Reading Your Own Writes
- read-after-write consistency (read-your-writes consistency): users see updates they submitted themselves
- can mitigate by reading the user's own profile from the leader, but not good when many things are editable
- can approach with timestamp-based methods: remember the timestamp of the user's last write and only read from replicas caught up to it (see the sketch after this list)
- problematic across datacenters
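The timestamp-based approach from above, sketched with hypothetical Replica objects: the client remembers the timestamp (or log position) of its last write and only reads from replicas that have caught up to it.

```python
class Replica:
    def __init__(self):
        self.applied_up_to = 0.0  # timestamp of the latest applied write
        self.data = {}

def read_your_writes(last_write_ts, followers, key, leader):
    # Prefer a follower that has already applied the client's last write...
    for r in followers:
        if r.applied_up_to >= last_write_ts:
            return r.data.get(key)
    # ...otherwise fall back to the leader, which is always up to date.
    return leader.data.get(key)

leader, follower = Replica(), Replica()
leader.data["bio"], leader.applied_up_to = "new bio", 2.0
follower.data["bio"], follower.applied_up_to = "old bio", 1.0
# The client wrote at t=2.0, so the lagging follower is skipped:
print(read_your_writes(2.0, [follower], "bio", leader))  # -> new bio
```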
Monotonic Reads
- with asynchronous replication, a user can see data moving backwards in time when successive reads hit replicas where data has and hasn't propagated
- can mitigate by ensuring a user always reads from the same replica (have to think about failure; see the sketch below)
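One way to pin a user to a replica, sketched here: hash the user ID rather than picking a replica at random (on replica failure the user must be rerouted, which momentarily breaks the guarantee).

```python
import hashlib

def replica_for(user_id, replicas):
    # Deterministically map each user to one replica, so successive reads
    # never jump to a more-lagged copy; on replica failure the user must
    # be rerouted to another replica.
    h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return replicas[h % len(replicas)]

replicas = ["replica-1", "replica-2", "replica-3"]
print(replica_for("user-123", replicas))  # same answer on every call
```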
Consistent Prefix Reads
- causal dependencies between writes from multiple clients can appear violated when the writes go to different partitions that replicate at different speeds
- can mitigate by writing causal events to the same partition, but efficiency cost
Solutions for Replication Lag
- transactions can provide stronger guarantees
Multi-Leader Replication
- multi-leader configuration (master-master, active/active replication): each leader is a follower to other leaders (e.g. multi-datacenter, clients with offline operation, collaborative editing)
- performance: writes can be processed in local datacenters
- tolerance of datacenter outages
- tolerance of network problems
- disadvantage: the same data may be concurrently modified in different places; conflict handling is often a retrofitted solution
Handling Write Conflicts
- conflict resolution required when concurrent writes are allowed
- can make synchronous but would lose multi-leader advantage
- can ensure all writes for a record go through a particular leader
- can give each write a unique ID (last write wins) or each replica a unique ID (higher-ID replica wins) - both risk data loss
- can try to merge values or use a data structure that preserves all information (resolve conflict later)
- can resolve conflict on write or on read
- can use special data structures/algorithms (conflict-free replicated datatypes (CRDTs) (automatic set/map/list/etc. resolution), mergeable persistent data structures (similar to Git), operational transformation for collaborative editing)
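As an example of automatic resolution, a grow-only counter (G-Counter), one of the simplest CRDTs, sketched below: each replica increments only its own slot, and merge takes the element-wise max, so replicas converge regardless of merge order.

```python
class GCounter:
    """Grow-only counter: each replica increments only its own slot."""
    def __init__(self, replica_id, n_replicas):
        self.id = replica_id
        self.counts = [0] * n_replicas

    def increment(self):
        self.counts[self.id] += 1

    def value(self):
        return sum(self.counts)

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge no matter how merges are ordered.
        self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

a, b = GCounter(0, 2), GCounter(1, 2)
a.increment(); a.increment()  # two concurrent writes at replica A
b.increment()                 # one concurrent write at replica B
a.merge(b); b.merge(a)        # exchange state in either order
assert a.value() == b.value() == 3
```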
Multi-Leader Replication Topologies
- circular topology: each node receives writes from one node and forwards them (plus its own writes) to the next
- star topology: one designated root node forwards writes to all other nodes
- all-to-all topology: all leaders communicate with each other
- circular and star are fragile in case of node failure (replication flow is interrupted until rerouted)
- timestamps are insufficient for conflict resolution (clocks are not sufficiently in sync)
Leaderless Replication
- e.g. Dynamo, Riak, Cassandra, Voldemort
- client or coordinator node sends writes to several replicas
- reads are also sent to several nodes in parallel (quorum reads)
Writing to the Database When a Node Is Down
- just write to available nodes
- reads query a quorum of nodes; version numbers determine the up-to-date value
- eventual consistency:
- read repair: detect stale responses during read and update
- anti-entropy process: background process actively looks for differences in data between replicas
- quorums can be tuned:
- n replicas
- w nodes where writes are confirmed
- r nodes queried per read
- can set w = r = (n + 1) / 2 with odd n; as long as w + r > n, the read set overlaps the write set (see the sketch after this list)
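A sketch combining quorum writes, quorum reads, and read repair, assuming each value carries a version number (node selection and failure handling are elided): with w + r > n the read set overlaps the write set, and stale replicas seen during a read are fixed in place.

```python
class Node:
    def __init__(self):
        self.store = {}  # key -> (version, value)

N, W, R = 3, 2, 2
assert W + R > N  # read and write sets must overlap in >= 1 node

def quorum_write(nodes, key, version, value):
    acks = 0
    for node in nodes:  # sent to all n nodes; in reality some calls fail
        node.store[key] = (version, value)
        acks += 1
    return acks >= W    # success once w nodes have confirmed

def quorum_read(nodes, key):
    # Query r nodes (here simply the first r) and keep the newest version.
    replies = [(n, n.store.get(key, (0, None))) for n in nodes[:R]]
    version, value = max((v for _, v in replies), key=lambda vv: vv[0])
    for node, (seen, _) in replies:
        if seen < version:                      # read repair:
            node.store[key] = (version, value)  # fix the stale replica
    return value

nodes = [Node() for _ in range(N)]
quorum_write(nodes, "x", version=1, value="hello")
nodes[0].store["x"] = (0, "stale")  # pretend this node missed the write
print(quorum_read(nodes, "x"))      # -> hello, and nodes[0] is repaired
print(nodes[0].store["x"])          # -> (1, 'hello')
```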
Limitations of Quorum Consistency
- quorums are not necessarily majorities => the set of nodes used by read and write operations need to overlap in at least one node
- smaller r and w means stale values are more likely
- monitoring staleness is difficult
Sloppy Quorums and Hinted Handoff
- sloppy quorum: accept writes to nodes that are reachable even if they aren’t among the n nodes on which the value usually lives (e.g. due to network interruptions)
- hinted handoff: writes that an available node accepted are sent to appropriate “home” nodes
- multi-datacenter operation: each write from a client is sent to all replicas regardless of datacenter, higher-latency writes to other datacenters can be configured to be asynchronous
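A sketch of sloppy-quorum writes with hinted handoff, using invented Node objects: a write whose home node is unreachable goes to a stand-in, which holds a hint and forwards the write once the home node is back.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}
        self.hints = []  # (home_node, key, value) held for unreachable nodes

def sloppy_write(key, value, home_nodes, reachable):
    # Write to each home node if reachable; otherwise hand the write to
    # another reachable node, tagged with a hint naming the real home.
    for home in home_nodes:
        if home in reachable:
            home.store[key] = value
        else:
            stand_in = next(n for n in reachable if n not in home_nodes)
            stand_in.hints.append((home, key, value))

def hinted_handoff(stand_in):
    # Once the home nodes are reachable again, forward the held writes.
    for home, key, value in stand_in.hints:
        home.store[key] = value
    stand_in.hints.clear()

a, b, c, d = (Node(x) for x in "abcd")
sloppy_write("k", 1, home_nodes=[a, b, c], reachable=[b, c, d])  # a is down
hinted_handoff(d)  # a recovers; the stand-in forwards the hinted write
print(a.store)     # {'k': 1}
```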
Detecting Concurrent Writes
- concurrent writes can lead to inconsistencies in quorum
- overwrites would make nodes permanently inconsistent
- last write wins: discarding older writes, costs durability => safe only when a key is written once and immutable (e.g. use UUID as key)
- concurrency is difficult to define => when two operations are unaware of each other
- can read, version, merge (see grocery cart single replica example p. 189)
- version vector: version number per replica in addition to per key
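A sketch of how version vectors (here dicts of per-replica counters) distinguish overwrites from concurrency: if one vector dominates the other, the later write can safely overwrite; if neither dominates, the writes are concurrent and must be kept as siblings.

```python
def dominates(a, b):
    # a dominates b if a has seen at least everything b has seen.
    return all(a.get(k, 0) >= b.get(k, 0) for k in set(a) | set(b))

def relation(a, b):
    if dominates(a, b) and dominates(b, a):
        return "equal"
    if dominates(a, b):
        return "a overwrites b"
    if dominates(b, a):
        return "b overwrites a"
    return "concurrent: keep both values as siblings and merge later"

print(relation({"r1": 2, "r2": 1}, {"r1": 1, "r2": 1}))  # a overwrites b
print(relation({"r1": 2}, {"r2": 1}))                    # concurrent
```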