DDIA: Chapter 5

Part II Interlude

Distributed Data

  • scalability: spread data volume and/or read/write load to more than a single machine
  • fault tolerance/high availability: multiple machines provide redundancy
  • latency: servers at various locations for geographically distributed access

Scaling to Higher Load

  • vertical scaling (scaling up): buy a more powerful machine
  • shared-memory architecture: fast interconnect allows any CPU to access any part of memory/disk
    • cons: cost grows faster than linearly; limited fault tolerance
  • shared-disk architecture: independent CPUs/RAM but data is shared between machines via fast network
    • cons: contention and overhead of locking limit scalability
  • shared-nothing architecture (horizontal scaling, scaling out): each machine is an independent node; coordination between nodes is done in software over a conventional network, no special hardware required
    • replication: a copy of the same data is on several nodes - provides redundancy, can increase performance
    • partitioning (sharding): splitting data across nodes

Chapter 5

Replication

  • replication: keeping a copy of the same data on multiple machines
    • to keep data geographically close
    • to increase availability in case of failure
    • to scale out serving read-based requests

Leaders and Followers

  • replica: each node that stores a copy of the database
  • leader-based replication (active/passive, master-slave replication):
    • most common
    • one replica is designated the leader (master, primary): clients must send write requests to the leader, which first writes to its local storage
    • other replicas are followers (read replicas, slaves, secondaries): data changes are sent from the leader as part of a replication log (change stream), and each follower applies the updates in the same order
    • built-in feature of many DBs (PostgreSQL, MySQL, MongoDB, etc.) and some message brokers (Kafka, RabbitMQ)

Synchronous vs. Asynchronous Replication

  • synchronous: leader waits until each follower confirms the write before reporting it as successful to the client
    • follower is guaranteed to have up-to-date data
    • huge disadvantage: if a follower doesn’t respond, the write cannot be processed, as replication is blocking
  • semi-synchronous: if the synchronous follower is unavailable, an asynchronous follower is made synchronous (see the sketch after this list)
  • asynchronous: leader sends the change but doesn’t wait for a response
    • weakens durability but is widely used
    • if leader fails, non-replicated writes are lost
    • can continue processing writes even if followers have fallen behind
  • research into performant synchronous replication (e.g. chain replication)
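
A minimal sketch of the semi-synchronous write path described above, using toy in-memory Leader/Follower classes (all names are illustrative, not any real database's API):

```python
# Toy sketch of a semi-synchronous write path: the leader applies the write locally,
# waits for one synchronous follower to acknowledge, and only then reports success;
# the remaining followers are updated without waiting. Illustrative only.
from dataclasses import dataclass, field


@dataclass
class Follower:
    name: str
    log: list = field(default_factory=list)

    def apply(self, record) -> bool:
        self.log.append(record)
        return True  # acknowledge the write


class Leader:
    def __init__(self, sync_follower, async_followers):
        self.log = []
        self.sync_follower = sync_follower      # exactly one synchronous follower
        self.async_followers = async_followers  # the rest are asynchronous

    def write(self, record) -> str:
        self.log.append(record)                   # 1. write locally
        if not self.sync_follower.apply(record):  # 2. block on the synchronous ack
            raise RuntimeError("synchronous follower unavailable; promote another")
        for f in self.async_followers:            # 3. ship to async followers
            f.apply(record)                       #    (a real leader would not wait here)
        return "ok"                               # 4. report success to the client


leader = Leader(Follower("f1"), [Follower("f2"), Follower("f3")])
print(leader.write({"key": "x", "value": 42}))  # "ok"
```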

Setting Up New Followers

  • simply copying data files yields an inconsistent snapshot, and locking the database for the copy defeats high availability
  • instead, take a consistent snapshot of the leader, copy it to the new follower, then replay the changes made since the snapshot

Handling Node Outages

  • follower failure - if a follower crashes, it can catch up from the leader based on its logs
  • leader failure - failover: a follower must be promoted, clients need to be rerouted for writes, followers need to be rerouted for data changes
    • can happen manually
    • problematic if an old leader rejoins (what to do with its out-of-sync writes? simply discarding them violates durability expectations and can cause inconsistencies with other systems)
    • split brain: two nodes both believe they are the leader
    • timeout should be chosen with care

Implementation of Replication Logs

  • statement-based replication: leader can log every write request (statement) => need care around nondeterministic functions (now()), dependencies on existing data/ordering, statements that have side effects
  • write-ahead log (WAL) shipping: log can be used to build replicas => closely coupled to the storage engine/format, making changes difficult
  • logical (row-based) log replication: use different log formats for replication and storage engine (logical log, rather than physical data representation)
    • better for backwards compatibility, parsing in external applications
  • trigger-based replication: can register custom application code that is automatically executed (triggers, stored procedures) - has greater overhead (statement-based vs. logical logging is contrasted in the sketch after this list)
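
A rough illustration of why a logical (row-based) log avoids the nondeterminism problem of statement-based replication; the record shapes are made up for the example, not any database's actual format:

```python
# Statement-based: followers re-execute the SQL text, so now() is evaluated
# independently on each replica and the replicas can diverge.
# Logical (row-based): the leader logs the resulting row values, so every replica
# applies exactly the same data. Record layouts below are illustrative only.
import time
import uuid

statement_entry = {
    "stmt": "UPDATE users SET last_seen = now() WHERE id = 42",  # nondeterministic
}

logical_entry = {
    "table": "users",
    "primary_key": {"id": 42},
    "changed_columns": {"last_seen": time.time()},  # value fixed once, on the leader
    "txid": str(uuid.uuid4()),
}

print(statement_entry)
print(logical_entry)
```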

Problems with Replication Lag

  • read-scaling architecture: add many followers and distribute read-only requests across them
  • eventual consistency: followers eventually catch up and become consistent with the leader

Reading Your Own Writes

  • read-after-write consistency (read-your-writes consistency): users always see updates they submitted themselves
    • can mitigate by reading things the user may have modified (e.g. their own profile) from the leader, but not good when many things are editable
    • can approach with timestamp-based methods (see the sketch after this list)
    • problematic across datacenters
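
A sketch of one timestamp-based approach, assuming the client tracks the timestamp of its last write and each replica exposes how far it has caught up (all of this bookkeeping is hypothetical):

```python
# Toy sketch of read-your-writes: read from a follower only if it has replayed the
# leader's log past the client's most recent write; otherwise read from the leader.
import time


class Replica:
    def __init__(self):
        self.data = {}
        self.applied_up_to = 0.0  # how far this replica has replayed the leader's log

    def get(self, key):
        return self.data.get(key)


class Client:
    def __init__(self, leader, follower):
        self.leader = leader
        self.follower = follower
        self.last_write_ts = 0.0

    def write(self, key, value):
        ts = time.time()
        self.leader.data[key] = value
        self.leader.applied_up_to = ts
        self.last_write_ts = ts  # remember when we last wrote

    def read(self, key):
        if self.follower.applied_up_to >= self.last_write_ts:
            return self.follower.get(key)  # follower is fresh enough
        return self.leader.get(key)        # otherwise read our own write from the leader


leader, follower = Replica(), Replica()
client = Client(leader, follower)
client.write("profile:alice", "new bio")
print(client.read("profile:alice"))  # follower is behind, so the leader serves the read
```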

Monotonic Reads

  • with asynchronous replication a user can see data moving backward in time, e.g. by reading first from a follower where the data has propagated and then from one where it hasn’t
  • can mitigate by ensuring a user always makes reads from the same replica (have to think about failure)
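
One way to pin a user to a replica is to hash the user ID; a minimal sketch (node names are made up):

```python
# Toy sketch of monotonic reads via stable routing: hash the user ID so that the
# same user always reads from the same replica (rerouting only on replica failure).
import hashlib

replicas = ["replica-a", "replica-b", "replica-c"]  # illustrative node names


def replica_for(user_id: str) -> str:
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return replicas[int(digest, 16) % len(replicas)]


print(replica_for("user-123"))  # always the same replica for this user
print(replica_for("user-123"))
```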

Consistent Prefix Reads

  • causality/dependencies can be violated when writes from multiple clients land on partitions that replicate at different speeds (e.g. an observer sees the answer before the question)
  • can mitigate by writing causal events to the same partition, but efficiency cost
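
A sketch of the same-partition mitigation, assuming messages are partitioned by conversation ID so that causally related writes stay ordered on one partition (key naming and partition count are illustrative):

```python
# Toy sketch of consistent prefix reads via partitioning: route every message of a
# conversation to the same partition so a question and its answer cannot be observed
# out of order.
import hashlib


def partition_for(conversation_id: str, num_partitions: int = 8) -> int:
    digest = hashlib.sha256(conversation_id.encode()).hexdigest()
    return int(digest, 16) % num_partitions


print(partition_for("chat-42"))  # the question...
print(partition_for("chat-42"))  # ...and the answer land on the same partition
```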

Solutions for Replication Lag

  • transactions can provide stronger guarantees

Multi-Leader Replication

  • multi-leader configuration (master-master, active/active replication): each leader is a follower to other leaders (e.g. multi-datacenter, clients with offline operation, collaborative editing)
    • performance: writes can be processed in local datacenters
    • tolerance of datacenter outages
    • tolerance of network problems
    • disadvantage: the same data may be concurrently modified in different datacenters, so conflicts must be resolved; multi-leader support is often a retrofitted feature with subtle pitfalls

Handling Write Conflicts

  • conflict resolution required when concurrent writes are allowed
  • can make synchronous but would lose multi-leader advantage
  • can ensure all writes for a particular record go through the same leader
  • can give each write (or each replica) a unique ID and let the highest ID win
  • can try to merge values or use a data structure that preserves all information (resolve conflict later; see the sketch after this list)
  • can resolve conflict on write or on read
  • can use special data structures/algorithms (conflict-free replicated datatypes (CRDTs) (automatic set/map/list/etc. resolution), mergeable persistent data structures (similar to Git), operational transformation for collaborative editing)
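
A tiny sketch of the "merge values / preserve all information" option using set union, in the spirit of the shopping-cart example (removals would additionally need tombstones, omitted here):

```python
# Toy sketch of merging concurrently written sibling values instead of discarding one:
# two versions of a shopping cart written on different leaders are merged by set union.
cart_from_leader_1 = {"milk", "flour", "eggs"}
cart_from_leader_2 = {"milk", "ham", "bacon"}

merged_cart = cart_from_leader_1 | cart_from_leader_2  # keep every addition
print(sorted(merged_cart))  # ['bacon', 'eggs', 'flour', 'ham', 'milk']
```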

Multi-Leader Replication Topologies

  • circular topology: each leader receives writes from one node and forwards them (plus its own writes) to the next node
  • star topology: one designated root leader forwards writes to all other leaders
  • all-to-all topology: all leaders communicate with each other
  • circular and star difficult in case of node failure
  • timestamps are insufficient for conflict resolution (clocks are not sufficiently in sync)

Leaderless Replication

  • e.g. Dynamo, Riak, Cassandra, Voldemort
  • client or coordinator node sends writes to several replicas in parallel
  • reads are also sent to several nodes in parallel (quorum reads)

Writing to the Database When a Node Is Down

  • just write to available nodes
  • read will be driven by consensus/quorum
  • eventual consistency:
    • read repair: detect stale responses during read and update
    • anti-entropy process: background process actively looks for differences in data between replicas
  • quorums can be tuned:
    • n replicas
    • writes must be confirmed by w nodes to be considered successful
    • at least r nodes must be queried for each read
    • common choice: odd n with w = r = (n + 1) / 2 (rounded up), so that w + r > n (see the sketch after this list)
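
A client-side sketch of the quorum condition with made-up in-memory replicas; it only illustrates that w acknowledged writes and r queried reads overlap when w + r > n:

```python
# Toy sketch of quorum writes/reads over n replicas: a write counts as successful once
# w replicas store it; a read queries r replicas and keeps the highest-versioned value.
# With w + r > n the read set overlaps the write set in at least one node.
n = 5
w = r = (n + 1) // 2  # the common w = r = (n + 1) / 2 choice for odd n
assert w + r > n

replicas = [dict() for _ in range(n)]  # each replica: key -> (version, value)


def quorum_write(key, value, version) -> bool:
    acked = 0
    for rep in replicas:
        rep[key] = (version, value)  # in reality some of these requests could fail
        acked += 1
        if acked >= w:               # stop once the write quorum is reached
            return True
    return acked >= w


def quorum_read(key):
    answers = [rep.get(key) for rep in replicas[:r]]  # query r replicas
    answers = [a for a in answers if a is not None]
    return max(answers) if answers else None          # newest version wins


quorum_write("k", "v1", version=1)
print(quorum_read("k"))  # (1, 'v1')
```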

Limitations of Quorum Consistency

  • quorums are not necessarily majorities => the set of nodes used by a read and the set used by a write only need to overlap in at least one node
  • smaller r and w mean stale values are more likely
  • monitoring staleness is difficult

Sloppy Quorums and Hinted Handoff

  • sloppy quorum: accept writes to nodes that are reachable even if they aren’t among the n nodes on which the value usually lives (e.g. due to network interruptions)
  • hinted handoff: writes that an available node temporarily accepted are sent on to the appropriate “home” nodes once they recover (sketched after this list)
  • multi-datacenter operation: each write from a client is sent to all replicas regardless of datacenter, higher-latency writes to other datacenters can be configured to be asynchronous
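
A sketch of hinted handoff with made-up Node objects: a reachable node holds the write plus a hint about its intended home, and forwards it once the home node is back up:

```python
# Toy sketch of hinted handoff: writes destined for an unreachable "home" node are kept
# elsewhere together with a hint, and forwarded once the home node recovers.
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []  # (home_node, key, value) held on behalf of other nodes

    def write(self, key, value, home=None):
        if home is None:
            self.data[key] = value
        else:
            self.hints.append((home, key, value))  # remember where this write belongs

    def hand_off(self):
        remaining = []
        for home, key, value in self.hints:
            if home.up:
                home.data[key] = value  # forward the write to its home node
            else:
                remaining.append((home, key, value))
        self.hints = remaining


home, helper = Node("home"), Node("helper")
home.up = False
helper.write("k", "v", home=home)  # home unreachable: helper keeps a hinted write
home.up = True
helper.hand_off()
print(home.data)  # {'k': 'v'}
```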

Detecting Concurrent Writes

  • concurrent writes can lead to inconsistencies in quorum
  • overwrites would make nodes permanently inconsistent
  • last write wins: discarding older writes, costs durability => safe only when a key is written once and immutable (e.g. use UUID as key)
  • concurrency is difficult to define => when two operations are unaware of each other
  • can read, version, merge (see grocery cart single replica example p. 189)
  • version vector: version number per replica in addition to per key
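
A sketch of how version vectors distinguish ordered from concurrent writes; the dict-of-counters representation is illustrative:

```python
# Toy sketch of comparing version vectors (one counter per replica) to decide whether
# two versions are ordered or concurrent (concurrent versions are kept as siblings).
def dominates(a: dict, b: dict) -> bool:
    """True if version vector a has seen everything that b has seen."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def relation(a: dict, b: dict) -> str:
    if dominates(a, b) and dominates(b, a):
        return "equal"
    if dominates(a, b):
        return "a supersedes b"
    if dominates(b, a):
        return "b supersedes a"
    return "concurrent: keep both as siblings and merge later"


v1 = {"replica-1": 2, "replica-2": 1}
v2 = {"replica-1": 1, "replica-2": 3}
print(relation(v1, v2))  # concurrent: keep both as siblings and merge later
```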