DDIA: Chapter 2

Data Models and Query Languages

Relational Model
- e.g. SQL
- data is organized into relations (unordered tuples = rows)
- translation is needed between application objects and DB tables
- some support XML/JSON/etc. in cells, but, difficult to query
NoSQL
- “Not Only SQL” - anything non-relational (e.g. document, graph)
- need for scale (large datasets, high write throughput), special query operations
- document models like JSON have better locality (one query vs. multiple), and store nested records
  - e.g. MongoDB
- document use case: data is in self-contained documents, relationships between documents are rare
- graph use case: anything is potentially related to everything

IDs vs. plain-text strings helps with consistency and ease of updating/localization/search
many-to-one is less suited to the document model (since joins are required)
- e.g. lots of people may live in one location - finding who lives in one location when the document model object is based on a person is difficult

Which data model leads to simpler application code?
- tree of one-to-many relationships, entire tree is loaded at once => document
- many-to-many => relational, graph
- depends on the kinds of relationships that exist between data items
most document databases do not enforce schema
- schema-on-read: structure is implicit, interpreted on read
- schema-on-write: like relational, schema is explicit, written data conforms
schema changes/updates can be slow (every row needs to be rewritten)
documents usually have storage locality (performance advantage) but only when updating large parts of the document (entire object needs to be rewritten)

declarative languages (in data) tend to be easier to write/interpret since properties of data are being specified
declarative languages (in data) are more amenable to parallelization

vertex:
- unique identifier
- incoming/outgoing edges
- collection of properties (KV pairs)
edge:
- unique identifier
- vertex start (tail vertex), vertex end (head vertex)
- label describing relationship
- collection of properties (KV pairs)
efficient to traverse
amenable to chains of relationships (e.g. living in Country -> State -> City), recursion

graph can be mapped to a relational database, but have to know what queries to perform beforehand