DDIA: Chapter 2
Data Models and Query Languages
- Data models are important!
Relational Model Versus Document Model
- Relational Model
- e.g. SQL
- data is organized into relations (unordered tuples = rows)
- translation is needed between application objects and DB tables
- some support XML/JSON/etc. in cells, but, difficult to query
- NoSQL
- “Not Only SQL” - anything non-relational (e.g. document, graph)
- need for scale (large datasets, high write throughput), special query operations
- document models like JSON have better locality (one query vs. multiple), and store nested records
- document use case: data is in self-contained documents, relationships between documents are rare
- graph use case: anything is potentially related to everything
Many-to-One and Many-to-Many Relationships
- IDs vs. plain-text strings helps with consistency and ease of updating/localization/search
- many-to-one is less suited to the document model (since joins are required)
- e.g. lots of people may live in one location - finding who lives in one location when the document model object is based on a person is difficult
Relational Versus Document Databases Today
- Which data model leads to simpler application code?
- tree of one-to-many relationships, entire tree is loaded at once => document
- many-to-many => relational, graph
- depends on the kinds of relationships that exist between data items
- most document databases do not enforce schema
- schema-on-read: structure is implicit, interpreted on read
- schema-on-write: like relational, schema is explicit, written data conforms
- schema changes/updates can be slow (every row needs to be rewritten)
- documents usually have storage locality (performance advantage) but only when updating large parts of the document (entire object needs to be rewritten)
Query Languages for Data
- declarative languages (in data) tend to be easier to write/interpret since properties of data are being specified
- declarative languages (in data) are more amenable to parallelization
MapReduce Querying
- in-between declarative and imperative
- based on
map
(collect
) and reduce
(fold
or inject
) type operations
- must be pure functions (can only use input data, no DB additional queries)
- amenable to distributed execution
Graph-Like Data Models
- good for many-to-many relationships
Property Graphs
- vertex:
- unique identifier
- incoming/outgoing edges
- collection of properties (KV pairs)
- edge:
- unique identifier
- vertex start (tail vertex), vertex end (head vertex)
- label describing relationship
- collection of properties (KV pairs)
- efficient to traverse
- amenable to chains of relationships (e.g. living in Country -> State -> City), recursion
Graph Queries in SQL
- graph can be mapped to a relational database, but have to know what queries to perform beforehand
Triple-Stores and SPARQL
- modeled as (subject, predicate, object)