NoSQL
NoSQL is a broad category that includes any database that doesn’t use SQL as its primary data access language. These types of databases are also sometimes referred to as non-relational databases. Unlike in relational databases, data in a NoSQL database doesn’t have to conform to a pre-defined schema. Most NoSQL databases follow the BASE consistency model.
Below are different types of NoSQL databases:
Document
A document database (also known as a document-oriented database or a document store) is a database that stores information in documents. They are general-purpose databases that serve a variety of use cases for both transactional and analytical applications.
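As a rough illustration of what "storing information in documents" means, here is a minimal in-memory sketch. The `DocumentStore` class and its `insert`, `get`, and `find` methods are hypothetical names made up for this example, not the API of any real product. It assumes documents are JSON-like dictionaries and shows that two documents in the same collection can have different fields.

```python
import json
import uuid


class DocumentStore:
    """Toy in-memory document collection (hypothetical, for illustration only)."""

    def __init__(self):
        self._docs = {}  # document id -> JSON-serialized document

    def insert(self, doc):
        """Store a dict as a JSON document and return its generated id."""
        doc_id = str(uuid.uuid4())
        self._docs[doc_id] = json.dumps(doc)
        return doc_id

    def get(self, doc_id):
        """Fetch one document by id."""
        return json.loads(self._docs[doc_id])

    def find(self, **criteria):
        """Return documents whose top-level fields equal all given criteria."""
        results = []
        for raw in self._docs.values():
            doc = json.loads(raw)
            if all(doc.get(key) == value for key, value in criteria.items()):
                results.append(doc)
        return results


store = DocumentStore()
# Two documents with different shapes live side by side: no fixed schema.
store.insert({"name": "Ada", "email": "ada@example.com"})
store.insert({"name": "Linus", "languages": ["C", "Perl"], "address": {"city": "Portland"}})

matches = store.find(name="Linus")
print(matches[0]["address"]["city"])
```

Because each document carries its own structure, the query code has to cope with fields that may be missing, which is one reason "schemaless" appears under both advantages and disadvantages.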
Advantages
- Intuitive and flexible
- Easy horizontal scaling
- Schemaless
Disadvantages
- Schemaless
- Non-relational
Examples
Key-value
One of the simplest types of NoSQL databases, key-value databases save data as a group of key-value pairs made up of two data items each. They’re also sometimes referred to as key-value stores.
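The key-value model can be sketched in a few lines. The `KeyValueStore` class below is a hypothetical, in-memory stand-in, not a real client library. It assumes values are opaque bytes, so the store can only look them up by key and cannot filter on their contents.

```python
class KeyValueStore:
    """Toy key-value store (hypothetical): values are opaque blobs."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Create or overwrite the value stored under key."""
        self._data[key] = value

    def get(self, key, default=None):
        """Look up a value by its exact key."""
        return self._data.get(key, default)

    def delete(self, key):
        """Remove a key; deleting a missing key is a no-op."""
        self._data.pop(key, None)


sessions = KeyValueStore()
sessions.put("session:42", b'{"user": "ada", "cart": [1, 2]}')

blob = sessions.get("session:42")     # fast lookup by key
sessions.delete("session:42")
missing = sessions.get("session:42")  # None once the session is gone
```

The whole interface is put, get, and delete by key, which is what makes such stores simple and fast, and also why complex queries are not a good fit.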
Advantages
- Simple and performant
- Highly scalable for high volumes of traffic
- Session management
- Optimized lookups
Disadvantages
- Limited to basic CRUD operations
- Values can’t be filtered
- Lacks indexing and scanning capabilities
- Not optimized for complex queries
Examples
Graph
A graph database is a NoSQL database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data instead of tables or documents.
The graph relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. The relationships allow data in the store to be linked together directly and, in many cases, retrieved with one operation.
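The node, edge, and property structure described above can be sketched with plain dictionaries. The `Graph` class, the `FOLLOWS` relationship name, and the `friends_of_friends` helper are assumptions made for this example and do not correspond to any specific graph database or query language.

```python
class Graph:
    """Toy property graph (hypothetical): nodes carry properties, edges carry a type."""

    def __init__(self):
        self.nodes = {}  # node id -> properties dict
        self.edges = {}  # node id -> list of (relationship, target id)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, relationship, target):
        self.edges.setdefault(source, []).append((relationship, target))

    def neighbors(self, node_id, relationship):
        """Follow outgoing edges of one relationship type from a node."""
        return [t for rel, t in self.edges.get(node_id, []) if rel == relationship]


def friends_of_friends(graph, start):
    """People followed by the people `start` follows, excluding direct follows."""
    direct = set(graph.neighbors(start, "FOLLOWS"))
    second = set()
    for person in direct:
        second.update(graph.neighbors(person, "FOLLOWS"))
    return sorted(second - direct - {start})


g = Graph()
for name in ("alice", "bob", "carol", "dave"):
    g.add_node(name, kind="person")
g.add_edge("alice", "FOLLOWS", "bob")
g.add_edge("alice", "FOLLOWS", "dave")
g.add_edge("bob", "FOLLOWS", "carol")
g.add_edge("bob", "FOLLOWS", "dave")

suggestions = friends_of_friends(g, "alice")
print(suggestions)
```

Relationships are stored directly next to the nodes, so a traversal follows pointers instead of joining tables. This is the kind of query behind the recommendation and social network use cases.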
Advantages
- Query speed
- Agile and flexible
- Explicit data representation
Disadvantages
- Complex
- No standardized query language
Use cases
- Fraud detection
- Recommendation engines
- Social networks
- Network mapping
Examples
Time series
A time-series database is a database optimized for time-stamped, or time series, data.
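A minimal sketch of what "optimized for time-stamped data" can look like: points are kept sorted by timestamp so that appends are cheap and range queries use binary search. The `TimeSeries` class and its methods are hypothetical, and the integer timestamps (seconds) are an assumption of this example.

```python
import bisect
from statistics import mean


class TimeSeries:
    """Toy time series (hypothetical): points kept sorted by timestamp."""

    def __init__(self):
        self._timestamps = []
        self._values = []

    def append(self, ts, value):
        """Insert a point; the common in-order case is a plain append."""
        if not self._timestamps or ts >= self._timestamps[-1]:
            self._timestamps.append(ts)
            self._values.append(value)
        else:
            i = bisect.bisect_right(self._timestamps, ts)
            self._timestamps.insert(i, ts)
            self._values.insert(i, value)

    def range(self, start, end):
        """Return (ts, value) pairs with start <= ts < end via binary search."""
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_left(self._timestamps, end)
        return list(zip(self._timestamps[lo:hi], self._values[lo:hi]))

    def downsample(self, start, end, bucket):
        """Average the values in fixed-width buckets of `bucket` seconds."""
        buckets = {}
        for ts, value in self.range(start, end):
            key = start + ((ts - start) // bucket) * bucket
            buckets.setdefault(key, []).append(value)
        return {key: mean(values) for key, values in buckets.items()}


cpu = TimeSeries()
for ts, value in [(0, 10.0), (30, 20.0), (60, 30.0), (90, 50.0), (120, 70.0)]:
    cpu.append(ts, value)

window = cpu.range(30, 120)
per_minute = cpu.downsample(0, 120, 60)
```

Range scans and downsampling by time bucket are the typical queries behind the metrics and monitoring use cases listed here.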
Advantages
- Fast insertion and retrieval
- Efficient data storage
Use cases
- IoT data
- Metrics analysis
- Application monitoring
- Financial trend analysis
Examples
Wide column
Wide column databases, also known as wide column stores, are schema-agnostic. Data is stored in column families, rather than in rows and columns.
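The column-family layout can be pictured as a nested map: row key, then column family, then individual columns. The `WideColumnTable` class below is a hypothetical in-memory sketch under that assumption. It shows that column families are declared up front while the columns inside each family can differ from row to row.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy wide-column table (hypothetical): row key -> family -> columns."""

    def __init__(self, column_families):
        self.column_families = set(column_families)
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        """Write one column; only the column family must be known in advance."""
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        self._rows[row_key][family][column] = value

    def get_family(self, row_key, family):
        """Read all columns of one family for a row (empty dict if none)."""
        row = self._rows.get(row_key, {})
        return dict(row.get(family, {}))


users = WideColumnTable(["profile", "activity"])
users.put("user:1", "profile", "name", "Ada")
users.put("user:1", "profile", "email", "ada@example.com")
users.put("user:1", "activity", "2024-01-01", "login")
users.put("user:2", "profile", "name", "Linus")  # no email, no activity

print(users.get_family("user:1", "profile"))
print(users.get_family("user:2", "activity"))
```

Reading one family for one row touches only that family's columns, which suits the attribute-based storage use case.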
Advantages
- Highly scalable, can handle petabytes of data
- Ideal for real-time big data applications
Disadvantages
- Expensive
- Increased write time
Use cases
- Business analytics
- Attribute-based data storage
Examples
Multi-model
Multi-model databases combine different database models (e.g. relational, graph, key-value, document) into a single, integrated backend. This means they can accommodate various data types, indexes, and queries, and store data in more than one model.
Advantages
- Flexibility
- Suitable for complex projects
- Data consistency
Disadvantages
- Complex
- Less mature
Examples
Abstract
History of NoSQL databases
- relational database
- persistence
- integration
- SQL
- transactions
- reporting
- impedance mismatch: a single logical structure in the user interface gets splattered across lots and lots of tables
- 1990-2000: relational dominance
- this changed when lots of traffic started coming from the internet
- relational databases were designed to run as a single-node system; they do not work well on clusters of little boxes
- big companies started to make their own databases:
- Google → Bigtable
- Amazon → Dynamo
- the term NoSQL was originally only meant to be used as a Twitter hashtag
Definition of NoSQL
- we cannot give a precise definition of NoSQL
- we can list the common characteristics of NoSQL databases:
- non-relational
- open-source
- cluster-friendly
- 21st century web
- schema-less
Data models
document
- ex: mongodb, couchdb
key-value
- ex: redis
graph
- ex: neo4j
column-family
- ex: cassandra
aggregate-oriented databases: take a lot of stuff that’s scattered around and put it into bigger lumps
- key-value: value == aggregate
- document: document == aggregate
- column-family
- advantage if you are using the same aggregate to push data back and forth into persistence
- disadvantage if you want to slice and dice your data in different ways
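The aggregate trade-off above can be sketched with an order and its line items stored as one lump. The `orders` data, the key format, and both helper functions are made up for this illustration. Reading or writing one order is a single lookup, while slicing by product has to open every aggregate.

```python
# Each value is one aggregate: an order together with its line items.
orders = {
    "order:1001": {
        "customer": "ada",
        "items": [
            {"sku": "book", "qty": 1, "price": 30.0},
            {"sku": "pen", "qty": 2, "price": 1.5},
        ],
    },
    "order:1002": {
        "customer": "linus",
        "items": [{"sku": "book", "qty": 3, "price": 30.0}],
    },
}


def order_total(order_id):
    """Cheap: everything needed lives inside one aggregate."""
    order = orders[order_id]
    return sum(item["qty"] * item["price"] for item in order["items"])


def revenue_by_sku():
    """Costly: slicing by product means scanning every aggregate."""
    totals = {}
    for order in orders.values():
        for item in order["items"]:
            revenue = item["qty"] * item["price"]
            totals[item["sku"]] = totals.get(item["sku"], 0.0) + revenue
    return totals


print(order_total("order:1001"))
print(revenue_by_sku())
```

With two orders the scan is trivial, but on a cluster each aggregate may live on a different node, so the second kind of query is where the aggregate structure starts to hurt.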
graph model: break things apart into even smaller units and let you play with those smaller units more carefully
All NoSQL databases are schemaless
NoSQL and consistency
- RDBMS == ACID
- NoSQL == BASE
- graph databases tend to follow ACID
- keep your transactions within a single aggregate
- two kinds of consistency:
- logical: applies whether you run in a cluster or on a single machine
- replication: replicate data to all nodes, and this introduces a new type of consistency to deal with
- consistency is always a domain choice
- there are always some ways to deal with inconsistency, not always technical, e.g. a hotel can keep a spare room to cover a double booking
- it’s the business that will dictate what’s important between consistency and availability
- safety vs liveness
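Replication consistency can be illustrated with two replicas that synchronize lazily. This is a simplified sketch, not how any particular database works: the `Replica` class, the `sync` function, and the "highest version wins" rule are assumptions chosen to keep the example short.

```python
class Replica:
    """Toy replica (hypothetical): stores key -> (version, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, version):
        """Accept a write only if it is newer than what this replica holds."""
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(source, target):
    """Push every entry from source to target; newer versions win."""
    for key, (version, value) in source.data.items():
        target.write(key, value, version)


a, b = Replica("a"), Replica("b")
a.write("room:101", "free", version=1)
sync(a, b)

a.write("room:101", "booked", version=2)
stale = b.read("room:101")  # replica b has not seen the booking yet
sync(a, b)
fresh = b.read("room:101")  # after replication the replicas agree

print(stale, fresh)
```

Between the write and the sync, a client reading from replica `b` could still book the room, which is the double-booking situation mentioned above. Whether to block such reads (consistency) or allow them (availability) is the business decision.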
When and why use NoSQL
- easier development
- large scale data
Future of databases: polyglot
Example:
- user sessions: redis
- financial data: RDBMS
- shopping cart: riak
- recommendations: neo4j
- product catalog: mongodb
- reporting: RDBMS
- analytics: cassandra
- user activity logs: cassandra
Opportunity leads to problems:
- decisions
- organizational change
- immaturity
- eventual consistency