NoSQL


NoSQL is a broad category that includes any database that doesn’t use SQL as its primary data access language. These types of databases are also sometimes referred to as non-relational databases. Unlike in relational databases, data in a NoSQL database doesn’t have to conform to a pre-defined schema. NoSQL databases generally follow the BASE (Basically Available, Soft state, Eventual consistency) consistency model.

Below are different types of NoSQL databases:

Document

A document database (also known as a document-oriented database or a document store) is a database that stores information in documents, typically in a JSON-like format. Document databases are general-purpose databases that serve a variety of use cases for both transactional and analytical applications.
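Here is a minimal sketch of the idea. It assumes documents are JSON-like Python dicts held in an in-memory list. The `users` collection and the `get_path` and `find` helpers are hypothetical and do not reproduce any particular product's API. Documents in the same collection carry different fields, and a query can reach into a nested field:

```python
from typing import Any

# Hypothetical in-memory "collection": each document is a JSON-like dict,
# and documents in the same collection need not share the same fields.
users: list[dict[str, Any]] = [
    {"_id": 1, "name": "Ada", "address": {"city": "London"}, "tags": ["admin"]},
    {"_id": 2, "name": "Linus", "address": {"city": "Helsinki"}},
    {"_id": 3, "name": "Grace", "email": "grace@example.com"},  # no address at all
]


def get_path(doc: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'address.city'; return None if any step is missing."""
    current: Any = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def find(collection: list[dict[str, Any]], path: str, value: Any) -> list[dict[str, Any]]:
    """Return every document whose field at `path` equals `value` (linear scan)."""
    return [doc for doc in collection if get_path(doc, path) == value]


print([d["name"] for d in find(users, "address.city", "London")])  # ['Ada']
```

Real document stores can index such nested paths so a query does not scan every document. This sketch scans linearly for brevity.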

Advantages

  • Intuitive and flexible
  • Easy horizontal scaling
  • Schemaless

Disadvantages

  • Schemaless
  • Non-relational

Examples

Key-value

Key-value databases are one of the simplest types of NoSQL database. They store data as a collection of key-value pairs, each made up of two data items: a unique key and the value associated with it. They’re also sometimes referred to as key-value stores.
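The following is a minimal in-memory sketch. It assumes a single process and uses a Python dict as the underlying hash table. `KeyValueStore` and its `put`, `get`, and `delete` methods are hypothetical names. The store treats each value as opaque bytes, so the key is the only way in:

```python
from typing import Optional


class KeyValueStore:
    """Hypothetical in-memory key-value store: values are opaque to the store."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Create or overwrite; the store never inspects the value.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # A lookup is a single hash-table access by key.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        # Deleting a missing key is a no-op.
        self._data.pop(key, None)


store = KeyValueStore()
store.put("session:42", b'{"user_id": 7, "cart": ["sku-1"]}')
print(store.get("session:42"))
store.delete("session:42")
print(store.get("session:42"))  # None
```

The store never looks inside the value. As a result, it cannot answer a question such as "which sessions belong to user 7?" without fetching and decoding every value. This is the limitation behind "values can’t be filtered".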

Advantages

  • Simple and performant
  • Highly scalable for high volumes of traffic
  • Well suited to session management
  • Optimized lookups

Disadvantages

  • Only basic CRUD operations
  • Values can’t be filtered
  • Lacks indexing and scanning capabilities
  • Not optimized for complex queries

Examples

Graph

A graph database is a NoSQL database that represents and stores data as a graph of nodes, edges, and properties instead of tables or documents, and supports semantic queries over that graph.

The graph relates the data items in the store to a collection of nodes and edges, with the edges representing the relationships between the nodes. These relationships allow data in the store to be linked together directly and, in many cases, retrieved with one operation.
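Here is a minimal sketch of a property graph. It assumes an in-memory adjacency list. The `Graph` class, the `FRIEND` relationship type, and the sample people are all hypothetical. Each node keeps direct references to its outgoing edges, so a two-hop "friends of friends" lookup follows stored links instead of joining tables:

```python
from collections import defaultdict


class Graph:
    """Hypothetical property graph: both nodes and edges can carry properties."""

    def __init__(self) -> None:
        self.nodes: dict[str, dict] = {}
        # Adjacency list: node id -> list of (relationship type, target id, edge properties).
        self.edges: defaultdict[str, list[tuple[str, str, dict]]] = defaultdict(list)

    def add_node(self, node_id: str, **props) -> None:
        self.nodes[node_id] = props

    def add_edge(self, src: str, rel: str, dst: str, **props) -> None:
        self.edges[src].append((rel, dst, props))

    def neighbors(self, node_id: str, rel: str) -> list[str]:
        # Follow the stored edges directly; no join over a separate table is needed.
        return [dst for r, dst, _ in self.edges[node_id] if r == rel]


g = Graph()
for person in ("alice", "bob", "carol", "dave"):
    g.add_node(person, kind="Person")
g.add_edge("alice", "FRIEND", "bob", since=2019)
g.add_edge("bob", "FRIEND", "carol")
g.add_edge("bob", "FRIEND", "dave")

# Friends of friends: two hops along FRIEND edges, a typical recommendation-style query.
fof = {f2 for f1 in g.neighbors("alice", "FRIEND") for f2 in g.neighbors(f1, "FRIEND")}
print(sorted(fof))  # ['carol', 'dave']
```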

Advantages

  • Query speed
  • Agile and flexible
  • Explicit data representation

Disadvantages

  • Complex
  • No standardized query language

Use cases

  • Fraud detection
  • Recommendation engines
  • Social networks
  • Network mapping

Examples

Time series

A time-series database is a database optimized for time-stamped, or time series, data.
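The following is a minimal sketch of a single metric. It assumes integer timestamps, such as Unix seconds, stored in in-memory parallel lists. `TimeSeries`, `query_range`, and `downsample` are hypothetical names. Keeping points sorted by time turns range reads into a binary search. Fixed-width buckets give the kind of rollup often used for metrics:

```python
import bisect
from collections import defaultdict


class TimeSeries:
    """Hypothetical single-metric series whose points are kept sorted by timestamp."""

    def __init__(self) -> None:
        self.timestamps: list[int] = []  # e.g. Unix seconds
        self.values: list[float] = []

    def insert(self, ts: int, value: float) -> None:
        # Points usually arrive in time order, so this is normally an append;
        # the binary-search position also handles late-arriving points.
        i = bisect.bisect_right(self.timestamps, ts)
        self.timestamps.insert(i, ts)
        self.values.insert(i, value)

    def query_range(self, start: int, end: int) -> list[tuple[int, float]]:
        """Return points with start <= ts < end via binary search, not a full scan."""
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_left(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))

    def downsample(self, bucket: int) -> dict[int, float]:
        """Average the values in each fixed-width time bucket (e.g. 60 for per-minute)."""
        groups: defaultdict[int, list[float]] = defaultdict(list)
        for ts, v in zip(self.timestamps, self.values):
            groups[ts - ts % bucket].append(v)
        return {b: sum(vs) / len(vs) for b, vs in sorted(groups.items())}


cpu = TimeSeries()
for ts, v in [(0, 10.0), (30, 20.0), (60, 30.0), (90, 50.0), (45, 15.0)]:
    cpu.insert(ts, v)
print(cpu.query_range(30, 61))  # [(30, 20.0), (45, 15.0), (60, 30.0)]
print(cpu.downsample(60))  # {0: 15.0, 60: 40.0}
```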

Advantages

  • Fast insertion and retrieval
  • Efficient data storage

Use cases

  • IoT data
  • Metrics analysis
  • Application monitoring
  • Financial trend analysis

Examples

Wide column

Wide column databases, also known as wide column stores, are schema-agnostic. Data is stored in column families, rather than in rows and columns.
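Here is a minimal sketch of the storage layout. It assumes nested Python dicts of the form row key -> column family -> column -> value. The `table`, `put`, `get`, and `read_family` names and the sample rows are hypothetical. Rows are sparse, and a read can fetch one column family without touching the others. Real wide-column stores also partition rows across nodes and persist them on disk, which this sketch ignores:

```python
from collections import defaultdict
from typing import Optional

# Hypothetical wide-column table: row key -> column family -> column name -> value.
# Rows are sparse: each row stores only the columns it actually has.
table = defaultdict(lambda: defaultdict(dict))


def put(row_key: str, family: str, column: str, value: str) -> None:
    table[row_key][family][column] = value


def get(row_key: str, family: str, column: str) -> Optional[str]:
    # .get() avoids creating empty entries for rows or families that don't exist.
    return table.get(row_key, {}).get(family, {}).get(column)


def read_family(row_key: str, family: str) -> dict[str, str]:
    """Read one column family of a row without touching its other families."""
    return dict(table.get(row_key, {}).get(family, {}))


put("user#1", "profile", "name", "Ada")
put("user#1", "profile", "city", "London")
put("user#1", "activity", "2024-01-01", "login")
put("user#2", "profile", "name", "Linus")  # no city, no activity: nothing stored

print(read_family("user#1", "profile"))  # {'name': 'Ada', 'city': 'London'}
print(get("user#2", "profile", "city"))  # None
```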

Advantages

  • Highly scalable, can handle petabytes of data
  • Ideal for real-time big data applications

Disadvantages

  • Expensive
  • Increased write time

Use cases

  • Business analytics
  • Attribute-based data storage

Examples

Multi-model

Multi-model databases combine different database models (e.g. relational, graph, key-value, document) into a single, integrated backend. This means they can accommodate various data types, indexes, and queries, and can store data in more than one model.

Advantages

  • Flexibility
  • Suitable for complex projects
  • Data consistency

Disadvantages

  • Complex
  • Less mature

Examples


Abstract

History of NoSQL databases

  • relational databases provided:
    • persistence
    • integration
    • SQL
    • transactions
    • reporting
  • impedance mismatch: the data behind a single user-interface screen gets splattered across lots and lots of tables
  • 1990-2000: relational dominance
  • this changed with the huge amounts of traffic coming from the internet
  • SQL databases were designed to run as single-node systems; they do not work well on clusters of many little boxes
  • big companies started to make their own databases:
    • Google Bigtable
    • Amazon Dynamo
  • the term NoSQL was originally meant only as a Twitter hashtag

Definition of NoSQL

  • NoSQL cannot be strictly defined
  • we can list the common characteristics of NoSQL databases:
    • non-relational
    • open-source
    • cluster-friendly
    • built for the 21st-century web
    • schema-less

Data models

  • document
  • key-value
  • graph
  • column-family
  • aggregate-oriented databases: take data that would otherwise be scattered across many tables and group it into bigger units, called aggregates
    • key-value: value == aggregate
    • document: document == aggregate
    • column-family
    • advantage: efficient if you use the same aggregate to push data back and forth to persistence
    • disadvantage: awkward if you want to slice and dice your data in different ways
  • graph model: breaks things apart into even smaller units (nodes and edges) and lets you play with those smaller units more carefully
  • all NoSQL databases are schemaless
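The following small sketch illustrates the aggregate trade-off with hypothetical order data. The aggregate-oriented version stores each order, along with its items and address, under one key. The relational version splits the same order across two tables that must be reassembled. The names `orders_by_id`, `load_order_relational`, and `quantity_per_product` are illustrative only:

```python
# Hypothetical data. Aggregate-oriented: each order, with its line items and
# shipping address, is stored as one unit under one key.
orders_by_id = {
    "order-1": {
        "customer": "ada",
        "shipping_address": {"city": "London"},
        "items": [{"product": "book", "qty": 2}, {"product": "pen", "qty": 1}],
    },
    "order-2": {
        "customer": "linus",
        "shipping_address": {"city": "Helsinki"},
        "items": [{"product": "book", "qty": 1}],
    },
}

# Relational equivalent: the same orders split across two tables (rows as tuples).
orders_table = [("order-1", "ada", "London"), ("order-2", "linus", "Helsinki")]
order_items_table = [("order-1", "book", 2), ("order-1", "pen", 1), ("order-2", "book", 1)]


def load_order_relational(order_id: str) -> dict:
    """Reassemble one order from two tables: the join that an aggregate avoids."""
    _, customer, city = next(row for row in orders_table if row[0] == order_id)
    items = [{"product": p, "qty": q} for oid, p, q in order_items_table if oid == order_id]
    return {"customer": customer, "shipping_address": {"city": city}, "items": items}


# Advantage: reading or writing a whole order is a single key lookup.
assert orders_by_id["order-1"] == load_order_relational("order-1")


def quantity_per_product() -> dict[str, int]:
    """Disadvantage: slicing the data another way means opening every aggregate."""
    totals: dict[str, int] = {}
    for order in orders_by_id.values():
        for item in order["items"]:
            totals[item["product"]] = totals.get(item["product"], 0) + item["qty"]
    return totals


print(quantity_per_product())  # {'book': 3, 'pen': 1}
```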

NoSQL and consistency

  • RDBMS == ACID
  • NoSQL == BASE
  • graph databases tend to follow ACID
  • in aggregate-oriented databases, keep your transactions within a single aggregate
  • two kinds of consistency:
    • logical consistency: a concern whether you run on a cluster or on a single machine
    • replication consistency: replicating data to all nodes introduces a new type of consistency to deal with
  • consistency is always a domain choice
    • there are ways to deal with inconsistency, and they are not always technical; e.g. a hotel can keep a spare room to handle double bookings
    • it’s the business that will dictate what’s important between consistency and availability
    • safety vs liveness
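The following is a toy simulation of replication inconsistency. It assumes one primary and replicas that are updated asynchronously. `ReplicatedStore`, `replicate`, and the hotel-room key are hypothetical and do not model any specific database. A read served by a replica just after a write can return stale data until replication catches up. That is the window the hotel example above has to tolerate:

```python
from typing import Optional


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, str] = {}


class ReplicatedStore:
    """Hypothetical store with one primary and asynchronously updated replicas."""

    def __init__(self, replica_count: int) -> None:
        self.primary = Replica("primary")
        self.replicas = [Replica(f"replica-{i}") for i in range(replica_count)]
        self.pending: list[tuple[str, str]] = []  # writes not yet shipped to replicas

    def write(self, key: str, value: str) -> None:
        # Acknowledged as soon as the primary has it, without waiting for replicas.
        self.primary.data[key] = value
        self.pending.append((key, value))

    def read(self, key: str, replica_index: int) -> Optional[str]:
        # A replica may be stale until replicate() has run.
        return self.replicas[replica_index].data.get(key)

    def replicate(self) -> None:
        # Ship pending writes to every replica in order; afterwards all copies agree.
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


hotel_db = ReplicatedStore(replica_count=2)
hotel_db.write("room-101", "booked by ada")
print(hotel_db.read("room-101", 0))  # None: the replica hasn't seen the write yet
hotel_db.replicate()
print(hotel_db.read("room-101", 0))  # booked by ada
```

Whether that window is acceptable is the domain choice described above. A system could instead wait for every replica before acknowledging a write, trading availability and latency for consistency.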

When and why use NoSQL

  • easier development
  • large-scale data

Future of databases: polyglot persistence

Example:

  • user sessions: Redis
  • financial data: RDBMS
  • shopping cart: Riak
  • recommendations: Neo4j
  • product catalog: MongoDB
  • reporting: RDBMS
  • analytics: Cassandra
  • user activity logs: Cassandra

This opportunity also brings problems:

  • decisions
  • organizational change
  • immaturity
  • eventual consistency