A quick look at database types, AWS database services and when to use them

This article will provide an introduction to database types, their characteristics and the options available on AWS.

The types of databases described are Relational, NoSQL and In-Memory databases.

Relational Databases

Relational databases are based on the relational data model which organises data into tables consisting of columns and rows. Each row is identified by a unique key and the language used for querying the data is SQL.

Relational databases are mostly used for:

Highly Structured Data: relational databases are best for cases with highly structured data.
Complex Joins Across Tables: support for joining multiple tables allows for better data normalisation.
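
To make the join idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables, their columns and the totals_per_customer helper are hypothetical names chosen for illustration only. Customer names are stored once and referenced by key from the orders table, which is what normalisation means in practice.

```python
import sqlite3

# In-memory SQLite database, used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total       REAL NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders (id, customer_id, total) VALUES
        (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);
""")


def totals_per_customer(connection):
    """Join customers to their orders and sum the order totals per customer."""
    cursor = connection.execute("""
        SELECT c.name, SUM(o.total)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.name
    """)
    return cursor.fetchall()


for name, total in totals_per_customer(conn):
    print(name, total)
```

The same SQL would run largely unchanged on a MySQL or PostgreSQL compatible engine, although column types and driver details differ.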

Conversely, relational databases are not a good fit when the requirements are:

Flexibility: the requirement for structure limits the flexibility of relational databases. They are not suited for storing loosely defined data such as JSON documents.
Performance: strict type checking and support for column constraints can affect performance. Relational databases are not a good choice for highly distributed applications that require high throughput.
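
The column constraints mentioned above can be sketched with Python's built-in sqlite3 module. The accounts table and the try_insert helper are hypothetical and exist only for this example. Every insert is checked against the declared rules before it is accepted, which is the extra work that buys data integrity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        email   TEXT NOT NULL UNIQUE,
        balance REAL NOT NULL CHECK (balance >= 0)
    )
""")


def try_insert(email, balance):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        # Using the connection as a context manager commits on success
        # and rolls back if the statement raises.
        with conn:
            conn.execute(
                "INSERT INTO accounts (email, balance) VALUES (?, ?)",
                (email, balance),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("a@example.com", 10.0))  # accepted
print(try_insert("a@example.com", 5.0))   # rejected: duplicate email
print(try_insert("b@example.com", -1.0))  # rejected: negative balance
```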

The AWS services available as of today for relational databases are:

Amazon Aurora: a fully managed RDBMS that is fully compatible with MySQL and PostgreSQL and also offers a serverless model (Aurora Serverless). It includes many security features, such as network isolation using VPC and encryption using Key Management Service (KMS) and SSL.

When to use: best for MySQL and PostgreSQL compatible workloads that require automatic scaling

Amazon RDS: supports several open-source and commercial database engines. It is a highly available AWS service that uses Multi-AZ instances to replicate data across Availability Zones.

When to use: best when multiple RDBMS engines and simple management are required

Amazon Redshift: suitable for data warehouse workloads, it integrates deeply with other AWS services such as S3, Kinesis Data Firehose, KMS and CloudWatch. It is highly scalable and cost-effective, as it provides a number of node types and flexible pricing options.

When to use: best for working with large datasets and tight integration with other AWS services

NoSQL Databases

The term NoSQL started to appear around 2009 to describe non-relational databases that did not expose a SQL interface.

NoSQL databases don’t use the relational model. Instead, data is stored as key-value pairs or JSON documents. This simplified design allows for ease of scalability and maximum flexibility.

The structure of the data doesn’t need to be defined before loading it into a NoSQL database. This makes them an ideal choice for rapid development and changing requirements.

NoSQL databases are able to easily scale to meet application demands. Most work on a concept of “eventual consistency” where changes are eventually propagated to all nodes.
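
Eventual consistency can be illustrated with a toy model in Python. This is a simplified sketch and not how any particular AWS service is implemented: the Replica and EventuallyConsistentStore classes and the explicit propagate step are assumptions made for the example. A write is acknowledged once one node has it, so a read from another node may return stale data until the change has been propagated.

```python
class Replica:
    """A single node holding its own copy of the data."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: writes land on one replica first and reach the others later."""

    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # queued (replica_index, key, value) updates

    def write(self, key, value):
        # The write is applied to the first replica immediately...
        self.replicas[0].data[key] = value
        # ...and queued for every other replica.
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica_index):
        # Reads are served by whichever replica is asked; may be stale.
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        # Apply all queued updates, bringing the replicas back in sync.
        for index, key, value in self.pending:
            self.replicas[index].data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("colour", "blue")
print(store.read("colour", 0))  # the replica that took the write
print(store.read("colour", 2))  # another replica, not yet updated
store.propagate()
print(store.read("colour", 2))  # now consistent
```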

NoSQL databases are mostly used for:

Data Flexibility: storing data as key-value pairs or JSON documents allows the fields to change without costly schema changes.
Performance Scaling: the simple design of NoSQL databases allows for easy scalability based on application demands.
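
The data flexibility described above can be sketched with a plain Python dictionary standing in for a key-value store. The put and get helpers and the "user:1" style keys are hypothetical and do not correspond to any real database API. Two items with different fields sit side by side, and no schema change is needed to add a field.

```python
import json

# A dictionary standing in for a key-value store: key -> JSON document.
store = {}


def put(key, document):
    """Store a document under a key, serialised as JSON."""
    store[key] = json.dumps(document)


def get(key):
    """Return the document stored under a key, or None if it is absent."""
    raw = store.get(key)
    return None if raw is None else json.loads(raw)


# Items in the same store do not need to share the same fields.
put("user:1", {"name": "Alice"})
put("user:2", {"name": "Bob", "email": "bob@example.com", "tags": ["admin"]})

# Adding a new field later only touches the item that needs it.
user = get("user:1")
user["email"] = "alice@example.com"
put("user:1", user)
```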

NoSQL databases are not a good fit when the requirements are:

Rigid Requirements: NoSQL databases are not a good choice for applications with rigid data requirements.
Highly Connected Data: they are not a good choice for highly connected data sets. Highly normalised data is a better fit for relational databases.

The AWS services available as of today for NoSQL databases are:

Amazon DynamoDB: data can be stored as key-value pairs or documents. It supports trillions of requests per day and peaks of more than 20 million requests per second. It supports ACID compliant transactions.

When to use: best for projects where maximum flexibility is required

Amazon DocumentDB: fully managed MongoDB compatible database service.

When to use: best option if MongoDB compatibility is required.

Amazon Keyspaces: fully managed Apache Cassandra compatible database service.

When to use: best if the Cassandra Query Language (CQL) or other Cassandra compatibility is required.

In-Memory Databases

In-Memory databases are quick and flexible databases that don’t write to non-volatile storage.

The main characteristics are:

Performance: very fast because they don’t write to non-volatile storage
Flexibility: typically used for caching in front of another database, they can also be used for storing session state information and passing messages between nodes.
Scalability: easily scalable

In-Memory databases are mostly used in cases like:

Caching: as a caching layer in front of a traditional RDBMS.
Messaging: useful for messaging services to share messages between systems.
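
The caching use case is often implemented as a "cache-aside" lookup: check the cache first and only query the database on a miss. The sketch below is a minimal illustration in which a Python dictionary stands in for the in-memory store and SQLite stands in for the RDBMS. The products table, the get_product_name helper and the stats counters are hypothetical names used only here.

```python
import sqlite3

# SQLite stands in for the slower, durable relational database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO products (id, name) VALUES (1, 'Keyboard')")

# A dictionary stands in for the in-memory cache.
cache = {}
stats = {"hits": 0, "misses": 0}


def get_product_name(product_id):
    """Cache-aside read: serve from the cache, fall back to the database."""
    if product_id in cache:
        stats["hits"] += 1
        return cache[product_id]

    stats["misses"] += 1
    row = db.execute(
        "SELECT name FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    name = row[0] if row else None
    if name is not None:
        # Populate the cache so the next read skips the database.
        cache[product_id] = name
    return name
```

Calling get_product_name(1) twice queries the database only once; the second call is served from the cache. A real deployment would also need an expiry or invalidation policy so that cached values do not outlive changes in the database.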

In-Memory databases are not a good choice when the requirements are:

Persistent Data: they are not a good fit for applications that require highly durable data as they do not store data on non-volatile storage.
Relational Data: data is stored as key-value pairs, so data with complex relationships is not a good fit.

The AWS services available as of today for In-Memory databases are:

Amazon ElastiCache for Memcached: a managed, Memcached-compatible in-memory key-value store.

Amazon ElastiCache for Redis: a managed, Redis-compatible in-memory store.

Carmine Carella