This is a continuation of a series of articles in which I briefly cover the main points of a specific topic in system architecture design. The previous article can be read here, and the complete guide you can find on my github.

This article will examine the basic concepts, approaches and patterns of schema design and scaling relational databases. In the second part, we will dwell in more detail on data queries and their optimization.

Introduction

In system design, relational databases are a traditional choice for data storage, offering structured schema and powerful query capabilities, contrasted with NoSQL databases' flexible schemes and scalability.

The following concepts are distinguished:

Terminology

Database Design Strategies

Top-Down

It begins with an analysis of the general requirements of the system at a conceptual level before designing the detailed structure of the database.

Pros

Cons

Bottom-Up

It starts with designing the detailed data structures, such as specific tables and relationships, before aggregating them into a complete system.

Pros

Cons

ER (Entity Relationship) Diagram

An Entity-Relationship (ER) diagram is a graphical representation of entities and their relationships to each other. An ER diagram helps structure and organize data requirements before developing a database.

Components

Types

There are several ERD notations: Chen, Crow's Foot(Information Engineering), Barker's, IEC, Martin, etc. Each notation has its advantages and is chosen based on the preferences of the database designers, the complexity of the system being modeled, and the project's specific requirements. Crow's Foot and Chen's notations remain among the most popular.

Chen notation

Crow's Foot / Information Engineering

Design Patterns

Design patterns for relational databases are reusable solutions to common problems encountered when designing database schemas, querying data, and implementing database-related functionalities. Here are some key design patterns for relational databases:

Normalization

Normalization involves applying a set of rules or "normal forms" to ensure the database structure is clear, efficient, and able to handle updates and queries reliably. It aims to reduce redundancy and improve data integrity by ensuring that each piece of data is stored only once.

Pros

Cons

Normal Forms

Normal forms are a series of guidelines or rules used in normalization. There are several normal forms, each building upon the principles of the previous one. While achieving high levels of normalization offers significant benefits, it comes with tradeoffs regarding performance and ease of use. Balancing these factors requires careful consideration based on the specific demands of the application.

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Boyce-Codd Normal Form (BCNF)

Fourth Normal Form (4NF)

Fifth Normal Form (5NF)

Denormalization

Denormalization is adding redundant data to a normalized database to improve query performance, simplify the database structure, or address issues with read-heavy database applications requiring fast data access.

Pros

Cons

Entity-Attribute-Value (EAV)

EAV is a database design pattern used to represent entities. A unique identifier represents each entity, each attribute is represented as a record, and the attribute's value for the entity is stored in a separate table. This model allows for the flexible and dynamic addition of attributes without altering the database schema.

Pros

Cons

Master-Detail

The Master-Detail pattern involves two interconnected tables: a master table that holds primary information and one or more detail tables that contain related data linked back to the master table through foreign keys. This pattern is commonly used to manage related data entities where the master entity controls various operations of the detail entities, such as cascading updates or deletions.

Pros

Cons

Table Inheritance

Table Inheritance involves creating a table structure that mimics the inheritance of entities or objects in an application, typically using one of three approaches, each with varying strategies for how data is stored across parent and child tables:

Pros

Cons

Star

The Star Schema organizes data into a central fact table that contains quantitative metrics (facts) and foreign keys to related dimension tables, which store descriptive attributes related to the facts. This pattern resembles a star, with the fact table at the center and dimension tables radiating outwards.

Pros

Cons

Snowflake

The Snowflake Schema organizes data into a central fact table surrounded by normalized dimension tables. Unlike the Star Schema, where dimension tables are denormalized, in the Snowflake Schema, dimension data is broken down into additional tables to eliminate redundancy and enforce data integrity.

Pros

Cons

Audit logging

Audit Logging pattern in databases is a design approach focused on systematically recording changes to data or actions performed within an application or system. It involves creating detailed logs for all create, read, update, and delete (CRUD) operations on data, providing a transparent and immutable history of all transactions and changes within the database for security, compliance, and debugging purposes.

Pros

Cons

Versioning

Versioning involves adding metadata to each record in a database to indicate its version, with mechanisms to create new versions upon updates while preserving the old versions. This can be implemented through various means, such as additional versioned tables, timestamping, or a separate history table to store changes.

Pros

Cons

Scaling and Fault Tolerance

Scaling and fault tolerance are critical aspects of managing relational databases, especially in environments that require high availability, performance, and consistency. These concepts ensure that a database can handle growing amounts of work and recover from hardware or software failures without data loss.

Scaling refers to the database's ability to accommodate growth in data volume and transaction throughput without compromising performance. It can be achieved in two primary ways:

Fault tolerance refers to the ability of the database system to continue operating without interruption in the event of a hardware or software failure. Key strategies include:

There are several techniques for scaling and improving reliability.

Master-Slave (Primary-Secondary) replication

Master-slave replication is a widely used architecture for data replication and distribution. It involves a primary database server (the master) and one or more secondary database servers (the slaves). The master server handles all the write and/or read operations and logs changes, while the slave servers replicate these changes from the master, allowing them to handle read queries.

Benefits

Tradeoffs

Use Cases

Multi-Master (Master-Master, Primary-Primary ) replication

Multi-master replication is an architecture that allows data to be replicated across multiple servers, each capable of handling read and write operations. This setup creates a distributed database system where changes made on one server are automatically replicated to all other servers in the replication group, ensuring each server has the same data set.

Benefits

Tradeoffs

Use Cases

Sharding

Sharding is a technique that scales databases horizontally by partitioning data across multiple servers or instances. Each shard contains a subset of the total data; the shards comprise the entire database.

Benefits

Tradeoffs

Use Cases

Federation

Federation is a strategy for managing and accessing data distributed across multiple databases or different systems. It involves creating a virtual database that abstracts several physical databases, allowing users to interact with it as if it were a single database. This approach enables the integration of diverse data sources, providing unified access and manipulation of data without consolidating it physically in one location.

Benefits

Tradeoffs

Use Cases