High availability in Azure databases

Azure offers multiple database services for different scenarios. Each has its own architecture and its own storage and retrieval mechanisms.

Let’s go through each of these options and see how high availability is achieved.

Azure SQL DB

Azure SQL DB service tiers fall into three groups with different availability architectures: Basic, Standard, and General Purpose; Premium and Business Critical; and Hyperscale.

In the Basic, Standard, and General Purpose tiers, the stateless compute layer and the stateful storage layer are separate.

The compute layer runs the sqlservr.exe process.

The data/storage layer holds the database files (.mdf/.ldf) in Azure Blob storage.

Because the data resides in Azure Blob storage, it inherits the redundancy and availability policy of Blob storage (i.e. all data in a storage account is always replicated at least 3 times).

In the Premium and Business Critical tiers, the compute and storage layers are located together on the same node.

A cluster of 3 to 4 nodes provides high availability.

One node, called the primary replica, is available for reads and writes. The remaining nodes, called secondary replicas, each hold a copy of the primary's data.

The secondary replicas can also be placed in different availability zones by making the database zone redundant.

The Hyperscale tier has 4 layers, each with its own redundancy.

The stateless compute layer runs sqlservr.exe and consists of 1 primary replica and 0-4 secondary replicas.

The stateless storage layer contains the page servers, which hold transient and cached data. Each page server has a paired page server in an active-active configuration.

The stateful transaction log layer and the stateful data storage layer hold the logs and the actual data. Both are kept in Azure storage, which comes with its own default redundancy options.

In the Basic, Standard, and General Purpose tiers, when the compute layer goes down, Azure Service Fabric spins up another compute node and attaches the data to it. Since the compute layer holds no data, recovery takes only a few seconds.

When one of the copies in the data layer is lost, Azure creates another copy to replace it, so 3 copies of the data are maintained.
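The separation of compute and storage described above can be illustrated with a toy model. This is only a sketch: the class and function names (RemoteStorage, ComputeNode, fail_over_compute) are hypothetical and do not correspond to any Azure API. It shows that replacing a failed compute node involves no data copy, and that a lost storage copy is recreated to keep 3 copies.

```python
from dataclasses import dataclass, field


@dataclass
class RemoteStorage:
    """Stateful layer: holds the data and keeps a fixed number of copies."""
    target_copies: int = 3
    copies: list = field(default_factory=lambda: ["copy-1", "copy-2", "copy-3"])
    next_id: int = 4

    def lose_copy(self, name: str) -> None:
        # Simulate the loss of one copy, then repair immediately.
        self.copies.remove(name)
        self.repair()

    def repair(self) -> None:
        # Recreate copies until the target count is restored.
        while len(self.copies) < self.target_copies:
            self.copies.append(f"copy-{self.next_id}")
            self.next_id += 1


@dataclass
class ComputeNode:
    """Stateless layer: runs the engine and only references the storage."""
    name: str
    storage: RemoteStorage


def fail_over_compute(failed: ComputeNode, new_name: str) -> ComputeNode:
    # No data is copied: the replacement node attaches to the same storage.
    return ComputeNode(name=new_name, storage=failed.storage)


storage = RemoteStorage()
node = ComputeNode("compute-a", storage)
node = fail_over_compute(node, "compute-b")   # compute failure
storage.lose_copy("copy-2")                   # storage copy failure
print(node.name, storage.copies)
```

The replacement node shares the same storage object as the failed one, which is why recovery time in this model does not depend on the size of the data.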

In the Premium and Business Critical tiers, when the primary node fails, Azure Service Fabric promotes an available secondary node to primary. Since no data has to be moved, the switchover is almost instant.

Once the new primary is selected, a new secondary is created to replace the one that was promoted, so the number of replicas is restored.
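The promote-then-replace sequence can be sketched in a few lines. This is a simplified illustration, not how Service Fabric is implemented: the fail_over function and the node names are hypothetical, and picking the first secondary in the list is an assumption, since the text does not say how the new primary is chosen.

```python
def fail_over(primary, secondaries, new_node):
    """Return (new_primary, new_secondaries) after the primary fails.

    Assumption: the first available secondary is promoted.
    """
    if not secondaries:
        raise RuntimeError("no secondary available to promote")
    # Promotion needs no data movement: the secondary already holds a copy.
    new_primary, *remaining = secondaries
    # A fresh secondary is added so the replica count is restored.
    return new_primary, remaining + [new_node]


primary, secondaries = fail_over("node-1", ["node-2", "node-3"], "node-4")
print(primary, secondaries)
```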

In the Hyperscale tier, because each layer has its own redundancy, the failover process is seamless.

Azure Cosmos DB

The data is broken down into logical partitions using the partition key.

Each physical partition contains 1 or more logical partitions. The number of physical partitions and the distribution of logical partitions across them are managed by the service; this is transparent to users and cannot be set directly.
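As a rough illustration of how a partition key can determine placement, the sketch below hashes the key and maps it to one of several physical partitions. The function name, the choice of SHA-256, and the modulo mapping are all assumptions made for the example; the actual hash function and range assignment used by Cosmos DB are internal to the service.

```python
import hashlib


def physical_partition_for(partition_key: str, physical_partitions: int) -> int:
    """Map a partition key to a physical partition index (illustrative only)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % physical_partitions


# All items sharing a partition key land on the same physical partition.
for key in ["customer-1", "customer-2", "customer-1"]:
    print(key, "->", physical_partition_for(key, 5))
```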

Within a region, each physical partition is kept as 4 replicas: the partition itself plus 3 more copies. This group is called the replica set. A write to a physical partition is committed only once a majority of the replicas in the replica set have written it.
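The majority rule for commits can be written as a one-line check. This is a minimal sketch with a hypothetical function name; it assumes a replica set of 4 and a strict majority, so 3 acknowledgements are needed.

```python
def is_committed(acks: int, replicas: int = 4) -> bool:
    """A write commits once a strict majority of replicas has acknowledged it."""
    return acks > replicas // 2


for acks in range(5):
    print(acks, "acks ->", "committed" if is_committed(acks) else "pending")
```

With 4 replicas, 2 acknowledgements are exactly half and therefore not a majority, so the write stays pending until a third replica confirms it.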

These physical partitions are also replicated across all the Azure regions chosen for the Cosmos DB account. This is called the partition set.

So, if the Cosmos DB account is created with n regions, there will be at least n × 4 copies of the data. For example, an account spanning 3 regions holds at least 12 copies.

In addition to regional resiliency, zone redundancy can also be enabled to increase availability.

The combination of replica set, partition set, zone redundancy, and regional resiliency ensures that the data remains available even if one or a few of the regions fail.

The effect of a region outage depends on whether the account has a single write region or multiple write regions.

Single write region, read region outage:

The database remains available for both reads and writes.

The impacted read region is disconnected from the write region.

No change in the application is required.

Single write region, write region outage:

The database remains available for reads.

A new region is promoted as the write region.

When the impacted region is back online, the missed changes are made available in the conflict feed.

Once the impacted region is recovered, it is available as a read region.

Multiple write regions, read region outage: no impact and no change in the application.

Multiple write regions, write region outage: no impact and no change in the application.

Azure Database for MySQL and MariaDB

Just like the Azure SQL DB Basic, Standard, and General Purpose tiers, the instance and storage are placed separately.

When the instance fails, the Azure Database Management Service immediately spins up another instance and attaches the storage to it. Since this step involves no data movement, the switchover is almost instant.

The actual data is stored in Azure storage, which keeps at least 3 copies for redundancy. When one of the copies is lost, Azure creates another copy to replace it.

Azure Database for PostgreSQL

Two deployment options are covered here: Single Server and Hyperscale (Citus).

Single Server: just like the Azure SQL DB Basic, Standard, and General Purpose tiers, the instance and storage are placed separately.

When the instance fails, the Azure Database Management Service immediately spins up another instance and attaches the storage to it. Since this step involves no data movement, the switchover is almost instant.

The actual data is stored in Azure storage, which keeps at least 3 copies for redundancy. When one of the copies is lost, Azure creates another copy to replace it.

Hyperscale (Citus): each node in the server group has a standby replica in a different availability zone, which provides high availability.

There are 3 stages during HA and recovery:

Stage 1 (Detection) – Periodic health checks are run on each node. After 4 failed checks, the node is marked as down.

Stage 2 (Failover) – A standby node is promoted to be the new primary, and a new standby is created.

Stage 3 (Full recovery) – Replication to the new standby completes.
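The detection stage can be sketched as a small counter. This is an illustration only: the NodeHealth class is hypothetical, and it assumes the 4 failed checks must be consecutive, which the text does not state explicitly.

```python
from dataclasses import dataclass

FAILED_CHECKS_BEFORE_DOWN = 4  # threshold taken from the detection stage


@dataclass
class NodeHealth:
    consecutive_failures: int = 0
    is_down: bool = False

    def record_check(self, healthy: bool) -> None:
        if healthy:
            # Assumption: a successful check resets the failure counter.
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= FAILED_CHECKS_BEFORE_DOWN:
            # Once marked down, the node stays down; failover takes over.
            self.is_down = True


node = NodeHealth()
for result in [True, False, False, False, False]:
    node.record_check(result)
print(node.is_down)
```

Requiring several failed checks before marking a node as down avoids triggering a failover on a single transient glitch, at the cost of a slightly longer detection time.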

References