    lecture-23-slides.pdf
    Carnegie Mellon University
    Database Systems
    Distributed OLTP Databases

ADMINISTRIVIA

Project #4 is due Sunday, April 20th @ 11
$\longrightarrow$ Recitation: Friday, April 11th in GHC 4303 from 3-4 PM
HW6 is due Sunday, April 20, 2025 @ 11
Final Exam is on Monday, April 28, 2025, from 05-08
$\longrightarrow$ Early exam will not be offered. Do not make travel plans.
This course is recruiting TAs for the next semester
$\longrightarrow$ Apply at: https://www.ugrad.cs.cmu.edu/ta/F25/

    ADMINISTRIVIA

Class on Monday, April 21: Review Session
$\rightarrow$ Come to class prepared with your questions. What material do you want me to go over again?
Class on Wednesday, April 23: Guest Lecture
$\rightarrow$ Real-world applications of Gen AI and Databases
$\rightarrow$ Speaker: Sailesh Krishnamurthy, Google

    UPCOMING DATABASE TALKS

MariaDB (DB Seminar)
$\rightarrow$ Monday, April 14 @ 4
$\rightarrow$ MariaDB's New Query Optimizer
$\rightarrow$ Speaker: Michael Widenius
$\rightarrow$ https://cmu.zoom.us/j/93441451665
Gel (DB Seminar)
$\rightarrow$ Monday, April 21 @ 4
$\rightarrow$ EdgeQL with Gel
$\rightarrow$ Speaker: Michael Sullivan
$\rightarrow$ https://cmu.zoom.us/j/93441451665

    LAST CLASS

System Architectures
$\rightarrow$ Shared-Everything, Shared-Disk, Shared-Nothing
Partitioning/Sharding
$\rightarrow$ Hash, Range, Round Robin
Transaction Coordination
$\rightarrow$ Centralized vs. Decentralized

    OLTP VS. OLAP

    On-line Transaction Processing (OLTP):

$\longrightarrow$ Short-lived read/write txns.
$\longrightarrow$ Small footprint.
$\longrightarrow$ Repetitive operations.

    On-line Analytical Processing (OLAP):

$\longrightarrow$ Long-running, read-only queries.
$\longrightarrow$ Complex joins.
$\longrightarrow$ Exploratory queries.

    DECENTRALIZED COORDINATOR

    Partitions


    OBSERVATION

    Recall that our goal is to have multiple physical nodes appear as a single logical DBMS.
We have not discussed how to ensure that all nodes agree to commit a txn and then to make sure it does commit if the DBMS decides it should.
$\rightarrow$ What happens if a node fails?
$\rightarrow$ What happens if messages show up late?
$\rightarrow$ What happens if the system does not wait for every node to agree to commit?

    IMPORTANT ASSUMPTION

We will assume that all nodes in a distributed DBMS are well-behaved and under the same administrative domain.
$\rightarrow$ If we tell a node to commit a txn, then it will commit the txn (if there is no failure).
If you do not trust the other nodes in a distributed DBMS, then you need to use a Byzantine Fault Tolerant protocol for txns (blockchain).
$\rightarrow$ Blockchains are not good for high-throughput workloads.


    TODAY'S AGENDA

Replication
Atomic Commit Protocols
Consistency Issues (CAP / PACELC)

    REPLICATION

The DBMS can replicate a database across redundant nodes to increase availability.
$\longrightarrow$ Partitioned vs. Non-Partitioned
$\longrightarrow$ Shared-Nothing vs. Shared-Disk
Design Decisions:
$\longrightarrow$ Replica Configuration
$\longrightarrow$ Propagation Scheme
$\longrightarrow$ Propagation Timing
$\longrightarrow$ Update Method

    REPLICA CONFIGURATIONS

    Approach #1: Primary-Replica

$\longrightarrow$ All updates go to a designated primary for each object.
$\longrightarrow$ The primary propagates updates to its replicas by shipping logs.
$\longrightarrow$ Read-only txns may be allowed to access replicas.
$\longrightarrow$ If the primary goes down, then hold an election to select a new primary.

    Approach #2: Multi-Primary

$\longrightarrow$ Txns can update data objects at any replica.
$\longrightarrow$ Replicas must synchronize with each other using an atomic commit protocol.
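As an illustration of the Primary-Replica configuration above, here is a minimal Python sketch of a primary shipping log records to its replicas. All names (`Primary`, `Replica`, plain-integer LSNs) are invented for this sketch, not from any real DBMS:

```python
# Hypothetical sketch of primary-replica replication via log shipping.
# The primary assigns each update a log sequence number (LSN) and ships
# the record to every replica, which applies records in LSN order.

class Replica:
    def __init__(self):
        self.data = {}
        self.applied_lsn = 0

    def apply(self, lsn, key, value):
        # Replicas apply log records strictly in LSN order.
        assert lsn == self.applied_lsn + 1
        self.data[key] = value
        self.applied_lsn = lsn

class Primary:
    def __init__(self, replicas):
        self.data = {}
        self.next_lsn = 1
        self.replicas = replicas

    def write(self, key, value):
        # All updates go to the primary, which ships the log record
        # to each replica (synchronous propagation shown here).
        lsn = self.next_lsn
        self.next_lsn += 1
        self.data[key] = value
        for r in self.replicas:
            r.apply(lsn, key, value)
        return lsn

replicas = [Replica(), Replica()]
primary = Primary(replicas)
primary.write("k1", "v1")
primary.write("k2", "v2")
# Every replica now holds the same data as the primary.
```

If the primary fails, any replica that has applied the full log prefix can be elected as the new primary.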


    K-SAFETY

K-safety is a threshold for determining the fault tolerance of the replicated database.
    The value $K$ represents the number of replicas per data object that must always be available.
    If the number of replicas goes below this threshold, then the DBMS halts execution and takes itself offline.
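The K-safety rule above fits in a few lines. This is a hypothetical illustration (the function name and data layout are invented):

```python
def is_available(replica_counts, k):
    """replica_counts maps each data object to its number of live replicas.
    Under the K-safety rule, the DBMS stays online only while every object
    still has at least k available replicas."""
    return all(count >= k for count in replica_counts.values())

# Three objects with K = 2: every object has at least 2 live replicas.
print(is_available({"a": 3, "b": 2, "c": 3}, k=2))  # True
# Object 'b' drops to 1 replica: below the threshold, so the DBMS
# halts execution and takes itself offline.
print(is_available({"a": 3, "b": 1, "c": 3}, k=2))  # False
```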

    PROPAGATION SCHEME

When a txn commits on a replicated database, the DBMS decides whether it must wait for that txn's changes to propagate to other nodes before it can send the acknowledgement to the application.
Propagation levels:
$\rightarrow$ Synchronous (Strong Consistency)
$\rightarrow$ Asynchronous (Eventual Consistency)

    PROPAGATION SCHEME

    Approach #1: Synchronous

    → The primary sends updates to replicas and then waits for them to acknowledge that they fully applied (i.e., logged) the changes.


    Approach #2: Asynchronous

    → The primary immediately returns the acknowledgement to the client without waiting for replicas to apply the changes.
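The trade-off between the two propagation schemes can be sketched as follows. `PropagatingPrimary` and `LogReplica` are invented names for illustration; no real DBMS API is implied:

```python
class LogReplica:
    """Toy replica that appends shipped log records."""
    def __init__(self):
        self.log_records = []

    def log(self, record):
        self.log_records.append(record)
        return True   # acknowledge that the change was logged

class PropagatingPrimary:
    """Toy primary illustrating synchronous vs. asynchronous propagation."""
    def __init__(self, replicas, synchronous=True):
        self.replicas = replicas
        self.synchronous = synchronous
        self.backlog = []   # records not yet shipped (async mode only)

    def commit(self, record):
        if self.synchronous:
            # Synchronous: wait until every replica acknowledges that it
            # logged the change before acking the client (strong consistency).
            assert all(r.log(record) for r in self.replicas)
        else:
            # Asynchronous: ack the client immediately and ship the log
            # later (eventual consistency -- replicas may briefly lag).
            self.backlog.append(record)
        return "ACK"

    def propagate(self):
        # Background shipping used by the asynchronous scheme.
        for record in self.backlog:
            for r in self.replicas:
                r.log(record)
        self.backlog.clear()

rs = [LogReplica(), LogReplica()]
p = PropagatingPrimary(rs, synchronous=False)
p.commit("w1")     # client sees the ACK before any replica logged "w1"
p.propagate()      # replicas catch up later
```

The asynchronous mode returns the acknowledgement while `backlog` is still non-empty, which is exactly the window where a primary failure can lose acknowledged writes.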


    PROPAGATION TIMING

Approach #1: Continuous

→ The DBMS sends log messages immediately as it generates them.
→ Also need to send a commit/abort message.

Approach #2: On Commit

→ The DBMS only sends the log messages for a txn to the replicas once the txn commits.
→ Do not waste time sending log records for aborted txns.

    ACTIVE VS. PASSIVE

    Approach #1: Active-Active

$\rightarrow$ A txn executes at each replica independently.
$\rightarrow$ Need to check at the end whether the txn ends up with the same result at each replica.

    Approach #2: Active-Passive

$\rightarrow$ Each txn executes at a single location and propagates the changes to the replica.
$\rightarrow$ Can either do physical or logical replication.
$\rightarrow$ Not the same as Primary-Replica vs. Multi-Primary.

    OBSERVATION

    If only one node decides whether a txn is allowed to commit, then making that decision is easy.
Life is much harder when multiple nodes are allowed to decide:
$\rightarrow$ What if multiple nodes need to agree a txn is allowed to commit?
$\rightarrow$ What if a primary node goes down and the system needs to choose a new primary?

    ATOMIC COMMIT PROTOCOL

Coordinating the commit order of txns across nodes in a distributed DBMS.
$\longrightarrow$ Commit Order = State Machine
$\longrightarrow$ It does not matter whether the database's contents are replicated or partitioned.

Examples:

$\longrightarrow$ Two-Phase Commit (1970s)
$\longrightarrow$ Three-Phase Commit (1983)
$\longrightarrow$ Viewstamped Replication (1988)
$\longrightarrow$ Paxos (1989)
$\longrightarrow$ ZAB (2008?)
$\longrightarrow$ Raft (2013)


    ATOMIC COMMIT PROTOCOL

    Resource Managers (RMs)

$\rightarrow$ Execute on different nodes.
$\rightarrow$ Coordinate to decide the fate of a txn.

    Properties of the Commit Protocol

$\rightarrow$ Stability: Once the fate is decided, it cannot be changed.
$\rightarrow$ Consistency: All RMs end up in the same state.

    Assumes Liveness:

$\rightarrow$ There is some way of progressing forward.
$\rightarrow$ Enough nodes are alive and connected for the duration of the protocol.

    TWO-PHASE COMMIT (SUCCESS)


    TWO-PHASE COMMIT (ABORT)


    TWO-PHASE COMMIT

Each node records the inbound/outbound messages and the outcome of each phase in a non-volatile storage log.
On recovery, examine the log for 2PC messages:
$\rightarrow$ If the local txn is in the prepared state, contact the coordinator.
$\rightarrow$ If the local txn is not in the prepared state, abort it.
$\rightarrow$ If the local txn was committing and the node is the coordinator, send the COMMIT message to the other nodes.
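A minimal sketch of the two phases, assuming an in-memory stand-in for the logged state (all class and function names are invented for illustration):

```python
# Two-phase commit sketch: Phase 1 asks every participant to prepare;
# Phase 2 broadcasts COMMIT only if all voted yes, otherwise ABORT.
# Real systems force each state transition to a durable log first.

class Participant:
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "INIT"

    def prepare(self):
        # A yes vote is a promise: the participant logs a PREPARE record
        # and must be able to commit even after a crash.
        if self.can_commit:
            self.state = "PREPARED"
            return True
        self.state = "ABORTED"
        return False

    def finish(self, commit):
        self.state = "COMMITTED" if commit else "ABORTED"

def two_phase_commit(participants):
    # Phase 1: collect votes from every participant.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2: broadcast the decision. The coordinator logs the decision
    # before sending, so on recovery it can re-send it to participants.
    for p in participants:
        p.finish(decision)
    return "COMMIT" if decision else "ABORT"

print(two_phase_commit([Participant(), Participant(), Participant()]))
# prints COMMIT
print(two_phase_commit([Participant(), Participant(can_commit=False)]))
# prints ABORT
```

The blocking problem is visible here: a participant stuck in `PREPARED` cannot unilaterally move to `COMMITTED` or `ABORTED` if the coordinator disappears between the two phases.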


    TWO-PHASE COMMIT FAILURES

    What happens if the coordinator crashes?

$\rightarrow$ Participants must decide what to do after a timeout (this only applies if the participants know of all other participants).
$\rightarrow$ System is not available during this time.

    What happens if the participant crashes?

$\rightarrow$ The coordinator assumes that the participant responded with an abort if it has not sent an acknowledgement yet.
$\rightarrow$ Again, nodes use a timeout to determine whether a participant is dead.

    2PC OPTIMIZATIONS

    Early Prepare Voting (Rare)

$\rightarrow$ If you send a query/request to a remote node that you know will be the last one to execute in this txn, then that node will also return its vote for the prepare phase with the query result.

    Early Ack After Prepare (Common)

    $\rightarrow$ If all nodes vote to commit a txn, the coordinator can send the client an acknowledgement that their txn was successful before the commit phase finishes.

    EARLY ACKNOWLEDGEMENT


    PAXOS

    Consensus protocol where a coordinator proposes an outcome (e.g., commit or abort) and then the participants vote on whether that outcome should succeed.
    Does not block if a majority of participants are available and has provably minimal message delays in the best case.
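A single-decree Paxos round can be sketched as follows. This is an illustrative simplification that omits networking, retries, and crash recovery; all names are invented:

```python
# Single-decree Paxos sketch. Acceptors promise to ignore proposals with
# lower ballots than they have already seen; a value is chosen once a
# majority of acceptors accepts it.

class Acceptor:
    def __init__(self):
        self.promised = 0          # highest ballot promised so far
        self.accepted = (0, None)  # (ballot, value) last accepted

    def prepare(self, ballot):
        if ballot > self.promised:
            self.promised = ballot
            return self.accepted   # report any previously accepted value
        return None                # reject stale ballots

    def accept(self, ballot, value):
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False

def propose(acceptors, ballot, value):
    # Phase 1 (Prepare): gather promises from a majority.
    promises = [a.prepare(ballot) for a in acceptors]
    granted = [p for p in promises if p is not None]
    if len(granted) <= len(acceptors) // 2:
        return None  # no majority: caller must retry with a higher ballot
    # If some acceptor already accepted a value, we must propose that one;
    # this is what makes a decided outcome stable.
    prev_ballot, prev_value = max(granted, key=lambda p: p[0])
    if prev_value is not None:
        value = prev_value
    # Phase 2 (Accept): the value is chosen once a majority accepts.
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks > len(acceptors) // 2 else None

acceptors = [Acceptor() for _ in range(5)]
print(propose(acceptors, ballot=1, value="COMMIT"))  # prints COMMIT
# A later proposer with a higher ballot learns the already-chosen value:
print(propose(acceptors, ballot=2, value="ABORT"))   # prints COMMIT
```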

The Part-Time Parliament

LESLIE LAMPORT, Digital Equipment Corporation
Recent archaeological discoveries on the island of Paxos reveal that the parliament functioned despite the peripatetic propensity of its part-time legislators. The legislators maintained consistent copies of the parliamentary record, despite their frequent forays from the chamber and the forgetfulness of their messengers. The Paxon parliament's protocol provides a new way of implementing the state-machine approach to the design of distributed systems.
This submission was recently discovered behind a filing cabinet in the TOCS editorial office. Despite its age, the editor-in-chief felt that it was worth publishing. Because the author is currently doing field work in the Greek isles and cannot be reached, I was asked to prepare it for publication.
The author appears to be an archaeologist with only a passing interest in computer science. This is unfortunate; even though the obscure ancient Paxon civilization he describes is of little interest to most computer scientists, its legislative system is an excellent model for how to implement a distributed computer system in an asynchronous environment. Indeed, some of the refinements the Paxons made to their protocol appear to be unknown in the systems literature.
The author does give a brief discussion of the Paxon Parliament's relevance to distributed computing in Section 4. Computer scientists will probably want to read that section first. Even before that, they might want to read the explanation of the algorithm for computer scientists by Lampson [1996]. The algorithm is also described more formally by De Prisco et al. [1997]. I have added further comments on the relation between the ancient protocols and more recent work at the end of Section 4.
Keith Marzullo, University of California, San Diego

    Consensus on Transaction Commit

    JIM GRAY and LESLIE LAMPORT Microsoft Research
The distributed transaction commit problem requires reaching agreement on whether a transaction is committed or aborted. The classic Two-Phase Commit protocol blocks if the coordinator fails. Fault-tolerant consensus algorithms also reach agreement, but do not block whenever any majority of the processes are working. The Paxos Commit algorithm runs a Paxos consensus algorithm on the commit/abort decision of each participant to obtain a transaction commit protocol that uses $2F + 1$ coordinators and makes progress if at least $F + 1$ of them are working properly. Paxos Commit has the same stable-storage write delay, and can be implemented to have the same message delay in the fault-free case, as Two-Phase Commit, but it uses more messages. The classic Two-Phase Commit algorithm is obtained as the special $F = 0$ case of the Paxos Commit algorithm.
Categories and Subject Descriptors: D.4.1 [Operating Systems]: Process Management—Concurrency; D.4.5 [Operating Systems]: Reliability—Fault-tolerance; D.4.7 [Operating Systems]: Organization and Design—Distributed systems
General Terms: Algorithms, Reliability
Additional Key Words and Phrases: Consensus, Paxos, two-phase commit

    1. INTRODUCTION

A distributed transaction consists of a number of operations, performed at multiple sites, terminated by a request to commit or abort the transaction. The sites then use a transaction commit protocol to decide whether the transaction is committed or aborted. The transaction can be committed only if all sites are willing to commit it. Achieving this all-or-nothing atomicity property in a distributed system is not trivial. The requirements for transaction commit are stated precisely in Section 2.
The classic transaction commit protocol is Two-Phase Commit [Gray 1978], described in Section 3. It uses a single coordinator to reach agreement. The failure of that coordinator can cause the protocol to block, with no process knowing the outcome, until the coordinator is repaired. In Section 4, we use the Paxos consensus algorithm [Lamport 1989] to obtain a transaction commit protocol.
    ACM Transactions on Database Systems, Vol. 31, No. 2, March 2006, Pages 133- 160.

    PAXOS


    MULTI-PAXOS

If the system elects a single leader that oversees proposing changes for some period, then it can skip the Propose phase.
$\rightarrow$ Fall back to full Paxos whenever there is a failure.
The system periodically renews the leader (known as a lease) using another Paxos round.
$\rightarrow$ Nodes must exchange log entries during leader election to make sure that everyone is up-to-date.

    2PC VS. PAXOS VS. RAFT

    Two-Phase Commit

    $\rightarrow$ Blocks if coordinator fails after the prepare message is sent, until coordinator recovers.

    Paxos

$\rightarrow$ Non-blocking if a majority of participants are alive, provided there is a sufficiently long period without further failures.

Raft

$\rightarrow$ Similar to Paxos but with fewer node types.
$\rightarrow$ Only nodes with the most up-to-date log can become leaders.

    CAP THEOREM

Proposed in the late 1990s: it is impossible for a distributed database to always be:
→ Consistent
→ Always Available
→ Network Partition Tolerant
A DBMS chooses whether it provides Consistency or Availability during a network partition.

    CONSISTENCY


    AVAILABILITY


    PARTITION TOLERANCE


    Choice #1: Halt the System

$\rightarrow$ Stop accepting updates in any partition that does not have a majority of the nodes.

Choice #2: Allow Split, Reconcile Changes

$\rightarrow$ Allow each side of the partition to keep accepting updates.
$\rightarrow$ Upon reconnection, perform reconciliation to determine the "correct" version of any updated record.
$\rightarrow$ Server-side: Last Update Wins
$\rightarrow$ Client-side: Vector Clocks
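Client-side reconciliation with vector clocks can be sketched as follows (a hypothetical illustration; the record format and function names are invented):

```python
# Vector-clock reconciliation sketch. Each version of a record carries a
# clock mapping node -> per-node update counter. Clock A "happens before"
# clock B if every entry of A is <= the matching entry of B and at least
# one entry is strictly smaller; otherwise the writes are concurrent.

def happens_before(a, b):
    keys = set(a) | set(b)
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))

def reconcile(v1, v2):
    """Return the causally newer version, or both if the writes were
    concurrent (the client must then merge them itself)."""
    (_, c1), (_, c2) = v1, v2
    if happens_before(c1, c2):
        return [v2]
    if happens_before(c2, c1):
        return [v1]
    return [v1, v2]   # concurrent updates: a true conflict

# Nodes n1 and n2 both updated the record during the partition:
a = ("cart=[milk]",      {"n1": 2, "n2": 1})
b = ("cart=[milk,eggs]", {"n1": 1, "n2": 2})
print(reconcile(a, b))   # both versions survive; the client merges them
```

Last-update-wins reconciliation would instead pick one side by timestamp, silently discarding the other write; vector clocks push that decision to the client.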


    PACELC THEOREM

Extension to CAP proposed in 2010 to include consistency vs. latency trade-offs:
$\longrightarrow$ Partition Tolerant
$\longrightarrow$ Always Available
$\longrightarrow$ Consistent
$\longrightarrow$ Else, choose during normal operations:
$\longrightarrow$ Latency
$\longrightarrow$ Consistency

    LATENCY VS. CONSISTENCY

Replica (us-west)
Replica (eu-east)


    CONCLUSION

Maintaining transactional consistency across multiple nodes is hard. Bad things will happen.
$\rightarrow$ Don't let the "unwashed masses" go without txns!
2PC / Paxos / Raft are the most common protocols to ensure correctness in a distributed DBMS.
More info (and humiliation):
$\rightarrow$ Kyle Kingsbury's Jepsen Project

    Spanner: Google's Globally-Distributed Database

James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, Dale Woodford
    Google, Inc.

    Abstract

Spanner is Google's scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: non-blocking reads in the past, lock-free read-only transactions, and atomic schema changes, across all of Spanner.

    1 Introduction

Spanner is a scalable, globally-distributed database designed, built, and deployed at Google. At the highest level of abstraction, it is a database that shards data across many sets of Paxos [21] state machines in datacenters spread all over the world. Replication is used for global availability and geographic locality; clients automatically failover between replicas. Spanner automatically reshards data across machines as the amount of data or the number of servers changes, and it automatically migrates data across machines (even across datacenters) to balance load and in response to failures. Applications can use Spanner for high availability, even in the face of wide-area natural disasters, by replicating their data within or even across continents. Our initial customer was F1 [35], a rewrite of Google's advertising backend. F1 uses five replicas spread across the United States. Most other applications will probably replicate their data across 3 to 5 datacenters in one geographic region, but with relatively independent failure modes. That is, most applications choose lower latency over higher availability, as long as they can survive 1 or 2 datacenter failures.
Published in the Proceedings of OSDI 2012
Spanner's main focus is managing cross-datacenter replicated data, but we have also spent a great deal of time in designing and implementing important database features on top of our distributed-systems infrastructure. Even though many projects happily use Bigtable [9], we have also consistently received complaints from users that Bigtable can be difficult to use for some kinds of applications: those that have complex, evolving schemas, or those that want strong consistency in the presence of wide-area replication. (Similar claims have been made by other authors [37].) Many applications at Google have chosen to use Megastore [5] because of its semi-relational data model and support for synchronous replication, despite its relatively poor write throughput. As a consequence, Spanner has evolved from a Bigtable-like versioned key-value store into a temporal multi-version database. Data is stored in schematized semi-relational tables; data is versioned, and each version is automatically timestamped with its commit time; old versions of data are subject to configurable garbage-collection policies; and applications can read data at old timestamps. Spanner supports general-purpose transactions, and provides a SQL-based query language.
    As a globally- distributed database, Spanner provides several interesting features. First, the replication configurations for data can be dynamically controlled at a fine grain by applications. Applications can specify constraints to control which datacenters contain which data, how far data is from its users (to control read latency), how far replicas are from each other (to control write latency), and how many replicas are maintained (to control durability, availability, and read performance). Data can also be dynamically and transparently moved between datacenters by the system to balance resource usage across datacenters. Second, Spanner has two features that are difficult to implement in a distributed database: it
    was in part built to address this failing. Some authors have claimed that general two- phase commit is too expensive to support, because of the performance or availability problems that it brings [9, 10, 19]. We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions. Running two- phase commit over Paxos mitigates the availability problems.
    The application data model is layered on top of the directory- bucketed key- value mappings supported by the


    NEXT CLASS

    Distributed OLAP Systems