Carnegie Mellon University
Database Systems
Distributed Databases
ADMINISTRIVIA
Project #4 is due Sunday April 20th @ 11
$\longrightarrow$ Recitation: Friday, April 11th in GHC 4303 from 3-4 PM
HW6 is due Sunday, April 20, 2025 @ 11
Final Exam is on Monday, April 28, 2025, from 05-08
$\longrightarrow$ Early exam will not be offered. Do not make travel plans.
UPCOMING DATABASE TALKS
MariaDB (DB Seminar)
$\rightarrow$ Monday, April 14 @ 4
$\rightarrow$ MariaDB's New Query Optimizer
$\rightarrow$ Speaker: Michael Widenius
$\rightarrow$ https://cmu.zoom.us/j/93441451665
Gel (DB Seminar)
$\rightarrow$ Monday, April 21 @ 4
$\rightarrow$ EdgeQL with Gel
$\rightarrow$ Speaker: Michael Sullivan
$\rightarrow$ https://cmu.zoom.us/j/93441451665
COURSE STATUS
Databases are hard.
Distributed databases are harder.
[Figure: single-node DBMS components — Query Planning, Concurrency Control, Operator Execution, Access Methods, Recovery, Buffer Pool Manager, Disk Manager]
PARALLEL VS. DISTRIBUTED
Parallel DBMSs:
$\rightarrow$ Nodes are physically close to each other.
$\rightarrow$ Nodes connected with high-speed LAN.
$\rightarrow$ Communication cost is assumed to be small.
Distributed DBMSs:
$\rightarrow$ Nodes can be far from each other.
$\rightarrow$ Nodes connected using public network.
$\rightarrow$ Communication cost and problems cannot be ignored.
DISTRIBUTED DBMSs
Use the building blocks that we covered in single-node DBMSs to now support transaction processing and query execution in distributed environments.
$\longrightarrow$ Optimization & Planning
$\longrightarrow$ Concurrency Control
$\longrightarrow$ Logging & Recovery
TODAY'S AGENDA
System Architectures
Design Issues
Partitioning Schemes
Distributed Concurrency Control
DB Flash Talk: dbt Labs
SYSTEM ARCHITECTURE
A distributed DBMS's system architecture specifies what shared resources are directly accessible to CPUs.
This affects how CPUs coordinate with each other and where they retrieve/store objects in the database.
SHARED NOTHING
Each DBMS node has its own CPU, memory, and local disk.
Nodes only communicate with each other via network.
$\longrightarrow$ Better performance & efficiency.
$\longrightarrow$ Harder to scale capacity.
$\longrightarrow$ Harder to ensure consistency.
SHARED NOTHING
Each DBMS node has its own CPU, memory, and local disk.
Nodes only communicate with each other via network.
$\longrightarrow$ Better performance & efficiency. $\longrightarrow$ Harder to scale capacity. $\longrightarrow$ Harder to ensure consistency.
SHARED NOTHING EXAMPLE
SHARED DISK
Nodes access a single logical disk via an interconnect, but each have their own private memories.
$\longrightarrow$ Scale execution layer independently from the storage layer.
$\longrightarrow$ Nodes can still use direct attached storage as a slower/larger cache.
$\longrightarrow$ This architecture facilitates data lakes and serverless systems.
SHARED DISK EXAMPLE
[Figure sequence: Application Server, Node, shared storage layer]
SHARED MEMORY
Nodes access a common memory address space via a fast interconnect.
$\longrightarrow$ Each node has a global view of all the in-memory data structures.
$\longrightarrow$ Can still use local memory / disk for intermediate results.
This looks a lot like shared-everything. Nobody does this.
DESIGN ISSUES
How does the application find data?
Where does the application send queries?
How to execute queries on distributed data?
$\rightarrow$ Push query to data. $\rightarrow$ Pull data to query.
How do we divide the database across resources?
How does the DBMS ensure correctness? (Next Class)
DATA TRANSPARENCY
Applications should not be required to know where data is physically located in a distributed DBMS.
$\longrightarrow$ Any query that runs on a single-node DBMS should produce the same result on a distributed DBMS.
In practice, developers need to be aware of the communication costs of queries to avoid excessively "expensive" data movement.
Table (html):
| Chip → Chip (UPI) | Intra-rack | Planetary |
| ~100 ns | ~100 μs | ~100 ms |
DATABASE PARTITIONING
Split database across multiple resources:
$\rightarrow$ Disks, nodes, processors.
$\rightarrow$ Called "sharding" in NoSQL systems.
The DBMS executes query fragments on each partition and then combines the results to produce a single answer.
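As a minimal sketch of this fragment-and-combine pattern (in Python, with a hypothetical partition handle and caller-supplied combine function; not any particular DBMS's API):

from concurrent.futures import ThreadPoolExecutor

def run_fragment(partition, fragment_sql):
    # Hypothetical: pretend each partition handle can execute a query
    # fragment and return its partial result rows.
    return partition.execute(fragment_sql)

def scatter_gather(partitions, fragment_sql, combine):
    # Send the same fragment to every partition in parallel, then let
    # the caller-supplied combine() merge the partial results.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda p: run_fragment(p, fragment_sql),
                                 partitions))
    return combine(partials)

# Example: a distributed COUNT(*) is the sum of the per-partition counts.
# total = scatter_gather(partitions, "SELECT COUNT(*) FROM foo",
#                        lambda parts: sum(rows[0][0] for rows in parts))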
NAIVE TABLE PARTITIONING
Assign an entire table to a single node.
Assumes that each node has enough storage space for an entire table.
Ideal if queries never join data across tables stored on different nodes and access patterns are uniform.
Ideal Query:
SELECT * FROM table1
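As a toy illustration of the routing this implies (node names hypothetical), naive table partitioning is just a static table-to-node lookup:

# Naive table partitioning: every table lives entirely on one node.
TABLE_TO_NODE = {
    "table1": "node-1",
    "table2": "node-2",
    "table3": "node-3",
}

def route_query(table_name):
    # Return the node that stores the whole table.
    return TABLE_TO_NODE[table_name]

# route_query("table1") -> "node-1": the ideal query above touches only
# that node. A join between table1 and table2 would have to move data
# between node-1 and node-2.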
VERTICAL PARTITIONING
Split a table's attributes into separate partitions.
Must store tuple information to reconstruct the original record.
CREATE TABLE foo (
  attr1 INT,
  attr2 INT,
  attr3 INT,
  attr4 TEXT
);
Table (html):
| Tuple#1 | attr1 | attr2 | attr3 | attr4 |
| Tuple#2 | attr1 | attr2 | attr3 | attr4 |
| Tuple#3 | attr1 | attr2 | attr3 | attr4 |
| Tuple#4 | attr1 | attr2 | attr3 | attr4 |
Table (html):
| Tuple#1 | attr1 | attr2 | attr3 |
| Tuple#2 | attr1 | attr2 | attr3 |
| Tuple#3 | attr1 | attr2 | attr3 |
| Tuple#4 | attr1 | attr2 | attr3 |
[TableCaption: Partition #1]
Table (html):
| Tuple#1 | attr4 |
| Tuple#2 | attr4 |
| Tuple#3 | attr4 |
| Tuple#4 | attr4 |
[TableCaption: Partition #2]
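A minimal Python sketch of the split shown above (record layout is illustrative): both partitions are keyed by the tuple id so the original record can be reconstructed.

# Vertical partitioning sketch: attr1-attr3 go to partition #1,
# attr4 goes to partition #2, both keyed by tuple id.
partition1 = {}   # tuple_id -> (attr1, attr2, attr3)
partition2 = {}   # tuple_id -> attr4

def insert(tuple_id, attr1, attr2, attr3, attr4):
    partition1[tuple_id] = (attr1, attr2, attr3)
    partition2[tuple_id] = attr4

def reconstruct(tuple_id):
    # Rebuild the original record by looking up both partitions.
    a1, a2, a3 = partition1[tuple_id]
    return (a1, a2, a3, partition2[tuple_id])

insert(1, 10, 20, 30, "xxx")
assert reconstruct(1) == (10, 20, 30, "xxx")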
HORIZONTAL PARTITIONING
Split a table's tuples into disjoint subsets based on some partitioning key and scheme.
$\longrightarrow$ Choose column(s) that divide the database equally in terms of size, load, or usage.
Partitioning Schemes:
$\longrightarrow$ Hashing
$\longrightarrow$ Ranges
$\longrightarrow$ Predicates
HORIZONTAL PARTITIONING
Table
Table (html):
| 101 | a | XXX | 2022-11-29 |
| 102 | b | XXY | 2022-11-28 |
| 103 | c | XYZ | 2022-11-29 |
| 104 | d | XYX | 2022-11-27 |
| 105 | e | XYY | 2022-11-29 |
Ideal Query:
SELECT * FROM table WHERE partitionKey = ?
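A minimal sketch of the two most common schemes, hash and range partitioning on the partitioning key (the hash function, partition count, and range boundaries are illustrative, not what any specific DBMS uses):

import hashlib

NUM_PARTITIONS = 4

def hash_partition(key):
    # Hash partitioning: hash the key, take it modulo the partition count.
    digest = hashlib.sha256(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

RANGE_BOUNDS = [200, 400, 600]   # partition 0: key < 200, partition 1: key < 400, ...

def range_partition(key):
    # Range partitioning: each partition owns a contiguous key range.
    for i, upper in enumerate(RANGE_BOUNDS):
        if key < upper:
            return i
    return len(RANGE_BOUNDS)

# The ideal query above (WHERE partitionKey = ?) touches exactly one
# partition under either scheme; range scans favor range partitioning.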
SHARED-DISK PARTITIONING
SHARED-NOTHING PARTITIONING
CONSISTENT HASHING
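The consistent hashing slides are figures only; as a minimal sketch of the usual formulation (hash both node names and keys onto a ring, and assign each key to the first node clockwise from its position; node names illustrative):

import bisect
import hashlib

def ring_hash(value):
    # Map a node name or a key onto a position on the hash ring.
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes):
        # Sorted (position, node) pairs on the ring.
        self.ring = sorted((ring_hash(n), n) for n in nodes)
        self.positions = [pos for pos, _ in self.ring]

    def lookup(self, key):
        # First node clockwise from the key's position (wrap around).
        i = bisect.bisect_right(self.positions, ring_hash(key))
        return self.ring[i % len(self.ring)][1]

ring = ConsistentHashRing(["node-1", "node-2", "node-3"])
owner = ring.lookup("key-42")
# Adding or removing a node only remaps the keys between that node and
# its predecessor on the ring, instead of reshuffling everything.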
SINGLE-NODE VS. DISTRIBUTED
A single-node txn only accesses data that is contained on one partition.
$\rightarrow$ The DBMS may not need to check the behavior of concurrent txns running on other nodes.
A distributed txn accesses data at one or more partitions.
$\rightarrow$ Requires expensive coordination.
TRANSACTION COORDINATION
If our DBMS supports multi-operation and distributed txns, we need a way to coordinate their execution in the system.
Two different approaches:
$\rightarrow$ Centralized: Global "traffic cop".
$\rightarrow$ Decentralized: Nodes organize themselves.
Most distributed DBMSs use a hybrid approach where they periodically elect some node to be a temporary coordinator.
CENTRALIZED COORDINATOR
[Figure sequence: Coordinator, Middleware, Partitions]
DECENTRALIZED COORDINATOR
[Figure sequence: Partitions]
OBSERVATION
We have assumed that the nodes in our distributed systems are running the same DBMS software. But organizations often run many different DBMSs in their applications.
It would be nice if we could have a single interface for all our data.
FEDERATED DATABASES
Distributed architecture that connects disparate DBMSs into a single logical system.
$\rightarrow$ Expose a single query interface that can access data at any location.
This is hard and nobody does it well:
$\rightarrow$ Different data models, query languages, limitations.
$\rightarrow$ No easy way to optimize queries.
$\rightarrow$ Lots of data copying (bad).
FEDERATED DATABASE EXAMPLE
[Figure sequence: Middleware]
DISTRIBUTED CONCURRENCY CONTROL
Need to allow multiple txns to execute simultaneously across multiple nodes.
$\rightarrow$ Many of the same protocols from single-node DBMSs can be adapted.
This is harder because of:
$\rightarrow$ Replication.
$\rightarrow$ Network Communication Overhead.
$\rightarrow$ Node Failures (Permanent + Ephemeral).
$\rightarrow$ Clock Skew.
DISTRIBUTED 2PL
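The DISTRIBUTED 2PL slides are figures; as a rough sketch under simplifying assumptions (per-partition lock managers, no deadlock detection, atomic commit handled elsewhere), a txn holds every lock it acquires at any partition until it has finished all of its work:

from collections import defaultdict
from threading import Lock

class PartitionLockManager:
    # Toy lock manager: one exclusive lock per key at this partition.
    def __init__(self):
        self.locks = defaultdict(Lock)

    def acquire(self, key):
        self.locks[key].acquire()

    def release(self, key):
        self.locks[key].release()

def run_distributed_txn(ops, lock_managers, do_work):
    # ops: list of (partition_id, key) pairs the txn reads or writes.
    held = []
    for pid, key in ops:
        lock_managers[pid].acquire(key)   # growing phase, at each partition
        held.append((pid, key))
    do_work()                             # execute the reads/writes
    for pid, key in held:
        lock_managers[pid].release(key)   # shrinking phase

# Deadlocks can now span partitions (detection or timeouts needed), and
# making the commit atomic requires a protocol such as two-phase commit.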
CONCLUSION
We have barely scratched the surface on distributed database systems...
It is hard to get this right.
NEXT CLASS
Distributed OLTP Systems
Replication
CAP Theorem
Real-World Examples