    lecture-22-slides.pdf
    Carnegie Mellon University
    Database Systems
    Distributed Databases

    ADMINISTRIVIA

    Project #4 is due Sunday, April 20th @ 11
    $\longrightarrow$ Recitation: Friday, April 11th in GHC 4303 from 3 - 4 PM
    HW6 is due Sunday, April 20, 2025 @ 11
    Final Exam is on Monday, April 28, 2025, from 05 - 08
    $\longrightarrow$ Early exam will not be offered. Do not make travel plans.
    This course is recruiting TAs for the next semester $\longrightarrow$ Apply at: https://www.ugrad.cs.cmu.edu/ta/F25/

    UPCOMING DATABASE TALKS

    MariaDB (DB Seminar)
    $\rightarrow$ Monday, April 14 @ 4
    $\rightarrow$ MariaDB's New Query Optimizer
    $\rightarrow$ Speaker: Michael Widenius
    $\rightarrow$ https://cmu.zoom.us/j/93441451665

    Gel (DB Seminar)
    $\rightarrow$ Monday, April 21 @ 4
    $\rightarrow$ EdgeQL with Gel
    $\rightarrow$ Speaker: Michael Sullivan
    $\rightarrow$ https://cmu.zoom.us/j/93441451665

    COURSE STATUS

    Databases are hard.
    Distributed databases are harder.

    Query Planning

    Concurrency Control
    Operator Execution
    Access Methods
    Recovery
    Buffer Pool Manager
    Disk Manager


    PARALLEL VS. DISTRIBUTED

    Parallel DBMSs:

    $\rightarrow$ Nodes are physically close to each other.
    $\rightarrow$ Nodes connected with high-speed LAN.
    $\rightarrow$ Communication cost is assumed to be small.

    Distributed DBMSs:

    $\rightarrow$ Nodes can be far from each other.
    $\rightarrow$ Nodes connected using public network.
    $\rightarrow$ Communication cost and problems cannot be ignored.

    DISTRIBUTED DBMSs

    Use the building blocks that we covered in single-node DBMSs to now support transaction processing and query execution in distributed environments.
    $\longrightarrow$ Optimization & Planning
    $\longrightarrow$ Concurrency Control
    $\longrightarrow$ Logging & Recovery

    TODAY'S AGENDA

    System Architectures
    Design Issues
    Partitioning Schemes
    Distributed Concurrency Control
    DB Flash Talk: dbt Labs

    SYSTEM ARCHITECTURE

    A distributed DBMS's system architecture specifies what shared resources are directly accessible to CPUs.
    This affects how CPUs coordinate with each other and where they retrieve/store objects in the database.

    SHARED NOTHING

    Each DBMS node has its own CPU, memory, and local disk.
    Nodes only communicate with each other via network.
    $\longrightarrow$ Better performance & efficiency.
    $\longrightarrow$ Harder to scale capacity.
    $\longrightarrow$ Harder to ensure consistency.


    SHARED NOTHING EXAMPLE

    [Figure sequence: shared-nothing example]

    SHARED DISK

    Nodes access a single logical disk via an interconnect, but each has its own private memory.
    $\longrightarrow$ Scale the execution layer independently from the storage layer.
    $\longrightarrow$ Nodes can still use direct attached storage as a slower/larger cache.
    $\longrightarrow$ This architecture facilitates data lakes and serverless systems.


    SHARED DISK EXAMPLE

    [Figure sequence: shared-disk example (Application Server, Node, shared storage)]

    SHARED MEMORY

    Nodes access a common memory address space via a fast interconnect.
    $\longrightarrow$ Each node has a global view of all the in-memory data structures.
    $\longrightarrow$ Can still use local memory / disk for intermediate results.
    This looks a lot like shared-everything. Nobody does this.

    DESIGN ISSUES

    How does the application find data?
    Where does the application send queries?
    How to execute queries on distributed data?
    $\rightarrow$ Push query to data. $\rightarrow$ Pull data to query.
    How do we divide the database across resources?
    How does the DBMS ensure correctness? (next class)

    DATA TRANSPARENCY

    Applications should not be required to know where data is physically located in a distributed DBMS.
    $\longrightarrow$ Any query that runs on a single-node DBMS should produce the same result on a distributed DBMS.
    In practice, developers need to be aware of the communication costs of queries to avoid excessively "expensive" data movement.
    Approximate communication cost:
    Chip -> Chip (UPI): ~100 ns
    Intra-rack:         ~100 μs
    Planetary:          ~100 ms
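    As a rough back-of-the-envelope illustration of those costs, the sketch below multiplies the latencies above by a made-up message count for a query that pulls one million tuples from a remote partition (the batch size is an assumption for illustration, not a number from the slides):

    # Back-of-the-envelope: time spent on per-message latency alone when pulling
    # 1M tuples, batched into messages (batch size is an illustrative assumption).
    TUPLES = 1_000_000
    TUPLES_PER_MESSAGE = 10_000
    messages = TUPLES // TUPLES_PER_MESSAGE   # 100 messages

    LATENCY_SECONDS = {"chip-to-chip": 100e-9, "intra-rack": 100e-6, "planetary": 100e-3}

    for link, per_message in LATENCY_SECONDS.items():
        print(f"{link:>13}: {messages * per_message:g} s")
    # roughly: chip-to-chip 1e-05 s, intra-rack 0.01 s, planetary 10 s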


    DATABASE PARTITIONING

    Split database across multiple resources:
    $\rightarrow$ Disks, nodes, processors.
    $\rightarrow$ Called "sharding" in NoSQL systems.
    The DBMS executes query fragments on each partition and then combines the results to produce a single answer.
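    A minimal scatter-and-gather sketch of that execution model; the partition handles and the run_fragment callable are hypothetical placeholders for the DBMS's internals:

    from typing import Any, Callable, Iterable, List

    # Hypothetical sketch: run the same query fragment on every partition and
    # concatenate the partial results into a single answer.
    def scatter_gather(partitions: Iterable[Any],
                       run_fragment: Callable[[Any, str], List[tuple]],
                       fragment_sql: str) -> List[tuple]:
        results: List[tuple] = []
        for partition in partitions:      # a real DBMS would do this in parallel
            results.extend(run_fragment(partition, fragment_sql))
        return results

    A real system would run the fragments in parallel and may need a more elaborate combine step, such as a final aggregation or sort.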

    NAIVE TABLE PARTITIONING

    Assign an entire table to a single node.
    Assumes that each node has enough storage space for an entire table.
    Ideal if queries never join data across tables stored on different nodes and access patterns are uniform.

    Ideal Query:

    SELECT * FROM table1
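    A toy illustration of this scheme with a hypothetical table-to-node map; the ideal query above touches exactly one node, while a join across tables stored on different nodes forces data movement:

    # Hypothetical assignment of whole tables to nodes (naive table partitioning).
    TABLE_TO_NODE = {"table1": "node-1", "table2": "node-2", "table3": "node-3"}

    def node_for_table(table_name: str) -> str:
        # "SELECT * FROM table1" only touches node-1 (the ideal case above).
        # A join between table1 and table2 touches node-1 and node-2, so one
        # table's data has to be shipped over the network.
        return TABLE_TO_NODE[table_name]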


    VERTICAL PARTITIONING

    Split a table's attributes into separate partitions.
    Must store tuple information to reconstruct the original record.
    CREATE TABLE foo (
      attr1 INT,
      attr2 INT,
      attr3 INT,
      attr4 TEXT
    );

    Tuple#1: attr1 | attr2 | attr3 | attr4
    Tuple#2: attr1 | attr2 | attr3 | attr4
    Tuple#3: attr1 | attr2 | attr3 | attr4
    Tuple#4: attr1 | attr2 | attr3 | attr4

    Partition #1:
    Tuple#1: attr1 | attr2 | attr3
    Tuple#2: attr1 | attr2 | attr3
    Tuple#3: attr1 | attr2 | attr3
    Tuple#4: attr1 | attr2 | attr3

    Partition #2:
    Tuple#1: attr4
    Tuple#2: attr4
    Tuple#3: attr4
    Tuple#4: attr4
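    A toy in-memory sketch of the split shown above, keeping the tuple id in both partitions so the original record can be reconstructed (a real DBMS does this at the storage layer, not with Python dicts):

    # Toy vertical split of one tuple of table foo into two partitions.
    # Each partition keeps the tuple id so the original record can be rebuilt.
    row = {"tuple_id": 1, "attr1": 10, "attr2": 20, "attr3": 30, "attr4": "some text"}

    partition1 = {k: row[k] for k in ("tuple_id", "attr1", "attr2", "attr3")}
    partition2 = {k: row[k] for k in ("tuple_id", "attr4")}

    def reconstruct(p1: dict, p2: dict) -> dict:
        assert p1["tuple_id"] == p2["tuple_id"]
        return {**p1, **p2}

    assert reconstruct(partition1, partition2) == row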

    HORIZONTAL PARTITIONING

    Split a table's tuples into disjoint subsets based on some partitioning key and scheme.
    $\longrightarrow$ Choose column(s) that divide the database equally in terms of size, load, or usage.

    Partitioning Schemes (see the sketch after this list):
    $\longrightarrow$ Hashing
    $\longrightarrow$ Ranges
    $\longrightarrow$ Predicates
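    A minimal sketch of the first two schemes; the partition count, range boundaries, and use of md5 are illustrative choices rather than anything prescribed by the slides:

    from hashlib import md5

    NUM_PARTITIONS = 4                     # illustrative partition count

    def hash_partition(key: str) -> int:
        # Hash partitioning: a stable hash of the partitioning key, modulo the
        # number of partitions. (md5 is used here only to get a stable hash.)
        return int(md5(key.encode()).hexdigest(), 16) % NUM_PARTITIONS

    def range_partition(key: int, boundaries=(100, 200, 300)) -> int:
        # Range partitioning: the key belongs to the first range it falls under.
        for i, upper in enumerate(boundaries):
            if key < upper:
                return i
        return len(boundaries)

    Predicate partitioning would instead route each tuple by evaluating a user-supplied predicate.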

    HORIZONTAL PARTITIONING

    Table:
    101 | a | XXX | 2022-11-29
    102 | b | XXY | 2022-11-28
    103 | c | XYZ | 2022-11-29
    104 | d | XYX | 2022-11-27
    105 | e | XYY | 2022-11-29
    Ideal Query:
    SELECT * FROM table WHERE partitionKey = ?


    SHARED-DISK PARTITIONING

    [Figure sequence: partitioning in a shared-disk architecture]

    SHARED-NOTHING PARTITIONING

    [Figure sequence: partitioning in a shared-nothing architecture]


    CONSISTENT HASHING

    [Figure sequence: consistent hashing example]
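    The figure walkthrough for this topic did not survive in the text dump. As a stand-in, here is a minimal sketch of the idea it illustrates: partitions and keys are hashed onto the same ring, each key is owned by the first partition at or after its hash position, and adding or removing a partition only moves the keys in one arc of the ring. The node names and the md5/bisect choices are mine, not from the slides:

    import bisect
    from hashlib import md5

    def ring_hash(value: str) -> int:
        # Map a string to a position on a fixed-size hash ring.
        return int(md5(value.encode()).hexdigest(), 16) % (2 ** 32)

    class ConsistentHashRing:
        def __init__(self, nodes):
            # Each node sits on the ring at its hash position.
            self._ring = sorted((ring_hash(n), n) for n in nodes)

        def node_for_key(self, key: str) -> str:
            # A key is owned by the first node at or after its position,
            # wrapping around to the start of the ring.
            positions = [pos for pos, _ in self._ring]
            i = bisect.bisect_left(positions, ring_hash(key)) % len(self._ring)
            return self._ring[i][1]

        def add_node(self, node: str) -> None:
            # Only the keys between the new node and its predecessor move.
            bisect.insort(self._ring, (ring_hash(node), node))

    ring = ConsistentHashRing(["node-1", "node-2", "node-3"])
    owner = ring.node_for_key("user#42")   # which node depends on the hash values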

    SINGLE-NODE VS. DISTRIBUTED

    A single-node txn only accesses data that is contained on one partition.
    $\rightarrow$ The DBMS may not need to check the behavior of concurrent txns running on other nodes.
    A distributed txn accesses data at one or more partitions.
    $\rightarrow$ Requires expensive coordination.
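    In other words, the DBMS can tell the two cases apart by how many partitions a txn's keys resolve to; a tiny sketch with a hypothetical routing function:

    # Classify a txn by how many partitions the keys it touches resolve to.
    # partition_of is a hypothetical router (e.g. a hash- or range-partitioning function).
    def txn_class(keys, partition_of) -> str:
        touched = {partition_of(key) for key in keys}
        return "single-node" if len(touched) == 1 else "distributed"

    route = lambda key: hash(key) % 4            # toy router over 4 partitions
    txn_class(["a"], route)                      # 'single-node'
    txn_class(["a", "b", "c", "d", "e"], route)  # very likely 'distributed'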

    TRANSACTION COORDINATION

    If our DBMS supports multi-operation and distributed txns, we need a way to coordinate their execution in the system.
    Two different approaches:
    $\rightarrow$ Centralized: Global "traffic cop".
    $\rightarrow$ Decentralized: Nodes organize themselves.
    Most distributed DBMSs use a hybrid approach where they periodically elect some node to be a temporary coordinator.
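    A toy sketch of the centralized "traffic cop" approach: the coordinator routes each operation to the owning partition, remembers which partitions participated, and contacts all of them at commit time. The Partition class and the routing function are hypothetical, and a real system would also run an atomic commit protocol at the commit step:

    class Partition:
        """Hypothetical partition server: stores one slice of the database."""
        def __init__(self, name):
            self.name = name
            self.data = {}

        def execute(self, txn_id, op, key, value=None):
            # Apply one read or write on behalf of txn_id.
            if op == "read":
                return self.data.get(key)
            self.data[key] = value

        def commit(self, txn_id):
            pass  # release locks, flush log records, etc.

    class Coordinator:
        """Global 'traffic cop': routes operations and tracks participants."""
        def __init__(self, partitions, route):
            self.partitions = partitions   # name -> Partition
            self.route = route             # key -> partition name
            self.participants = {}         # txn_id -> set of partition names

        def execute(self, txn_id, op, key, value=None):
            target = self.route(key)
            self.participants.setdefault(txn_id, set()).add(target)
            return self.partitions[target].execute(txn_id, op, key, value)

        def commit(self, txn_id):
            # Ask every partition that took part in this txn to commit.
            for name in self.participants.pop(txn_id, set()):
                self.partitions[name].commit(txn_id)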

    CENTRALIZED COORDINATOR

    [Figure sequence: Coordinator / Middleware routing a transaction's requests to the Partitions]

    DECENTRALIZED COORDINATOR

    [Figure sequence: decentralized coordination among the Partitions]

    OBSERVATION

    We have assumed that the nodes in our distributed systems are running the same DBMS software. But organizations often run many different DBMSs in their applications.
    It would be nice if we could have a single interface for all our data.

    FEDERATED DATABASES

    Distributed architecture that connects disparate DBMSs into a single logical system.
    $\rightarrow$ Expose a single query interface that can access data at any location.
    This is hard and nobody does it well.
    $\rightarrow$ Different data models, query languages, limitations.
    $\rightarrow$ No easy way to optimize queries.
    $\rightarrow$ Lots of data copying (bad).
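    A toy sketch of that single-interface idea; the backend objects and their execute() method are hypothetical stand-ins, and the hard parts (translating data models and query dialects, optimizing across systems) are exactly what this glosses over:

    # Toy federation layer: one query interface over several backend DBMSs.
    # The backend objects and their execute() method are hypothetical stand-ins.
    class FederatedInterface:
        def __init__(self, backends):
            self.backends = backends              # name -> connection-like object

        def query(self, backend_name, text):
            # Push an already dialect-specific query down to one backend.
            return self.backends[backend_name].execute(text)

        def join_across(self, left, right, join_fn):
            # A cross-system join forces copying data out of both backends.
            left_rows = self.query(*left)         # left = (backend_name, query text)
            right_rows = self.query(*right)
            return join_fn(left_rows, right_rows)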

    FEDERATED DATABASE EXAMPLE

    [Figure sequence: federated database example with a Middleware layer]

    DISTRIBUTED CONCURRENCY CONTROL

    Need to allow multiple txns to execute simultaneously across multiple nodes.
    $\rightarrow$ Many of the same protocols from single-node DBMSs can be adapted.

    This is harder because of:
    $\rightarrow$ Replication.
    $\rightarrow$ Network Communication Overhead.
    $\rightarrow$ Node Failures (Permanent + Ephemeral).
    $\rightarrow$ Clock Skew.

    DISTRIBUTED 2PL

    [Figure sequence: distributed two-phase locking example]
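    The animation frames are lost here, but the protocol they step through can be sketched: each node runs a lock manager for its own partition, a transaction acquires locks at whichever nodes store the data it touches, and holds them until commit, as in single-node 2PL. A toy sketch with deadlock handling and the commit protocol omitted:

    import threading
    from collections import defaultdict

    class PartitionLockManager:
        """Per-node lock table for the tuples stored on that partition
        (toy version: exclusive locks only, no deadlock detection)."""
        def __init__(self):
            self._locks = defaultdict(threading.Lock)   # key -> lock
            self._held = defaultdict(set)               # txn_id -> keys it holds here

        def acquire(self, txn_id, key):
            self._locks[key].acquire()                  # blocks if another txn holds it
            self._held[txn_id].add(key)

        def release_all(self, txn_id):
            # Shrinking phase: drop every lock this txn holds on this partition.
            for key in self._held.pop(txn_id, set()):
                self._locks[key].release()

    def run_distributed_txn(txn_id, ops, lock_managers, partition_of):
        """ops is a list of (key, action) pairs; partition_of routes a key to a node."""
        try:
            for key, action in ops:
                lock_managers[partition_of(key)].acquire(txn_id, key)  # growing phase
                action(key)                                            # read/write the tuple
        finally:
            for lm in lock_managers.values():                          # commit (or abort):
                lm.release_all(txn_id)                                 # release on every node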

    CONCLUSION

    We have barely scratched the surface on distributed database systems...
    It is hard to get this right.

    NEXT CLASS

    Distributed OLTP Systems
    Replication
    CAP Theorem
    Real-World Examples