    lecture-14-slides.pdf
    Carnegie Mellon University
    Database Systems Query Execution II

    ADMINISTRIVIA

Project #3 is due Sunday March 30th @ 11
$\rightarrow$ Recitation on Fri, Mar 14 from 5 - 6 pm in GHC 5117.

Mid-term exam grades now posted.
$\rightarrow$ Exam viewing for the next 3 OH, including today.
$\rightarrow$ The last OH for exam viewing is on March 24.
$\rightarrow$ Special OH this Thu 9 am - 10 am, 9103 GHC.
$\rightarrow$ Stats: Mean: 75.1, Std-dev: 10.4.
$\rightarrow$ Notes: Used partial grading, and full points given on the join question.

    LAST CLASS

    We discussed composing operators into a plan to execute a query.
    We assumed that queries execute with a single worker (e.g., a thread).
    We will now discuss how to execute queries in parallel using multiple workers.
    SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100

    PARALLEL QUERY EXECUTION

The database is spread across multiple resources to:
$\rightarrow$ Deal with large data sets that don't fit on a single machine/node.
$\rightarrow$ Achieve higher performance.
$\rightarrow$ Provide redundancy/fault-tolerance.

Appears as a single logical database instance to the application, regardless of physical organization.
$\rightarrow$ A SQL query for a single-resource DBMS should generate the same result on a parallel or distributed DBMS.

    PARALLEL VS. DISTRIBUTED

    Parallel DBMSs

$\rightarrow$ Resources are physically close to each other.
$\rightarrow$ Resources communicate over a high-speed interconnect.
$\rightarrow$ Communication is assumed to be cheap and reliable.

    Distributed DBMSs

$\rightarrow$ Resources can be far from each other.
$\rightarrow$ Resources communicate using a slow(er) interconnect.
$\rightarrow$ Communication costs and problems cannot be ignored.

    TODAY'S AGENDA

Process Models
Execution Parallelism
I/O Parallelism
DB Flash Talk: Confluent

    PROCESS MODEL

    A DBMS's process model defines how the system is architected to support concurrent requests / queries.
    A worker is the DBMS component responsible for executing tasks on behalf of the client and returning the results.

PROCESS MODEL

Approach #1: Process per DBMS Worker
Approach #2: Thread per DBMS Worker [Most Common]
Approach #3: Embedded DBMS

PROCESS PER WORKER

Each worker is a separate OS process.
$\rightarrow$ Relies on the OS dispatcher.
$\rightarrow$ Use shared-memory for global data structures.
$\rightarrow$ A process crash does not take down the entire system.
$\rightarrow$ Examples: IBM DB2, Postgres, Oracle
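As a rough sketch of this model (hypothetical helper names, not any DBMS's real API), each query can be dispatched to its own OS process, with results flowing back over an IPC queue:

```python
# Minimal sketch of the process-per-worker model: each request runs in
# its own OS process, so one worker crashing cannot corrupt another's
# address space. `run_query` is a stand-in for real query execution.
from multiprocessing import Process, Queue

def run_query(query, results):
    # Pretend to execute the query and ship the result back over IPC.
    results.put((query, f"result of {query}"))

def serve(queries):
    results = Queue()
    workers = [Process(target=run_query, args=(q, results)) for q in queries]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # Workers may finish in any order, so key the results by query.
    return dict(results.get() for _ in queries)
```

Note that the real systems listed above also need OS shared memory for global structures such as the buffer pool, which this sketch omits.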


THREAD PER WORKER

Single process with multiple worker threads.
$\rightarrow$ DBMS (mostly) manages its own scheduling.
$\rightarrow$ May or may not use a dispatcher thread.
$\rightarrow$ Thread crash (may) kill the entire system.
$\rightarrow$ Examples: MSSQL, MySQL, DB2, Oracle (2014)
Almost every DBMS created in the last 20 years!
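The contrast with the process model can be sketched the same way (illustrative names only): all workers share one address space, so global data structures need only a lock, but an unhandled crash can take down the whole process.

```python
# Minimal sketch of the thread-per-worker model: shared state is just a
# regular in-process object guarded by a lock that the DBMS manages itself.
import threading

shared_results = {}              # global structure, visible to every worker
results_lock = threading.Lock()  # DBMS coordinates access, not the OS

def run_query(query):
    # Stand-in for real query execution inside a worker thread.
    with results_lock:
        shared_results[query] = f"result of {query}"

def serve(queries):
    workers = [threading.Thread(target=run_query, args=(q,)) for q in queries]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return shared_results
```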


EMBEDDED DBMS

DBMS runs inside the same address space as the application. Application is (primarily) responsible for threads and scheduling.
The application may support outside connections.
$\rightarrow$ Examples: BerkeleyDB, SQLite, RocksDB, LevelDB, DuckDB, YottaDB, WiredTiger, SplinterDB, Bitcask

    SCHEDULING

    For each query plan, the DBMS decides where, when, and how to execute it.
$\rightarrow$ How many tasks should it use?
$\rightarrow$ How many CPU cores should it use?
$\rightarrow$ What CPU core should the tasks execute on?
$\rightarrow$ Where should a task store its output?
    The DBMS nearly always knows more than the OS.

    PROCESS MODELS

Advantages of a multi-threaded architecture:
$\rightarrow$ Less overhead per context switch.
$\rightarrow$ Do not have to manage shared memory.

The thread per worker model does not mean that the DBMS supports intra-query parallelism.
DBMSs from the last 15 years use native OS threads unless they are Redis or Postgres forks.

    PARALLEL EXECUTION

The DBMS executes multiple tasks simultaneously to improve hardware utilization.
$\rightarrow$ Active tasks do not need to belong to the same query.
$\rightarrow$ High-level approaches do not vary on whether the DBMS is multi-threaded, multi-process, or multi-node.

Approach #1: Inter-Query Parallelism
Approach #2: Intra-Query Parallelism

    INTER-QUERY PARALLELISM

Improve overall performance by allowing multiple queries to execute simultaneously.
$\rightarrow$ Most DBMSs use a simple first-come, first-served policy.

If queries are read-only, then this requires almost no explicit coordination between the queries.
$\rightarrow$ Buffer pool can handle most of the sharing if necessary.

If multiple queries are updating the database at the same time, then this is tricky to do correctly...


    INTRA-QUERY PARALLELISM

Improve the performance of a single query by executing its operators in parallel.
$\rightarrow$ Think of the organization of operators in terms of a producer/consumer paradigm.

Approach #1: Intra-Operator (Horizontal)
Approach #2: Inter-Operator (Vertical)

These techniques are not mutually exclusive.
There are parallel versions of every operator.
$\rightarrow$ Can either have multiple threads access centralized data structures or use partitioning to divide work up.

    PARALLEL GRACE HASH JOIN

    Use a separate worker to perform the join for each level of buckets for R and S after partitioning.
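The partition-then-join idea can be sketched as follows (an in-memory simplification: Python lists stand in for disk-resident partitions, and a thread pool stands in for DBMS workers; tuples are `(key, payload)` pairs):

```python
# Sketch of a parallel Grace hash join: both inputs are hash-partitioned
# on the join key, then each matching bucket pair is joined by its own
# worker, since bucket i of R can only match bucket i of S.
from concurrent.futures import ThreadPoolExecutor

def partition(tuples, n):
    buckets = [[] for _ in range(n)]
    for t in tuples:
        buckets[hash(t[0]) % n].append(t)
    return buckets

def join_bucket(r_bucket, s_bucket):
    # Build a hash table on one bucket, probe it with the matching bucket.
    table = {}
    for key, val in r_bucket:
        table.setdefault(key, []).append(val)
    return [(key, rv, sv) for key, sv in s_bucket for rv in table.get(key, [])]

def grace_hash_join(r, s, n_partitions=4):
    r_parts, s_parts = partition(r, n_partitions), partition(s, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        # One worker per bucket pair.
        results = pool.map(join_bucket, r_parts, s_parts)
    return [t for part in results for t in part]
```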


INTRA-QUERY PARALLELISM

Approach #1: Intra-Operator (Horizontal) [Most Common]
Approach #2: Inter-Operator (Vertical) [Less Common]
Approach #3: Bushy [Higher-end Systems]

    INTRA-OPERATOR PARALLELISM

    Approach #1: Intra-Operator (Horizontal)

$\rightarrow$ Operators are decomposed into independent instances that perform the same function on different subsets of data.

The DBMS inserts an exchange operator into the query plan to coalesce/split results from multiple children/parent operators.
$\rightarrow$ Postgres calls this "gather".
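A minimal sketch of horizontal parallelism with a gather-style exchange (hypothetical function names; real DBMSs stream tuples between operators rather than materializing whole lists):

```python
# The same filter operator runs as independent instances over disjoint
# chunks of the input; a gather-style exchange coalesces their outputs
# into a single stream for the parent operator.
from concurrent.futures import ThreadPoolExecutor

def filter_chunk(chunk):
    # One operator instance: the WHERE A.value < 99 predicate.
    return [row for row in chunk if row["value"] < 99]

def parallel_filter(rows, n_workers=3):
    # Split the input into one chunk per operator instance.
    chunks = [rows[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = pool.map(filter_chunk, chunks)
    # Exchange (gather): combine the workers' streams into one output.
    return [row for part in partial for row in part]
```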

    INTRA-OPERATOR PARALLELISM

    SELECT A.id, B.value FROM A JOIN B ON A.id = B.id WHERE A.value < 99 AND B.value > 100


    EXCHANGE OPERATOR

    Exchange Type #1 - Gather

    $\rightarrow$ Combine the results from multiple workers into a single output stream.

    Exchange Type #2 - Distribute

    $\rightarrow$ Split a single input stream into multiple output streams.

    Exchange Type #3 - Repartition

    $\rightarrow$ Shuffle multiple input streams across multiple output streams. $\rightarrow$ Some DBMSs always perform this step after every pipeline (e.g., Dremel/BigQuery).
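The three exchange types can be sketched over plain lists (a simplification of streaming exchange operators; function names are illustrative):

```python
# Gather: many input streams -> one output stream.
def gather(streams):
    return [row for s in streams for row in s]

# Distribute: one input stream -> n output streams (round-robin here).
def distribute(stream, n):
    outs = [[] for _ in range(n)]
    for i, row in enumerate(stream):
        outs[i % n].append(row)
    return outs

# Repartition: many input streams -> many output streams, shuffled by
# a partitioning key so equal keys land in the same output stream.
def repartition(streams, n, key=hash):
    outs = [[] for _ in range(n)]
    for s in streams:
        for row in s:
            outs[key(row) % n].append(row)
    return outs
```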

    INTER-OPERATOR PARALLELISM

    Approach #2: Inter-Operator (Vertical)

$\rightarrow$ Operations are overlapped to pipeline data from one stage to the next without materialization.
$\rightarrow$ Workers execute multiple operators from different segments of a query plan at the same time.
$\rightarrow$ Still need exchange operators to combine intermediate results from segments.

Also called pipelined parallelism.
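A minimal producer/consumer sketch of pipelined parallelism (illustrative names): a scan worker feeds a filter worker through a queue, so both operator stages run concurrently without materializing the intermediate result.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the stream

def pipeline(rows):
    q = queue.Queue(maxsize=8)  # bounded: applies backpressure to the scan
    out = []

    def scan():
        # Producer stage: emit each tuple into the queue.
        for row in rows:
            q.put(row)
        q.put(DONE)

    def filt():
        # Consumer stage: apply the predicate as tuples arrive.
        while (row := q.get()) is not DONE:
            if row > 100:
                out.append(row)

    producer = threading.Thread(target=scan)
    consumer = threading.Thread(target=filt)
    producer.start(); consumer.start()
    producer.join(); consumer.join()
    return out
```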

INTER-OPERATOR PARALLELISM

for $r_1 \in$ outer:
  for $r_2 \in$ inner:
    emit($r_1 r_2$)

SELECT A.id, B.value FROM A JOIN B ON A.id = B.id WHERE A.value < 99 AND B.value > 100


    BUSHY PARALLELISM

    Approach #3: Bushy Parallelism

$\rightarrow$ Hybrid of intra- and inter-operator parallelism where workers execute multiple operators from different segments of a query plan at the same time.
$\rightarrow$ Still need exchange operators to combine intermediate results from segments.

    BUSHY PARALLELISM

    SELECT * FROM A JOIN B JOIN C JOIN D


    OBSERVATION

    Using additional processes/threads to execute queries in parallel won't help if the disk is always the main bottleneck.
It can sometimes make the DBMS's performance worse if workers are accessing different segments of the disk at the same time.

    I/O PARALLELISM

Split the DBMS across multiple storage devices to improve disk bandwidth and latency.
Many different options that have trade-offs:
$\rightarrow$ Multiple Disks per Database
$\rightarrow$ One Database per Disk
$\rightarrow$ One Relation per Disk
$\rightarrow$ Split Relation across Multiple Disks
Some DBMSs support this natively. Others require the admin to configure it outside of the DBMS.

MULTI-DISK PARALLELISM

Store data across multiple disks to improve performance + durability.

[Figure: a file of 6 pages (logical view) and its physical layout across disks, shown under Striping (RAID 0) and Mirroring (RAID 1).]

Hardware-based: I/O controller makes multiple physical devices appear as a single logical device.
$\rightarrow$ Transparent to the DBMS (e.g., RAID).

Software-based: DBMS manages erasure codes at the file/object level.
$\rightarrow$ Faster and more flexible.
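The two RAID layouts amount to different page-to-disk mappings, sketched here for the 6-page file from the figure (illustrative helpers, not a real I/O controller):

```python
# Striping (RAID 0): each page lives on exactly one disk, assigned
# round-robin, so sequential reads fan out across all disks.
def stripe(page_id, num_disks):
    return page_id % num_disks

# Mirroring (RAID 1): every page lives on every disk, trading capacity
# for durability (any single disk can serve any read).
def mirror(page_id, num_disks):
    return list(range(num_disks))
```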

    DATABASE PARTITIONING

    Some DBMSs allow you to specify the disk location of each individual database. $\rightarrow$ The buffer pool manager maps a page to a disk location.
    This is also easy to do at the filesystem level if the DBMS stores each database in a separate directory. $\rightarrow$ The DBMS recovery log file might still be shared if transactions can update multiple databases.

    PARTITIONING

    Split a single logical table into disjoint physical segments that are stored/managed separately.
    Partitioning should (ideally) be transparent to the application.
    $\rightarrow$ The application should only access logical tables and not have to worry about how things are physically stored.
    We will cover this further when we talk about distributed databases.
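A toy sketch of how transparency can be achieved (hypothetical routing helpers, not any specific DBMS's API): the application looks up by key against the logical table, and a routing function picks the physical segment behind the scenes.

```python
# Hash partitioning: a key deterministically maps to one segment, so
# the application never has to name a segment explicitly.
def segment_for(key, num_segments):
    return hash(key) % num_segments

def insert(segments, key, row):
    segments[segment_for(key, len(segments))][key] = row

def lookup(segments, key):
    # Same routing function on the read path finds the same segment.
    return segments[segment_for(key, len(segments))].get(key)
```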

    CONCLUSION

    Parallel execution is important, which is why (almost) every major DBMS supports it.
However, it is hard to get right:
$\rightarrow$ Coordination Overhead
$\rightarrow$ Scheduling
$\rightarrow$ Concurrency Issues
$\rightarrow$ Resource Contention

    NEXT CLASS

Query Optimization
$\rightarrow$ Logical vs. Physical Plans
$\rightarrow$ Search Space of Plans
$\rightarrow$ Cost Estimation of Plans