    Carnegie Mellon University
    Database Systems
    Final Review & Systems Potpourri

    ADMINISTRIVIA

    Final Exam is on Monday, April 28, 2025, from 5:30pm-8:30pm.
    → Early exam will not be offered. Do not make travel plans.
    → Material: Lecture 12 - Lecture 24.
    → You can use the full 3 hours, though the exam is meant to be done in ~2 hours.
    Last day to submit P4 (with late days and penalty) is April 30 @ 11pm.
    Course Evals: Would like your feedback.
    → https://cmu.smartevals.com
    → https://www.ugrad.cs.cmu.edu/ta/S25/feedback/

    OFFICE HOURS

    Jignesh:

    → Thursday April 24th @ noon-2pm (GHC 9103)
    All other TAs will hold their office hours up to and including Saturday, April 26th.

    FINAL EXAM

    Where: Scaife Hall 105 and Scaife Hall 234.
    When: Monday, April 28, 2025, 5:30pm-8:30pm.
    Come to Scaife Hall 105 first. Then, look at your seating assignment, which may assign you to Scaife Hall 234.
    https://15445.courses.cs.cmu.edu/spring2025/final-guide.html

    FINAL EXAM

    What to bring:

    → CMU ID
    → Pencil + Eraser (!!!)
    → Calculator (cellphone is okay)
    → One 8.5x11" page of handwritten notes (double-sided)

    STUFF BEFORE MID-TERM

    SQL

    Buffer Pool Management
    Data Structures (Hash Tables, B+Trees)
    Storage Models
    Query Processing Models
    Inter-Query Parallelism
    Basic Understanding of BusTub Internals

    JOIN ALGORITHMS

    Join Algorithms
    → Naive Nested Loops
    → Block Nested Loops
    → Index Nested Loops
    → Sort-Merge
    → Hash Join: Simple, Partitioned, Hybrid Hash
    → Optimization using Bloom Filters
    → Cost functions (see the worked example below)
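    Not on the slide itself, but as a quick refresher on what the cost functions look like: for block nested loop join with $M$ pages in the outer table, $N$ pages in the inner table, and $B$ buffer pages, the standard I/O cost is $M + \lceil M / (B - 2) \rceil \cdot N$ ($B-2$ pages buffer the outer, one holds the inner input, one holds the output). For example, $M = 1000$, $N = 500$, $B = 102$ gives $1000 + \lceil 1000 / 100 \rceil \cdot 500 = 6000$ I/Os, versus $M + M \cdot N = 501{,}000$ I/Os for page-at-a-time naive nested loops.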

    QUERY EXECUTION

    Execution Models
    → Iterator (sketch below)
    → Materialized
    → Vector / Batch
    Plan Processing: Push vs. Pull
    Access Methods
    → Sequential Scan and various optimizations
    → Index Scan, including multi-index scan
    → Issues with update queries
    Expression Evaluation
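
    A minimal sketch of the iterator (Volcano) execution model listed above (illustrative only, not BusTub's actual interface): every operator exposes Next() and pulls one tuple at a time from its child.

    #include <optional>
    #include <string>
    #include <vector>

    using Tuple = std::vector<std::string>;   // simplified tuple representation

    // Every operator exposes the same pull-based interface.
    struct Operator {
      virtual ~Operator() = default;
      virtual std::optional<Tuple> Next() = 0;   // one tuple per call, nullopt when exhausted
    };

    // Filter pulls from its child until a tuple satisfies the predicate.
    struct Filter : Operator {
      Operator *child;
      bool (*pred)(const Tuple &);
      Filter(Operator *c, bool (*p)(const Tuple &)) : child(c), pred(p) {}
      std::optional<Tuple> Next() override {
        while (auto t = child->Next()) {   // keep pulling until a match or exhaustion
          if (pred(*t)) return t;
        }
        return std::nullopt;
      }
    };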

    QUERY EXECUTION

    Process Model

    Parallel Execution
    → Inter-Query Parallelism
    → Intra-Query Parallelism, Intra-Operator: horizontal, vertical, and bushy; parallel hash join, exchange operator
    → Intra-Query Parallelism, Inter-Operator, a.k.a. pipelined parallelism
    IO Parallelism

    QUERY OPTIMIZATION

    Heuristics

    → Predicate Pushdown
    → Projection Pushdown
    → Nested Sub-Queries: Rewrite and Decompose
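    As a reminder of what these rewrites look like (standard relational algebra, not copied from a slide): a predicate that references only one side of a join can be pushed below the join, e.g. $\sigma_{R.a = 5}(R \bowtie_{R.id = S.id} S) = (\sigma_{R.a = 5} R) \bowtie_{R.id = S.id} S$. Projection pushdown similarly keeps, on each side of the join, only the columns that the join condition and the final output actually need.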

    Statistics

    → Cardinality Estimation
    → Histograms
    Cost-based search
    → Bottom-up vs. Top-Down
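    A worked example of the kind of estimate involved (textbook formula, not from a slide): under uniformity and independence assumptions, an equi-join on attribute $A$ is estimated as $|R \bowtie_A S| \approx \frac{|R| \cdot |S|}{\max(V(A,R),\, V(A,S))}$, where $V(A,R)$ is the number of distinct $A$ values in $R$. With $|R| = 10{,}000$, $|S| = 2{,}000$, $V(A,R) = 100$, $V(A,S) = 50$, the estimate is $10{,}000 \cdot 2{,}000 / 100 = 200{,}000$ tuples.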

    TRANSACTIONS

    ACID

    Conflict Serializability:
    → How to check for correctness?
    → How to check for equivalence?
    View Serializability
    → Difference with conflict serializability
    Isolation Levels / Anomalies
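    A small worked example of the precedence-graph check (standard material, not tied to a specific slide): for the schedule $R_1(A),\, W_2(A),\, R_2(B),\, W_1(B)$, the conflict on $A$ adds the edge $T_1 \rightarrow T_2$ and the conflict on $B$ adds $T_2 \rightarrow T_1$. The graph has a cycle, so the schedule is not conflict serializable. Two schedules are conflict equivalent exactly when they order every pair of conflicting operations the same way, i.e., they produce the same precedence graph.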

    TRANSACTIONS

    Two-Phase Locking
    → Strong Strict 2PL
    → Cascading Aborts Problem
    → Deadlock Detection & Prevention
    Multiple Granularity Locking
    → Intention Locks
    → Understanding performance trade-offs
    → Lock Escalation (i.e., when is it allowed)

    TRANSACTIONS

    Optimistic Concurrency Control
    → Read Phase
    → Validation Phase (Backwards vs. Forwards)
    → Write Phase
    Multi-Version Concurrency Control
    → Version Storage / Ordering
    → Garbage Collection
    → Index Maintenance

    CRASH RECOVERY

    Buffer Pool Policies:
    → STEAL vs. NO-STEAL
    → FORCE vs. NO-FORCE
    Shadow Paging
    Write-Ahead Logging
    → How it relates to buffer pool management
    → Logging Schemes (Physical vs. Logical)

    CRASH RECOVERY

    Checkpoints
    → Non-Fuzzy vs. Fuzzy
    ARIES Recovery
    → Dirty Page Table (DPT)
    → Active Transaction Table (ATT)
    → Analyze, Redo, Undo phases
    → Log Sequence Numbers
    → CLRs

    DISTRIBUTED DATABASES

    System Architectures
    Replication Schemes
    Partitioning Schemes
    Two-Phase Commit
    Paxos
    Distributed Query Execution
    Distributed Join Algorithms
    Semi-Join Optimization
    Cloud Architectures

    TOPICS NOT ON EXAM!

    Flash Talks
    Seminar Talks
    Details of specific database systems (e.g., Postgres)

    GOOGLE SPANNER

    Google's geo-replicated DBMS (>2011)
    Schematized, semi-relational data model.
    Decentralized shared-disk architecture.
    Log-structured on-disk storage.
    Concurrency Control:
    → Strict 2PL + MVCC + Multi-Paxos + 2PC
    → Externally consistent global write-transactions with synchronous replication.
    → Lock-free read-only transactions.

    SPANNER: CONCURRENCY CONTROL

    MVCC + Strict 2PL with Wound-Wait Deadlock Prevention
    The DBMS ensures ordering through globally unique timestamps generated from atomic clocks and GPS devices.
    Writes are buffered in the client and sent to the server at commit time.
    Database is broken up into tablets (partitions):
    → Use Paxos to elect leader in tablet group.
    → Use 2PC for txns that span tablets.
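
    A minimal sketch of the wound-wait rule mentioned above (illustrative only; Spanner's real lock manager is far more involved): an older transaction, i.e. one with a smaller timestamp, "wounds" a younger lock holder, while a younger requester simply waits.

    #include <cstdint>

    enum class Action { Wait, WoundHolder };

    // Wound-Wait deadlock prevention: priority goes to older transactions
    // (smaller start timestamps). An older requester never waits behind a
    // younger holder, so no wait-for cycles can form.
    Action OnLockConflict(uint64_t requester_ts, uint64_t holder_ts) {
      if (requester_ts < holder_ts) {
        // Requester is older: abort ("wound") the younger holder and take the lock.
        return Action::WoundHolder;
      }
      // Requester is younger: it waits for the older holder to finish.
      return Action::Wait;
    }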

    SPANNER TABLES


    SPANNER: TRANSACTION ORDERING

    DBMS orders transactions based on physical "wall-clock" time.
    → This is necessary to guarantee strict serializability.
    → If $\mathsf{T}_1$ finishes before $\mathsf{T}_2$, then $\mathsf{T}_2$ should see the result of $\mathsf{T}_1$.
    Each Paxos group decides in what order transactions should be committed according to the timestamps.
    $\rightarrow$ If $\mathsf{T}_1$ commits at $\mathsf{time}_1$ and $\mathsf{T}_2$ starts at $\mathsf{time}_2 > \mathsf{time}_1$ , then $\mathsf{T}_1$ 's timestamp should be less than $\mathsf{T}_2$ 's.

    SPANNER TRUETIME

    The DBMS maintains a global wall-clock time across all data centers with bounded uncertainty. Timestamps are intervals, not single values.


    SPANNER: TRUETIME

    Each data center has GPS and atomic clocks.
    → These two provide fine-grained clock synchronization down to a few milliseconds.
    → Every 30 seconds, there is a maximum 7 ms difference.
    Multiple sync daemons per data center:
    → GPS and atomic clocks can fail in various conditions.
    → Sync daemons talk to each other within a data center as well as across data centers.
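
    A hedged sketch of how commit wait falls out of interval timestamps (the TTInterval/TTNow names and the fixed ±4 ms bound are illustrative stand-ins, not Google's API): pick the upper bound of the current uncertainty interval as the commit timestamp, then wait until that timestamp is guaranteed to be in the past before making the write visible.

    #include <chrono>
    #include <cstdint>
    #include <thread>

    // Illustrative TrueTime-style interval clock: real time lies in [earliest, latest].
    struct TTInterval { uint64_t earliest; uint64_t latest; };

    // Stand-in clock: system time with a fixed ±4 ms uncertainty bound.
    TTInterval TTNow() {
      uint64_t now_us = std::chrono::duration_cast<std::chrono::microseconds>(
          std::chrono::system_clock::now().time_since_epoch()).count();
      const uint64_t eps_us = 4000;
      return {now_us - eps_us, now_us + eps_us};
    }

    uint64_t CommitWithWait() {
      // Choose the commit timestamp s as the upper bound of the current interval.
      uint64_t s = TTNow().latest;
      // Commit wait: do not make the write visible until s is guaranteed to be
      // in the past everywhere, i.e. until earliest > s.
      while (TTNow().earliest <= s) {
        std::this_thread::sleep_for(std::chrono::microseconds(100));
      }
      return s;   // any transaction starting now gets a strictly larger timestamp
    }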

    GOOGLE BIGQUERY (2011)

    Originally developed as "Dremel" in 2006 as a side-project for analyzing data artifacts generated from other tools.
    → The "interactive" goal means that they want to support ad hoc queries on in-situ data files.
    → Did not support joins in the first version.
    Rewritten in the late 2010s to a shared-disk architecture built on top of GFS.
    Released as a public commercial product (BigQuery) in 2012.

    BIGQUERY: OVERVIEW

    Shared-Disk / Disaggregated Storage
    Vectorized Query Processing
    Shuffle-based Distributed Query Execution
    Columnar Storage
    → Zone Maps / Filters
    → Dictionary + RLE Compression
    → Only Allows "Search" Inverted Indexes
    Hash Joins Only
    Heuristic Optimizer + Adaptive Optimizations


    BIGQUERY: IN-MEMORY SHUFFLE

    The shuffle phases represent checkpoints in a query's lifecycle where the coordinator makes sure that all tasks are completed.

    Fault Tolerance / Straggler Avoidance:

    → If a worker does not produce a task's results within a deadline, the coordinator speculatively executes a redundant task.
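
    A rough sketch of the straggler-avoidance idea (purely illustrative, not BigQuery code): if a task misses its deadline, launch a duplicate and take whichever copy finishes first.

    #include <chrono>
    #include <future>

    // If the primary task misses its deadline, speculatively launch a redundant
    // copy and return the result of whichever finishes first (simplified by polling).
    template <typename Fn>
    auto RunWithSpeculation(Fn task, std::chrono::milliseconds deadline) {
      auto primary = std::async(std::launch::async, task);
      if (primary.wait_for(deadline) == std::future_status::ready) {
        return primary.get();
      }
      auto backup = std::async(std::launch::async, task);   // speculative re-execution
      while (true) {
        if (primary.wait_for(std::chrono::milliseconds(1)) == std::future_status::ready) {
          return primary.get();
        }
        if (backup.wait_for(std::chrono::milliseconds(1)) == std::future_status::ready) {
          return backup.get();
        }
      }
    }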

    Dynamic Resource Allocation:

    → Scale up / down the number of workers for the next stage depending on the size of a stage's output.


    BIGQUERY: DYNAMIC REPARTITIONING

    BigQuery dynamically load balances and adjusts intermediate result partitioning to adapt to data skew.
    The DBMS detects when a shuffle partition gets too full and then instructs workers to adjust their partitioning scheme.
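
    A toy version of the detection step (illustrative only, not BigQuery's mechanism): the coordinator scans per-partition sizes and flags the ones that crossed a threshold so producers can switch to a finer-grained partitioning for them.

    #include <cstddef>
    #include <vector>

    // Toy skew check: given the current size of each shuffle partition, return the
    // ids of partitions that workers should start splitting with a finer hash.
    std::vector<size_t> PartitionsToSplit(const std::vector<size_t> &partition_bytes,
                                          size_t max_bytes_per_partition) {
      std::vector<size_t> to_split;
      for (size_t i = 0; i < partition_bytes.size(); ++i) {
        if (partition_bytes[i] > max_bytes_per_partition) {
          to_split.push_back(i);   // coordinator instructs producers to repartition these
        }
      }
      return to_split;
    }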


    SNOWFLAKE (2013)

    Managed OLAP DBMS written in C++.
    → Shared-disk architecture with aggressive compute-side local caching.
    → Written from scratch. Did not borrow components from existing systems.
    → Custom SQL dialect and client-server network protocols.
    The OG cloud-native data warehouse.


    SNOWFLAKE: OVERVIEW

    Cloud-native OLAP DBMS written in C++
    Shared-Disk / Disaggregated Storage
    Push-based Vectorized Query Processing
    Precompiled Operator Primitives
    Separate Table Data from Meta-Data
    No Buffer Pool
    PAX Columnar Storage

    SNOWFLAKE: QUERY PROCESSING

    Snowflake is a push-based vectorized engine that uses precompiled primitives for operator kernels.
    → Pre-compile variants using C++ templates for different vector data types (see the sketch below).
    → Only uses codegen (via LLVM) for tuple serialization/deserialization between workers.
    Does not support partial query retries.
    → If a worker fails, then the entire query has to restart.
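
    A sketch of what precompiled primitives via C++ templates can look like in general (illustrative, not Snowflake source): one templated kernel per operation is instantiated for each supported data type at build time, so no code generation is needed at query time.

    #include <cstddef>
    #include <cstdint>

    // Generic vectorized comparison kernel: out[i] = (col[i] < constant).
    // One copy is compiled ahead of time for every supported data type.
    template <typename T>
    void LessThanConst(const T *col, T constant, bool *out, size_t n) {
      for (size_t i = 0; i < n; ++i) {
        out[i] = col[i] < constant;   // tight loop the compiler can auto-vectorize
      }
    }

    // Explicit instantiations baked into the engine binary at build time.
    template void LessThanConst<int32_t>(const int32_t *, int32_t, bool *, size_t);
    template void LessThanConst<int64_t>(const int64_t *, int64_t, bool *, size_t);
    template void LessThanConst<double>(const double *, double, bool *, size_t);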

    SNOWFLAKE: ADAPTIVE OPTIMIZATION

    After determining the join ordering, Snowflake's optimizer identifies aggregation operators to push down into the plan below joins.
    The optimizer adds the downstream aggregations, but the DBMS only enables them at runtime according to statistics observed during execution.
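
    One standard way to picture this rewrite (textbook eager aggregation; Snowflake's actual transformation is more general): for a SUM aggregate, $\gamma_{R.g,\, \mathrm{SUM}(S.v)}(R \bowtie_{R.k = S.k} S) = \gamma_{R.g,\, \mathrm{SUM}(v')}\bigl(R \bowtie_{R.k = S.k} \gamma_{S.k,\, \mathrm{SUM}(S.v) \rightarrow v'}(S)\bigr)$. The inner $\gamma$ shrinks the join input, but whether it pays off depends on how much it actually reduces $S$, which is why the DBMS only switches it on at runtime based on observed statistics.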


    Aggregation Placement - An Adaptive Query Optimization for Snowflake

    Bowei Chen, published in Snowflake, Aug 10, 2023
    Snowflake's Data Cloud is backed by a data platform designed from the ground up to leverage cloud computing technology. The platform is delivered as a fully managed service, providing a user-friendly experience to run complex analytical workloads easily and efficiently without the burden of managing on-premise infrastructure. Snowflake's architecture separates the compute layer from the storage layer. Compute workloads on the same dataset can scale independently and run in isolation without interfering with each other, and compute resources could be allocated and scaled on demand within seconds. The cloud-native architecture makes Snowflake a powerful platform for data warehousing, data engineering, data science, and many other types of applications. More about Snowflake architecture can be found in Key Concepts & Architecture documentation and the Snowflake Elastic Data Warehouse research paper.

    SNOWFLAKE: FLEXIBLE COMPUTE

    If a query plan fragment will process a large amount of data, then the DBMS can temporarily deploy additional worker nodes to accelerate its performance.
    Flexible compute worker nodes write their results to storage as if it were a table.


    AMAZON REDSHIFT (2014)

    Amazon's flagship OLAP DBaaS.
    → Based on ParAccel's original shared-nothing architecture.
    → Switched to support disaggregated storage (S3) in 2017.
    → Added serverless deployments in 2022.
    Redshift is a more traditional data warehouse compared to BigQuery/Spark in that it wants to control all the data.
    The overarching design goal is to remove as many administration and configuration choices from users as possible.

    REDSHIFT: OVERVIEW

    Shared-Disk / Disaggregated Storage
    Push-based Vectorized Query Processing
    Transpilation Query Codegen (C++)
    Precompiled Primitives
    Compute-side Caching
    PAX Columnar Storage
    Sort-Merge + Hash Joins
    Hardware Acceleration (AQUA)
    Stratified Query Optimizer


    REDSHIFT: COMPILATION SERVICE

    Separate nodes compile query plans using GCC, with aggressive caching.
    → The DBMS checks whether a compiled version of each templated fragment already exists in the customer's local cache.
    → If the fragment does not exist in the local cache, then it checks a global cache shared across the entire fleet of Redshift customers.
    Background workers proactively recompile plans when a new version of the DBMS is released.

    REDSHIFT: HARDWARE ACCELERATION

    AWS introduced AQUA (Advanced Query Accelerator) for Redshift (Spectrum?) in 2021.
    Separate compute/cache nodes that use FPGAs to evaluate predicates.
    AQUA was phased out and replaced with Nitro cards on compute nodes.



    DATABRICKS PHOTON (2022)

    Single-threaded C++ execution engine embedded into Databricks Runtime (DBR) via JNI.
    → Overrides the existing engine when appropriate.
    → Supports both Spark's earlier SQL engine and Spark's DataFrame API.
    → Seamlessly handles the impedance mismatch between row-oriented DBR and column-oriented Photon.
    Accelerates execution of query plans over "raw / uncurated" files in a data lake.

    DATABRICKS PHOTON (2022)

    Photon: A Fast Query Engine for Lakehouse Systems

    Alexander Behm, Shoumik Palkar, Utkarsh Agarwal, Timothy Armstrong, David Cashman, Ankur Dave, Todd Greenstein, Shant Hovsepian, Ryan Johnson, Arvind Sai Krishnan, Paul Leventis, Ala Luszczak, Prashanth Menon, Mostafa Mokhtar, Gene Pang, Sameer Paranjpye, Greg Rahn, Bart Samwel, Tom van Bussel, Herman van Hovell, Maryann Xue, Reynold Xin, Matei Zaharia (photon-paper-authors@databricks.com), Databricks Inc.

    ABSTRACT

    Many organizations are shifting to a data management paradigm called the "Lakehouse," which implements the functionality of structured data warehouses on top of unstructured data lakes. ... from SQL to machine learning. Traditionally, for the most demanding SQL workloads, enterprises have also moved a curated subset of their data into data warehouses to get high performance, governance, and concurrency. However, this two-tier architecture is ...

    PHOTON: OVERVIEW

    Shared-Disk / Disaggregated Storage
    Pull-based Vectorized Query Processing
    Precompiled Primitives + Expression Fusion
    Shuffle-based Distributed Query Execution
    Sort-Merge + Hash Joins
    Unified Query Optimizer + Adaptive Optimizations


    PHOTON: VECTORIZED PROCESSING

    Photon is a pull-based vectorized engine that uses precompiled operator kernels (primitives).
    → Converts the physical plan into a list of pointers to functions that perform low-level operations on column batches (see the sketch below).
    Databricks: it is easier to build/maintain a vectorized engine than a JIT engine.
    → Engineers spend more time creating specialized codepaths to get closer to JIT performance.
    → With codegen, engineers write tooling and observability hooks instead of writing the engine.
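
    A minimal sketch of "a list of pointers to functions that operate on column batches" (illustrative, not Photon's internals):

    #include <cstddef>
    #include <vector>

    // A column batch: a handful of columns, each a contiguous vector of values.
    struct ColumnBatch {
      std::vector<std::vector<double>> cols;
      std::vector<bool> selected;   // selection bitmap carried between kernels
    };

    // Every precompiled kernel has the same signature and mutates the batch in place.
    using Kernel = void (*)(ColumnBatch &);

    void FilterCol0Positive(ColumnBatch &b) {
      for (size_t i = 0; i < b.selected.size(); ++i) {
        b.selected[i] = b.selected[i] && (b.cols[0][i] > 0.0);
      }
    }

    void AddCol0Col1IntoCol2(ColumnBatch &b) {
      for (size_t i = 0; i < b.selected.size(); ++i) {
        if (b.selected[i]) b.cols[2][i] = b.cols[0][i] + b.cols[1][i];
      }
    }

    // The "plan" is just an ordered list of kernel pointers applied to each batch.
    void RunPlan(const std::vector<Kernel> &plan, ColumnBatch &batch) {
      for (Kernel k : plan) k(batch);
    }

    The plan compiler's job then reduces to picking and ordering kernels; the kernels themselves are compiled ahead of time.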

    PHOTON: EXPRESSION FUSION

    SELECT * FROM foo WHERE cdate BETWEEN '2024-01-01' AND '2024-04-01';

    PHOTON: EXPRESSION FUSION

    SELECT * FROM foo WHERE cdate >= '2024-01-01' AND cdate <= '2024-04-01';
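
    One way to read the fusion behind these two queries (a sketch under the assumption that fusion here means evaluating both comparisons in a single pass; dates appear as int32 day numbers as a stand-in for the SQL date strings): instead of running a >= kernel and a <= kernel over the batch and combining the results, a single fused kernel checks the whole range predicate per value.

    #include <cstddef>
    #include <cstdint>

    // Unfused: two passes over the column, combining the bitmaps as we go.
    void GreaterEq(const int32_t *col, int32_t lo, bool *out, size_t n) {
      for (size_t i = 0; i < n; ++i) out[i] = col[i] >= lo;
    }
    void LessEq(const int32_t *col, int32_t hi, bool *out, size_t n) {
      for (size_t i = 0; i < n; ++i) out[i] = out[i] && (col[i] <= hi);
    }

    // Fused: one pass evaluates the whole BETWEEN-style range predicate.
    void BetweenFused(const int32_t *col, int32_t lo, int32_t hi, bool *out, size_t n) {
      for (size_t i = 0; i < n; ++i) out[i] = (col[i] >= lo) && (col[i] <= hi);
    }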


    SPARK: PARTITION COALESCING

    Spark (over-)allocates a large number of shuffle partitions for each stage.
    → The number needs to be large enough to avoid any one partition from filling up too much.
    After the shuffle completes, the DBMS then combines underutilized partitions using heuristics.


    DUCKDB (2019)

    Multi-threaded embedded (in-process, serverless) DBMS that executes SQL over disparate data files.
    → PostgreSQL-like dialect with quality-of-life enhancements.
    → "SQLite for Analytics"
    Provides zero-copy access to query results via Arrow to client code running in the same process.
    The core DBMS is nearly all custom C++ code with little to no third-party dependencies.
    → Relies on an extensions ecosystem to expand capabilities.

    DUCKDB (2019)

    George Fraser (@frasergeorgew): "My second big finding is the vast majority of queries are tiny, and virtually all queries could fit on a large single node. We maybe don't need MPP systems anymore?"

    DUCKDB: OVERVIEW

    Shared-Everything
    Push-based Vectorized Query Processing
    Precompiled Primitives
    Multi-Version Concurrency Control
    Morsel Parallelism + Scheduling
    PAX Columnar Storage
    Sort-Merge + Hash Joins
    Stratified Query Optimizer


    DUCKDB: PUSH-BASED PROCESSING

    The system originally used pull-based vectorized query processing but found it unwieldy to expand to support more complex parallelism.
    → Cannot invoke multiple pipelines simultaneously.
    Switched to a push-based query processing model in 2021. Each operator determines on its own whether it will execute in parallel, instead of relying on a centralized executor.
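
    A rough sketch of a push-based pipeline interface of the kind described here (the Source/PipelineOp/Sink names are illustrative, not DuckDB's actual classes): sources produce chunks, operators transform them in place, and sinks absorb them, which lets the scheduler drive multiple pipelines at once.

    #include <memory>
    #include <vector>

    struct DataChunk { std::vector<std::vector<int>> cols; };

    // Push-based pipeline pieces: data is pushed from source through operators into a sink.
    struct Source     { virtual ~Source() = default;     virtual bool GetChunk(DataChunk &out) = 0; };
    struct PipelineOp { virtual ~PipelineOp() = default;  virtual void Execute(DataChunk &chunk) = 0; };
    struct Sink       { virtual ~Sink() = default;        virtual void Consume(const DataChunk &chunk) = 0; };

    // One pipeline = source -> operators -> sink. Each operator (not a central
    // executor) can decide how much parallelism it supports for its own work.
    void RunPipeline(Source &src, std::vector<std::unique_ptr<PipelineOp>> &ops, Sink &sink) {
      DataChunk chunk;
      while (src.GetChunk(chunk)) {          // source pushes chunks downstream
        for (auto &op : ops) op->Execute(chunk);
        sink.Consume(chunk);
      }
    }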


    DUCKDB: VECTORS

    Custom internal vector layout for intermediate results that is compatible with Velox.
    Supports multiple vector types (e.g., flat, constant, dictionary, and sequence vectors).

    DUCKDB: VECTORS

    DuckDB uses a unified format to process all vector types without needing to decompress them first.
    → Reduces the number of specialized primitives per vector type (see the sketch below).
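
    A sketch of what a unified format can mean (illustrative, not DuckDB's exact layout): every vector type is exposed as a data pointer plus an optional selection/index view, so one primitive can process flat, constant, and dictionary vectors without materializing them first.

    #include <cstddef>
    #include <cstdint>

    // Unified view over any vector type: element i lives at data[sel ? sel[i] : i].
    struct UnifiedVector {
      const int64_t *data;    // underlying (possibly shared / dictionary) values
      const uint32_t *sel;    // optional indirection; nullptr means "flat" layout
      size_t count;
    };

    // A flat vector maps i -> i; a constant vector maps every i to the same slot
    // via its selection view; a dictionary vector maps i -> code[i]. All collapse
    // into the same access pattern:
    inline int64_t Get(const UnifiedVector &v, size_t i) {
      return v.data[v.sel ? v.sel[i] : i];
    }

    // One primitive works for every vector representation, no decompression needed.
    int64_t SumVector(const UnifiedVector &v) {
      int64_t total = 0;
      for (size_t i = 0; i < v.count; ++i) total += Get(v, i);
      return total;
    }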
