Carnegie Mellon University
Database Systems
Storage Models & Data Compression
ADMINISTRIVIA
Project #1 is due on February 9th @ 11
Project recitation on Monday, February 3rd, from 5-6pm in GHC 4303.
Homework #2 is due February 9th @ 11
LAST CLASS
We discussed storage architecture alternatives to the tuple-oriented scheme:
$\rightarrow$ Log-structured storage
$\rightarrow$ Index-organized storage
These approaches are ideal for write-heavy (INSERT/UPDATE/DELETE) workloads.
But for some workloads, the most important concern is read (SELECT) performance...
TODAY'S AGENDA
Database Workloads
Storage Models
Data Compression
DATABASE WORKLOADS
On-Line Transaction Processing (OLTP)
$\rightarrow$ Fast operations that only read/update a small amount of data each time.
On-Line Analytical Processing (OLAP)
$\rightarrow$ Complex queries that read a lot of data to compute aggregates.
Hybrid Transaction + Analytical Processing $\rightarrow$ OLTP + OLAP together on the same database instance
DATABASE WORKLOADS
[Figure: Workload Focus]
WIKIPEDIA EXAMPLE
OBSERVATION
The relational model does not specify that the DBMS must store all of a tuple's attributes together in a single page.
This may not actually be the best layout for some workloads...
OLTP
On-line Transaction Processing:
$\rightarrow$ Simple queries that read/update a small amount of data that is related to a single entity in the database.
This is usually the kind of application that people build first.
SELECT P.*, R.* FROM pages AS P
 INNER JOIN revisions AS R
    ON P.latest = R.revID
 WHERE P.pageID = ?
UPDATE useracct SET lastLogin = NOW(), hostname = ? WHERE userID = ?
INSERT INTO revisions VALUES (?, ?, ?, ?)
OLAP
On-line Analytical Processing:
$\rightarrow$ Complex queries that read large portions of the database spanning multiple entities.
You execute these workloads on the data you have collected from your OLTP application(s).
SELECT COUNT(U.lastLogin),
       EXTRACT(month FROM U.lastLogin) AS month
  FROM useracct AS U
 WHERE U.hostname LIKE '%.gov'
 GROUP BY EXTRACT(month FROM U.lastLogin)
STORAGE MODELS
A DBMS's storage model specifies how it physically organizes tuples on disk and in memory.
$\rightarrow$ Can have different performance characteristics based on the target workload (OLTP vs. OLAP).
$\rightarrow$ Influences the design choices of the rest of the DBMS.
Choice #1: N-ary Storage Model (NSM)
Choice #2: Decomposition Storage Model (DSM)
Choice #3: Hybrid Storage Model (PAX)
N-ARY STORAGE MODEL (NSM)
The DBMS stores (almost) all attributes for a single tuple contiguously in a single page.
$\rightarrow$ Also commonly known as a "row store".
Ideal for OLTP workloads where queries are more likely to access individual entities and perform write-heavy operations.
NSM database page sizes are typically some constant multiple of 4 KB hardware pages.
See Lecture #03
NSM: PHYSICAL ORGANIZATION
A disk-oriented NSM system stores a tuple's fixed-length and variable-length attributes contiguously in a single slotted page.
The tuple's record id (page#, slot#) is how the DBMS uniquely identifies a physical tuple.
[ImageCaption: Slot Array]
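To make the record-id scheme concrete, here is a minimal sketch of a slotted page in Python. The layout (a slot array growing from the front, tuple data growing from the back) and all names are illustrative assumptions, not any particular DBMS's on-disk format.

```python
PAGE_SIZE = 4096

class SlottedPage:
    """Toy NSM slotted page: slot array at the front, tuple data at the back."""

    def __init__(self, page_id: int):
        self.page_id = page_id
        self.data = bytearray(PAGE_SIZE)
        self.slots = []               # slot array: (offset, length) per tuple
        self.free_end = PAGE_SIZE     # tuples are appended from the back

    def insert(self, tuple_bytes: bytes):
        """Store a serialized tuple; return its record id (page#, slot#)."""
        assert self.free_end - len(tuple_bytes) > 0, "page is full"
        self.free_end -= len(tuple_bytes)
        self.data[self.free_end:self.free_end + len(tuple_bytes)] = tuple_bytes
        self.slots.append((self.free_end, len(tuple_bytes)))
        return (self.page_id, len(self.slots) - 1)

    def get(self, slot: int) -> bytes:
        """Follow the slot array entry to the tuple's bytes."""
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

page = SlottedPage(page_id=7)
rid = page.insert(b"1273|andy|...")     # rid == (7, 0)
assert page.get(rid[1]) == b"1273|andy|..."
```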
NSM: OLTP EXAMPLE
SELECT * FROM useracct WHERE userName = ? AND userPass = ?
NSM: OLAP EXAMPLE
SELECT COUNT(U.lastLogin),
       EXTRACT(month FROM U.lastLogin) AS month
  FROM useracct AS U
 WHERE U.hostname LIKE '%.gov'
 GROUP BY EXTRACT(month FROM U.lastLogin)
NSM: SUMMARY
Advantages
$\rightarrow$ Fast inserts, updates, and deletes.
$\rightarrow$ Good for queries that need the entire tuple (OLTP).
$\rightarrow$ Can use index-oriented physical storage for clustering.
Disadvantages
$\rightarrow$ Not good for scanning large portions of the table and/or a subset of the attributes.
$\rightarrow$ Terrible memory locality in access patterns.
$\rightarrow$ Not ideal for compression because of multiple value domains within a single page.
DECOMPOSITION STORAGE MODEL (DSM)
Store a single attribute for all tuples contiguously in a block of data.
$\rightarrow$ Also known as a "column store".
Ideal for OLAP workloads where read-only queries perform large scans over a subset of the table's attributes.
DBMS is responsible for combining/splitting a tuple's attributes when reading/writing.
A DECOMPOSITION STORAGE MODEL
[Paper excerpt: George P. Copeland and Setrag N. Khoshafian, "A Decomposition Storage Model", SIGMOD 1985. The paper compares a fully decomposed storage model (binary relations containing a surrogate and one attribute each) against the conventional n-ary storage model along simplicity and generality, storage performance, update performance, and retrieval performance.]
DSM: PHYSICAL ORGANIZATION
Store attributes and meta-data (e.g., nulls) in separate arrays of fixed-length values.
$\rightarrow$ Most systems identify unique physical tuples using offsets into these arrays.
$\rightarrow$ Need to handle variable-length values...
Maintain separate pages per attribute with a dedicated header area for meta-data about the entire column.
| #1 | header | null bitmap |
| a0 | a1 | a2 | a3 | a4 | a5 |
DSM: OLAP EXAMPLE
| header | userID | userName | userPass | hostname | lastLogin |
| header | userID | userName | userPass | hostname | lastLogin |
| header | userID | userName | userPass | hostname | lastLogin |
| header | userID | userName | userPass | hostname | lastLogin |
SELECT COUNT(U.lastLogin),
       EXTRACT(month FROM U.lastLogin) AS month
  FROM useracct AS U
 WHERE U.hostname LIKE '%.gov'
 GROUP BY EXTRACT(month FROM U.lastLogin)
DSM: TUPLE IDENTIFICATION
Choice #1: Fixed-length Offsets
→ Each value is the same length for an attribute.
Choice #2: Embedded Tuple Ids
→ Each value is stored with its tuple id in a column.
Offsets
Embedded Ids
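A toy illustration of the two choices, using hypothetical in-memory columns rather than any real system's layout; real systems store these as packed arrays on pages.

```python
# Choice #1: Fixed-length offsets -- the i-th value in every column belongs
# to the i-th tuple, so the offset alone identifies the tuple.
user_id  = [101, 102, 103]
hostname = ["a.gov", "b.com", "c.gov"]
tuple_2  = (user_id[2], hostname[2])            # -> (103, "c.gov")

# Choice #2: Embedded tuple ids -- each value carries its tuple id, so the
# columns can be stored sparsely or out of order, at the cost of extra space.
user_id_e  = [(0, 101), (1, 102), (2, 103)]
hostname_e = [(2, "c.gov"), (0, "a.gov"), (1, "b.com")]
tuple_2e = (dict(user_id_e)[2], dict(hostname_e)[2])

assert tuple_2 == tuple_2e
```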
DSM: VARIABLE-LENGTH DATA
Padding variable-length fields to ensure they are fixed-length is wasteful, especially for large attributes.
A better approach is to use dictionary compression to convert repetitive variable-length data into fixed-length values (typically 32-bit integers).
$\rightarrow$ More on this later in this lecture...
DECOMPOSITION STORAGE MODEL (DSM)
Advantages
$\rightarrow$ Reduces the amount of wasted I/O per query because the DBMS only reads the data that it needs.
$\rightarrow$ Faster query processing because of increased locality and cached data reuse (Lecture #13).
$\rightarrow$ Better data compression.
Disadvantages
$\rightarrow$ Slow for point queries, inserts, updates, and deletes because of tuple splitting/stitching/reorganization.
OBSERVATION
OLAP queries almost never access a single column of a table by itself.
$\rightarrow$ At some point during query execution, the DBMS must get other columns and stitch the original tuple back together.
But we still need to store data in a columnar format to get the storage + execution benefits.
We need a columnar scheme that still stores attributes separately but keeps the data for each tuple physically close together...
PAX STORAGE MODEL
Partition Attributes Across (PAX) is a hybrid storage model that vertically partitions attributes within a database page.
$\rightarrow$ Examples: Parquet, ORC, and Arrow.
The goal is to get the benefit of faster processing on columnar storage while retaining the spatial locality benefits of row storage.
Weaving Relations for Cache Performance
[Paper excerpt: Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, and Marios Skounakis, "Weaving Relations for Cache Performance", VLDB 2001. The paper introduces PAX, which keeps all parts of a record on the same page but groups each attribute's values into minipages; reconstructing a record requires only a mini-join among minipages that never looks beyond the page. Experiments on top of the Shore storage manager show that, compared to NSM, PAX incurs 50-75% fewer second-level cache misses on selection queries and TPC-H workloads.]
PAX: PHYSICAL ORGANIZATION
Horizontally partition data into row groups. Then vertically partition their attributes into column chunks.
Global meta-data directory contains offsets to the file's row groups.
$\rightarrow$ This is stored in the footer if the file is immutable (Parquet, ORC).
Each row group contains its own meta-data header about its contents.
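A rough sketch of this two-level partitioning under simplifying assumptions (tuples as Python dicts, tiny row groups, no encodings or headers); real formats like Parquet layer metadata and compression on top of this idea.

```python
ROW_GROUP_SIZE = 2  # tuples per row group (tiny, for illustration only)

def to_pax(tuples):
    """Horizontally partition tuples into row groups, then vertically
    partition each row group's attributes into column chunks."""
    row_groups = []
    for i in range(0, len(tuples), ROW_GROUP_SIZE):
        group = tuples[i:i + ROW_GROUP_SIZE]
        chunks = {attr: [t[attr] for t in group] for attr in group[0]}
        row_groups.append({"num_rows": len(group), "columns": chunks})
    return row_groups

rows = [{"userID": 1, "hostname": "a.gov"},
        {"userID": 2, "hostname": "b.com"},
        {"userID": 3, "hostname": "c.gov"}]
for rg in to_pax(rows):
    print(rg["num_rows"], rg["columns"])
# Row group 0 keeps userID=[1,2] and hostname=["a.gov","b.com"] together:
# scanning one column is sequential, yet a full tuple stays in one group.
```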
PAX: PHYSICAL ORGANIZATION
[Figure: Parquet data organization. Row groups (default 128 MB) contain column chunks, which contain pages (default 1 MB); per-page metadata includes min/max, count, and repetition/definition levels, followed by the encoded values. File meta-data is stored at the end of the file.]
OBSERVATION
I/O is the main bottleneck if the DBMS fetches data from disk during query execution.
The DBMS can compress pages to increase the utility of the data moved per I/O operation.
Key trade-off is speed vs. compression ratio.
$\rightarrow$ Compressing the database reduces DRAM requirements.
$\rightarrow$ It may decrease CPU costs during query execution.
DATABASE COMPRESSION
Goal #1: Must produce fixed-length values.
$\rightarrow$ Only exception is var-length data stored in a separate pool.
Goal #2: Postpone decompression for as long as possible during query execution.
$\rightarrow$ Also known as late materialization.
Goal #3: Must be a lossless scheme.
$\rightarrow$ People (typically) don't like losing data.
$\rightarrow$ Any lossy compression must be performed by the application.
COMPRESSION GRANULARITY
Choice #1: Block-level
$\rightarrow$ Compress a block of tuples for the same table.
Choice #2: Tuple-level
$\rightarrow$ Compress the contents of the entire tuple (NSM-only).
Choice #3: Attribute-level
$\rightarrow$ Compress a single attribute within one tuple (overflow).
$\rightarrow$ Can target multiple attributes for the same tuple.
Choice #4: Column-level
$\rightarrow$ Compress multiple values for one or more attributes stored for multiple tuples (DSM-only).
NAIVE COMPRESSION
Compress data using a general-purpose algorithm. Scope of compression is only based on the data provided as input.
$\rightarrow$ LZO (1996), LZ4 (2011), Snappy (2011), Oracle OZIP (2014), Zstd (2015)
Considerations:
$\rightarrow$ Computational overhead.
$\rightarrow$ Compress vs. decompress speed.
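A quick illustration of naive block-level compression using zlib from the Python standard library; the "page" here is just illustrative bytes, and the exact sizes will vary.

```python
import zlib

page = b"Andy|99999|" * 200                  # repetitive row data compresses well
compressed = zlib.compress(page, level=6)    # higher levels trade speed for ratio
print(len(page), "->", len(compressed))      # e.g., 2200 -> a few dozen bytes

# The DBMS must fully decompress before it can read or modify any tuple:
assert zlib.decompress(compressed) == page
```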
MYSQL INNODB COMPRESSION
Buffer Pool
Database File
Compressed pages ([1,2,4,8] KB)
NAIVE COMPRESSION
The DBMS must decompress data first before it can be read and (potentially) modified. $\rightarrow$ This limits the "scope" of the compression scheme.
These schemes also do not consider the high- level meaning or semantics of the data.
OBSERVATION
Ideally, we want the DBMS to operate on compressed data without decompressing it first.
| NAME | SALARY |
| Andy | 99999 |
| Jignesh | 88888 |
SELECT * FROM users WHERE name = 'Andy'

Database Magic!

SELECT * FROM users WHERE name = XX
COLUMNAR COMPRESSION
Run-length Encoding
Bit-Packing Encoding
Bitmap Encoding
Delta / Frame-of-Reference Encoding
Incremental Encoding
Dictionary Encoding
RUN-LENGTH ENCODING
Compress runs of the same value in a single column into triplets:
$\rightarrow$ The value of the attribute.
$\rightarrow$ The start position in the column segment.
$\rightarrow$ The # of elements in the run.
Requires the columns to be sorted intelligently to maximize compression opportunities.
Original Data (isDead column): Y, Y, Y, N, Y, N, Y, Y

Compressed Data (RLE triplets):
| (Y,0,3) | (N,3,1) | (Y,4,1) | (N,5,1) | (Y,6,2) |
RLE Triplet = (Value, Offset, Length)

SELECT isDead, COUNT(*)
  FROM users
 GROUP BY isDead

This query can be answered from the triplets alone by summing run lengths per value.

Sorting the column on isDead first collapses the runs to (Y,0,6) and (N,6,2), so the sorted data compresses better.
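A minimal run-length encoder/decoder producing the (value, offset, length) triplets shown above; the aggregate at the end shows how a GROUP BY/COUNT can be computed from the runs without decompressing.

```python
def rle_encode(column):
    """Collapse each run into a (value, offset, length) triplet."""
    runs, start = [], 0
    for i in range(1, len(column) + 1):
        if i == len(column) or column[i] != column[start]:
            runs.append((column[start], start, i - start))
            start = i
    return runs

def rle_decode(runs):
    out = []
    for value, _offset, length in runs:
        out.extend([value] * length)
    return out

col = ["Y", "Y", "Y", "N", "Y", "N", "Y", "Y"]
runs = rle_encode(col)   # [('Y',0,3), ('N',3,1), ('Y',4,1), ('N',5,1), ('Y',6,2)]
assert rle_decode(runs) == col

# GROUP BY isDead / COUNT(*) directly on the compressed representation:
counts = {}
for value, _offset, length in runs:
    counts[value] = counts.get(value, 0) + length
print(counts)            # {'Y': 6, 'N': 2}
```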
BIT PACKING
If the values for an integer attribute are smaller than the range of its given data type size, then reduce the number of bits to represent each value.
Use bit- shifting tricks to operate on multiple values in a single word.
Original Data
| int32 |
| 13 |
| 191 |
| 56 |
| 92 |
| 81 |
| 120 |
| 231 |
| 172 |
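A sketch of the idea using struct from the Python standard library: every int32 value above fits in 8 bits, so each can be stored in a single byte.

```python
import struct

values = [13, 191, 56, 92, 81, 120, 231, 172]   # all fit in an unsigned byte

original = struct.pack("8i", *values)   # 32 bytes (8 x 32 bits)
packed   = struct.pack("8B", *values)   # 8 bytes  (8 x 8 bits)
print(len(original) * 8, "->", len(packed) * 8)  # 256 -> 64 bits

# Unpacking restores the original values exactly:
assert list(struct.unpack("8B", packed)) == values
```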
PATCHING / MOSTLY ENCODING
A variation of bit packing for when an attribute's values are "mostly" less than the largest size: store them with a smaller data type.
$\rightarrow$ The remaining values that cannot be compressed are stored in their raw form.
Original Data
| int32 |
| 13 |
| 191 |
| 99999999 |
| 92 |
| 81 |
| 120 |
| 231 |
| 172 |
Original: 8 × 32-bits = 256 bits
Compressed Data
| mostly8 |
| 13 |
| 191 |
| XXX |
| 92 |
| 81 |
| 120 |
| 231 |
| 172 |
Exception table: | offset | value |
                 | 3      | 99999999 |
(XXX marks the slot whose real value lives in the exception table.)
Compressed: (8 × 8- bits) + 16- bits + 32- bits = 112 bits
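A sketch of a "mostly8" encoder under illustrative assumptions: one byte per value, a sentinel (0xFF) marking patched slots, and an exception list of (offset, value) pairs. Real systems choose layouts and sentinels differently.

```python
SENTINEL = 0xFF  # reserved byte that marks a patched slot

def mostly8_encode(values):
    """Pack byte-sized values; record larger ones as (offset, value) exceptions."""
    packed, exceptions = [], []
    for i, v in enumerate(values):
        if 0 <= v < SENTINEL:
            packed.append(v)
        else:
            packed.append(SENTINEL)
            exceptions.append((i, v))
    return bytes(packed), exceptions

def mostly8_decode(packed, exceptions):
    out = list(packed)
    for offset, value in exceptions:
        out[offset] = value
    return out

vals = [13, 191, 56, 99999999, 92, 81, 120, 231, 172]
packed, exc = mostly8_encode(vals)
print(len(packed), exc)                    # 9 bytes, [(3, 99999999)]
assert mostly8_decode(packed, exc) == vals
```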
BITMAP ENCODING
Store a separate bitmap for each unique value of an attribute, where an offset in the vector corresponds to a tuple.
$\rightarrow$ The $i^{th}$ position in the bitmap corresponds to the $i^{th}$ tuple in the table.
$\rightarrow$ Typically segmented into chunks to avoid allocating large blocks of contiguous memory.
Only practical if the value cardinality is low. Some DBMSs provide bitmap indexes.
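A toy bitmap encoder where each distinct value gets one Python integer as its bit vector; bit i corresponds to the i-th tuple.

```python
def bitmap_encode(column):
    """Build one bitmap per distinct value; bit i is set iff tuple i has it."""
    bitmaps = {}
    for i, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << i)
    return bitmaps

col = ["Y", "Y", "N", "Y", "N"]
bitmaps = bitmap_encode(col)
print(format(bitmaps["Y"], "05b"))   # 01011 (bits 0, 1, 3 set)
print(format(bitmaps["N"], "05b"))   # 10100 (bits 2, 4 set)

# Predicates become bitwise operations, e.g. COUNT(*) WHERE value = 'Y':
assert bin(bitmaps["Y"]).count("1") == 3
```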
BITMAP ENCODING: EXAMPLE

CREATE TABLE customer (
  id INT PRIMARY KEY,
  name VARCHAR(32),
  email VARCHAR(64),
  address VARCHAR(64),
  zip_code INT
);

Assume we have 10 million tuples, and there are 43,000 zip codes in the US.
$\rightarrow$ Plain int32 column: 10,000,000 × 32 bits = 40 MB
$\rightarrow$ Bitmaps: 10,000,000 × 43,000 bits = 53.75 GB

Every time the application inserts a new tuple, the DBMS must extend 43,000 different bitmaps.

There are compressed data structures for sparse data sets:
$\rightarrow$ Roaring Bitmaps (used in ClickHouse, InfluxDB, Weaviate)
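A quick back-of-the-envelope check of the arithmetic on this slide (decimal MB/GB).

```python
tuples, zips = 10_000_000, 43_000

plain_bits  = tuples * 32      # zip_code stored as a 32-bit integer
bitmap_bits = tuples * zips    # one bit per (tuple, zip code) pair

print(plain_bits / 8 / 1e6, "MB")   # 40.0 MB
print(bitmap_bits / 8 / 1e9, "GB")  # 53.75 GB
```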
DELTA ENCODING
Record the difference between values that follow each other in the same column.
$\rightarrow$ Store the base value in-line or in a separate look-up table.
$\rightarrow$ Combine with RLE to get even better compression ratios.
Original Data
| time64 | temp |
| 12:00 | 99.5 |
| 12:01 | 99.4 |
| 12:02 | 99.5 |
| 12:03 | 99.6 |
| 12:04 | 99.4 |
Frame-of-Reference Variant: Use the global minimum value as the base for all deltas.
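A small sketch of both variants, with the temperature column above scaled to tenths of a degree so the deltas stay integral; all names are illustrative.

```python
def delta_encode(values):
    """Store the base in-line; each later entry is the difference from its predecessor."""
    return [values[0]] + [values[i] - values[i - 1] for i in range(1, len(values))]

def delta_decode(encoded):
    out = [encoded[0]]
    for d in encoded[1:]:
        out.append(out[-1] + d)
    return out

temps = [995, 994, 995, 996, 994]    # 99.5, 99.4, ... in tenths of a degree
deltas = delta_encode(temps)         # [995, -1, 1, 1, -2]
assert delta_decode(deltas) == temps

# Frame-of-reference variant: use the global minimum as the base, so every
# stored value is a small non-negative offset that bit-packs well.
base = min(temps)
frame = [v - base for v in temps]    # [1, 0, 1, 2, 0]
assert [v + base for v in frame] == temps
```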
DICTIONARY COMPRESSION
Replace frequent values with smaller fixed-length codes and then maintain a mapping (dictionary) from the codes to the original values.
$\rightarrow$ Typically, one code per attribute value.
$\rightarrow$ Most widely used native compression scheme in DBMSs.
The ideal dictionary scheme supports fast encoding and decoding for both point and range queries.
$\rightarrow$ Encode/Locate: For a given uncompressed value, convert it into its compressed form.
$\rightarrow$ Decode/Extract: For a given compressed value, convert it back into its original form.
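A sketch of order-preserving dictionary encoding: assigning codes in sorted order keeps the original collation, so a prefix predicate like LIKE 'And%' becomes a range scan over the codes. Everything here is an illustrative in-memory toy.

```python
def build_dictionary(column):
    """Sorted distinct values -> codes that preserve the original collation."""
    values = sorted(set(column))
    encode = {v: code for code, v in enumerate(values)}
    decode = {code: v for v, code in encode.items()}
    return encode, decode

names = ["Andrea", "Mr. Pickles", "Andy", "Jignesh", "Mr. Pickles"]
encode, decode = build_dictionary(names)
codes = [encode[n] for n in names]            # [0, 3, 1, 2, 3]
assert [decode[c] for c in codes] == names

# name LIKE 'And%' rewrites to a code-range predicate because the
# dictionary is sorted: find the code range covering the prefix.
lo = min(c for v, c in encode.items() if v.startswith("And"))
hi = max(c for v, c in encode.items() if v.startswith("And"))
matches = [i for i, c in enumerate(codes) if lo <= c <= hi]
print(matches)                                # tuples 0 and 2 (Andrea, Andy)
```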
DICTIONARY: ORDER-PRESERVING
The encoded values need to support the same collation as the original values.
Original Data
Table (html):
| name |
| Andrea |
| Mr. Pickles |
| Andy |
| Jignesh |
| Mr. Pickles |
Compressed Data (Dictionary Sorted)

SELECT * FROM users
 WHERE name LIKE 'And%'

With a sorted dictionary, the predicate rewrites to a range scan over the codes:

SELECT * FROM users
 WHERE name BETWEEN 10 AND 20
ORDER-PRESERVING ENCODING

SELECT name FROM users
 WHERE name LIKE 'And%'
$\rightarrow$ Still must perform a scan on the column.

SELECT DISTINCT name FROM users
 WHERE name LIKE 'And%'
$\rightarrow$ Only needs to access the dictionary.
CONCLUSION
It is important to choose the right storage model for the target workload:
$\rightarrow$ OLTP = Row Store
$\rightarrow$ OLAP = Column Store
DBMSs can combine different approaches for even better compression.
Dictionary encoding is probably the most useful scheme because it does not require pre- sorting.