    lecture-05-slides.pdf
    Carnegie Mellon University

    Database Systems

    Database Storage: Tuple Organization

    ADMINISTRIVIA

Homework #1 is due January 29th @ 11
Project #1 is due on February 9th @ 11
Project recitation on Monday, February 3rd, from 5-6pm in GHC 4303.

    PREVIOUSLY

We presented a disk-oriented architecture where the DBMS assumes that the primary storage location of the database is on non-volatile disk.
We then discussed a page-oriented storage scheme for organizing tuples across heap files.

    SLOTTED PAGES

The most common layout scheme is called slotted pages.
The slot array maps "slots" to the tuples' starting position offsets.
The header keeps track of:
→ The # of used slots
→ The offset of the starting location of the last slot used.
Fixed- and Var-length Tuple Data
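As a rough illustration of the layout above, here is a minimal sketch of a slotted page: a header tracking the slot count and the last-used offset, a slot array growing from the front, and tuple data growing from the back. All names and the 4KB page size are assumptions for illustration, not any real DBMS's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

constexpr size_t PAGE_SIZE = 4096;

struct SlottedPage {
    std::vector<uint8_t> data = std::vector<uint8_t>(PAGE_SIZE, 0);
    uint16_t num_slots = 0;            // header: # of used slots
    uint16_t last_offset = PAGE_SIZE;  // header: offset of last slot used
    std::vector<uint16_t> slots;       // slot array: slot # -> tuple offset

    // Append a tuple's bytes at the back of the page; returns its slot number.
    int Insert(const std::string& tuple) {
        last_offset -= tuple.size();
        std::memcpy(&data[last_offset], tuple.data(), tuple.size());
        slots.push_back(last_offset);
        return num_slots++;
    }

    // Follow the slot array to find the tuple's starting position.
    std::string Get(int slot, size_t len) const {
        return std::string(reinterpret_cast<const char*>(&data[slots[slot]]), len);
    }
};
```

Because the slot array only stores offsets, tuples can be moved or compacted within the page without changing their slot numbers.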


    RECORD IDS

The DBMS assigns each logical tuple a unique record identifier that represents its physical location in the database.
→ File Id, Page Id, Slot #
→ Most DBMSs do not store ids in the tuple.
→ SQLite uses ROWID as the true primary key and stores them as a hidden attribute.
Applications should never rely on these IDs to mean anything.
PostgreSQL: CTID (6-bytes)
SQLite: ROWID (8-bytes)
Microsoft SQL Server: %%physloc%% (8-bytes)
Oracle: ROWID (10-bytes)

    TUPLE-ORIENTED STORAGE

    Insert a new tuple:

→ Check page directory to find a page with a free slot.
→ Retrieve the page from disk (if not in memory).
→ Check slot array to find empty space in page that will fit.

    Update an existing tuple using its record id:

→ Check page directory to find the location of the page.
→ Retrieve the page from disk (if not in memory).
→ Find the offset in the page using the slot array.
→ If the new data fits, overwrite the existing data. Otherwise, mark the existing tuple as deleted and insert the new version in a different page.
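The insert and update flows above can be sketched as follows. This is purely illustrative: pages are simplified to fixed-size slot arrays of strings, the "page directory" is a map from page id to free-slot count, and none of the names correspond to a real DBMS API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct RecordId { int page_id; int slot; };

struct Heap {
    static constexpr int SLOTS_PER_PAGE = 2;
    std::vector<std::vector<std::string>> pages;  // page -> slot -> tuple ("" = empty)
    std::map<int, int> directory;                 // page directory: page id -> free slots

    RecordId Insert(const std::string& tuple) {
        // Check the page directory for a page with a free slot,
        // then find empty space in that page.
        for (auto& [pid, free] : directory) {
            if (free > 0) {
                for (int s = 0; s < SLOTS_PER_PAGE; s++) {
                    if (pages[pid][s].empty()) {
                        pages[pid][s] = tuple;
                        free--;
                        return {pid, s};
                    }
                }
            }
        }
        // No page has room: allocate a new page.
        int pid = static_cast<int>(pages.size());
        pages.push_back(std::vector<std::string>(SLOTS_PER_PAGE));
        pages[pid][0] = tuple;
        directory[pid] = SLOTS_PER_PAGE - 1;
        return {pid, 0};
    }

    RecordId Update(RecordId rid, const std::string& tuple, size_t max_len) {
        if (tuple.size() <= max_len) {        // new data fits: overwrite in place
            pages[rid.page_id][rid.slot] = tuple;
            return rid;
        }
        RecordId moved = Insert(tuple);       // insert new version in a different page
        pages[rid.page_id][rid.slot].clear(); // mark old tuple as deleted
        directory[rid.page_id]++;
        return moved;
    }
};
```

Note that an oversized update changes the tuple's record id, which is one reason applications should not rely on record ids meaning anything.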

    TUPLE-ORIENTED STORAGE

Problem #1: Fragmentation
→ Pages are not fully utilized (unusable space, empty slots).
Problem #2: Useless Disk I/O
→ The DBMS must fetch an entire page to update one tuple.
Problem #3: Random Disk I/O
→ The worst-case scenario when updating multiple tuples is that each tuple is on a separate page.
What if the DBMS cannot overwrite data in pages and can only create new pages?
→ Examples: Some object stores, HDFS, Google Colossus

    TODAY'S AGENDA

Log-Structured Storage
Index-Organized Storage
Data Representation

    LOG-STRUCTURED STORAGE

Instead of storing tuples in pages and updating them in-place, the DBMS maintains a log that records changes to tuples.
→ Each log entry represents a tuple PUT/DELETE operation.
→ Originally proposed as log-structured merge trees (LSM Trees) in 1996.
The DBMS applies changes to an in-memory data structure (MemTable) and then writes out the changes sequentially to disk (SSTable).


Key-value storage that appends log records on disk to represent changes to tuples (PUT, DELETE).
→ Each log record must contain the tuple's unique identifier.
→ Put records contain the tuple contents.
→ Deletes mark the tuple as deleted.
As the application makes changes to the database, the DBMS appends log records to the end of the file without checking previous log records.

    SSTable
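The MemTable/SSTable flow can be sketched minimally as below, assuming a `std::map` as the in-memory structure, a sorted vector as the on-disk run, and a missing optional value as a DELETE tombstone. This is an illustration of the idea, not how RocksDB or any real engine is implemented.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

struct LSMStore {
    std::map<std::string, std::optional<std::string>> memtable;
    std::vector<std::pair<std::string, std::optional<std::string>>> sstable;

    // Writes go to the in-memory MemTable; nullopt marks a delete.
    void Put(const std::string& k, const std::string& v) { memtable[k] = v; }
    void Delete(const std::string& k) { memtable[k] = std::nullopt; }

    // Write the MemTable out sequentially as a sorted run (SSTable).
    void Flush() {
        sstable.assign(memtable.begin(), memtable.end());
        memtable.clear();
    }

    // Reads check the newest data (MemTable) first, then the flushed run.
    std::optional<std::string> Get(const std::string& k) const {
        if (auto it = memtable.find(k); it != memtable.end()) return it->second;
        for (const auto& [key, val] : sstable)
            if (key == k) return val;
        return std::nullopt;
    }
};
```

Because the MemTable is already key-sorted, the flush is a single sequential write, which is where log-structured storage gets its write performance.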

    LOG-STRUCTURED COMPACTION

Periodically compact SSTables to reduce wasted space and speed up reads.
→ Only keep the "latest" values for each key using a sort-merge algorithm.
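A sort-merge compaction of two runs can be sketched as follows, under the simplifying assumption that an SSTable is a key-sorted vector of (key, value) pairs and a missing optional value is a DELETE tombstone:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, std::optional<std::string>>;
using SSTable = std::vector<Entry>;

// Merge two sorted runs, keeping only the "latest" value per key.
// Entries from the newer run shadow the older run; tombstones are dropped.
SSTable Compact(const SSTable& newer, const SSTable& older) {
    SSTable out;
    size_t i = 0, j = 0;
    auto emit = [&](const Entry& e) {
        if (e.second.has_value()) out.push_back(e);  // drop deletes
    };
    while (i < newer.size() && j < older.size()) {
        if (newer[i].first < older[j].first)      emit(newer[i++]);
        else if (older[j].first < newer[i].first) emit(older[j++]);
        else { emit(newer[i++]); j++; }           // same key: newer wins
    }
    while (i < newer.size()) emit(newer[i++]);
    while (j < older.size()) emit(older[j++]);
    return out;
}
```

The merge is linear in the total number of entries because both inputs are already sorted, but rewriting every live entry is exactly the write amplification discussed next.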


    DISCUSSION

    Log- structured storage managers are more common today than in previous decades.
    $\rightarrow$ This is partly due to the proliferation of RocksDB.
    What are some downsides of this approach?
→ Write amplification.
→ Compaction is expensive.

    OBSERVATION

The two table storage approaches we've discussed so far rely on indexes to find individual tuples.
→ Such indexes are necessary because the tables are inherently unsorted.
    But what if the DBMS could keep tuples sorted automatically using an index?

    INDEX-ORGANIZED STORAGE

The DBMS stores a table's tuples as the value of an index data structure.
→ Still uses a page layout that looks like a slotted page.
→ Tuples are typically sorted in the page based on key.
A B+Tree pays maintenance costs upfront, whereas LSMs pay for it later.
Examples: MySQL, Oracle, SQL Server


    TUPLE STORAGE

A tuple is essentially a sequence of bytes prefixed with a header that contains meta-data about it.
    It is the job of the DBMS to interpret those bytes into attribute types and values.
    The DBMS's catalogs contain the schema information about tables that the system uses to figure out the tuple's layout.

    DATA LAYOUT

CREATE TABLE foo (
    id INT PRIMARY KEY,
    value BIGINT
);

unsigned char[]


header | id | value

reinterpret_cast<int32_t*>(address)
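A sketch of reading and writing such a byte array for `foo(id INT, value BIGINT)`, assuming a 1-byte header and a packed layout. `std::memcpy` is used instead of the slide's `reinterpret_cast` because a pointer cast is only well-defined when the address is suitably aligned for the target type; all offsets here are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr size_t HEADER = 1;  // assumed 1-byte tuple header

void WriteTuple(unsigned char* buf, int32_t id, int64_t value) {
    std::memcpy(buf + HEADER, &id, sizeof(id));
    std::memcpy(buf + HEADER + sizeof(id), &value, sizeof(value));
}

// The catalog tells the DBMS the schema, so it knows which bytes to
// interpret as which attribute type.
int32_t ReadId(const unsigned char* buf) {
    int32_t id;
    std::memcpy(&id, buf + HEADER, sizeof(id));  // int32 at offset 1
    return id;
}

int64_t ReadValue(const unsigned char* buf) {
    int64_t value;
    std::memcpy(&value, buf + HEADER + sizeof(int32_t), sizeof(value));
    return value;
}
```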

    WORD-ALIGNED TUPLES

All attributes in a tuple must be word-aligned to enable the CPU to access them without any unexpected behavior or additional work.
CREATE TABLE foo (
    id INT PRIMARY KEY,
    cdate TIMESTAMP,
    color CHAR(2),
    zipcode INT
);

unsigned char[]


    WORD-ALIGNMENT: PADDING

Add empty bits after attributes to ensure that the tuple is word-aligned. Essentially, round up the storage size of types to the next largest word size.

    WORD-ALIGNMENT: REORDERING

    Switch the order of attributes in the tuples' physical layout to make sure they are aligned.
    → May still have to use padding to fill remaining space.
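The effect of padding vs. reordering is easy to see with `sizeof` on ordinary C++ structs. The fields here are a deliberately simple hypothetical (not the `foo` table above), and the exact sizes assume a typical 64-bit ABI where `int64_t` is 8-byte aligned:

```cpp
#include <cassert>
#include <cstdint>

struct Padded {        // declaration order forces the compiler to pad
    char    a;         // 1 byte + 7 bytes padding (align b to 8)
    int64_t b;         // 8 bytes
    char    c;         // 1 byte + 7 bytes tail padding
};                     // typically 24 bytes

struct Reordered {     // largest-first order needs far less padding
    int64_t b;         // 8 bytes
    char    a;         // 1 byte
    char    c;         // 1 byte + 6 bytes tail padding
};                     // typically 16 bytes
```

The same attributes consume 8 fewer bytes per tuple after reordering, even though some tail padding remains.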


    DATA REPRESENTATION

INTEGER/BIGINT/SMALLINT/TINYINT
→ Same as in C/C++.
FLOAT/REAL vs. NUMERIC/DECIMAL
→ IEEE-754 Standard / Fixed-point Decimals.
VARCHAR/VARBINARY/TEXT/BLOB
→ Header with length, followed by data bytes OR pointer to another page/offset with data.
→ Need to worry about collations / sorting.
TIME/DATE/TIMESTAMP/INTERVAL
→ 32/64-bit integer of (micro/milli)seconds since the Unix epoch (January 1st, 1970).


    VARIABLE PRECISION NUMBERS

Inexact, variable-precision numeric type that uses the "native" C/C++ types.
Stored directly as specified by IEEE-754.
→ Examples: FLOAT, REAL/DOUBLE
These types are typically faster than fixed-precision numbers because CPU ISAs (Xeon, Arm) have instructions/registers to support them.
But they do not guarantee exact values...

    VARIABLE PRECISION NUMBERS

    Rounding Example

#include <stdio.h>
int main(int argc, char* argv[]) {
    float x = 0.1;
    float y = 0.2;
    printf("x+y = %f\n", x + y);
    printf("0.3 = %f\n", 0.3);
    return 0;
}

    Output

x+y = 0.300000
0.3 = 0.300000

VARIABLE PRECISION NUMBERS

Rounding Example

Printing the same values with more digits of precision (e.g., %.20f) reveals the rounding error:

Output

x+y = 0.30000001192092895508
0.3 = 0.29999999999999998890

    FIXED PRECISION NUMBERS

Numeric data types with (potentially) arbitrary precision and scale. Used when rounding errors are unacceptable.
→ Examples: NUMERIC, DECIMAL
Many different implementations.
→ Example: Store in an exact, variable-length binary representation with additional meta-data.
→ Can be less expensive if the DBMS does not provide arbitrary precision (e.g., decimal point can be in a different position per value).
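The "less expensive" variant can be sketched as a scaled integer: the decimal point is in the same position for every value, so arithmetic stays exact integer math. The name `Decimal2` and the fixed scale of 2 are illustrative assumptions, not any DBMS's actual representation.

```cpp
#include <cassert>
#include <cstdint>

// Fixed-precision decimal with a fixed scale of two digits after the
// decimal point, stored as the value multiplied by 100.
struct Decimal2 {
    int64_t scaled;  // value * 100

    static Decimal2 FromParts(int64_t whole, int64_t hundredths) {
        return {whole * 100 + hundredths};
    }
    // Addition of same-scale values is plain integer addition: exact.
    Decimal2 operator+(Decimal2 o) const { return {scaled + o.scaled}; }
    bool operator==(Decimal2 o) const { return scaled == o.scaled; }
};
```

Unlike the float example above, 0.10 + 0.20 here is exactly 0.30, because no binary rounding ever occurs.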

    POSTGRES: NUMERIC

typedef unsigned char NumericDigit;
typedef struct {
    int ndigits;           // # of digits
    int weight;            // weight of first digit
    int scale;             // scale factor
    int sign;              // positive / negative / NaN
    NumericDigit *digits;  // digit storage
} numeric;


    NULL DATA TYPES

    Choice #1: Null Column Bitmap Header

→ Store a bitmap in a centralized header that specifies which attributes are null.
→ This is the most common approach in row-stores.

Choice #2: Special Values

→ Designate a placeholder value to represent NULL for a data type (e.g., INT32_MIN).
→ More common in column-stores.

Choice #3: Per Attribute Null Flag

→ Store a flag that marks that a value is null.
→ Must use more space than just a single bit because this messes up word alignment.
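Choice #1 can be sketched with a one-byte bitmap in the tuple header, one bit per attribute, under the assumption that the table has at most 8 attributes; the names here are illustrative.

```cpp
#include <cassert>
#include <cstdint>

struct TupleHeader {
    uint8_t null_bitmap = 0;  // bit i set => attribute i is NULL

    void SetNull(int attr)   { null_bitmap |= static_cast<uint8_t>(1u << attr); }
    void ClearNull(int attr) { null_bitmap &= static_cast<uint8_t>(~(1u << attr)); }
    bool IsNull(int attr) const { return (null_bitmap >> attr) & 1u; }
};
```

One header byte covers all attributes, whereas per-attribute flags (Choice #3) would cost at least a byte per attribute once alignment is accounted for.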


[Slide shows the first page of: Xinyu Zeng, Ruijun Meng, Andrew Pavlo, Wes McKinney, Huanchen Zhang. "NULLS! Revisiting Null Representation in Modern Columnar Formats." DaMoN'24, June 10, 2024. https://doi.org/10.1145/3662010.3663452 — comparing the Compact layout (Parquet/ORC, non-null values packed contiguously with a separate null bitmap) against the Placeholder layout (Arrow/DuckDB, nulls stored in place as placeholder values).]

    LARGE VALUES

Most DBMSs do not allow a tuple to exceed the size of a single page.
To store values that are larger than a page, the DBMS uses separate overflow storage pages.
→ Postgres: TOAST (>2KB)
→ MySQL: Overflow (>1/2 size of page)
→ SQL Server: Overflow (>size of page)
Lots of potential optimizations:
→ Overflow Compression, German Strings
CREATE TABLE foo (
    id INT PRIMARY KEY,
    data INT,
    contents TEXT
);

Tuple layout: Header | INT | INT | size | location → Overflow Page: VARCHAR DATA
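The overflow scheme can be sketched as below: when the value exceeds a threshold, the tuple keeps only a (size, location) reference and the bytes live on a separate overflow page. The 2048-byte threshold loosely mirrors the Postgres TOAST figure above; everything else (names, the vector-of-pages model) is illustrative.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct OverflowStore {
    static constexpr size_t THRESHOLD = 2048;
    std::vector<std::string> overflow_pages;

    // What the tuple stores for a variable-length attribute: either the
    // bytes inline, or a (size, location) pointer into overflow storage.
    struct Field {
        bool overflowed;
        size_t size;
        size_t location;
        std::string inline_data;
    };

    Field Store(const std::string& contents) {
        if (contents.size() <= THRESHOLD)
            return {false, contents.size(), 0, contents};
        overflow_pages.push_back(contents);  // write bytes to an overflow page
        return {true, contents.size(), overflow_pages.size() - 1, ""};
    }

    std::string Load(const Field& f) const {
        return f.overflowed ? overflow_pages[f.location] : f.inline_data;
    }
};
```

Reading an overflowed value costs an extra page fetch, which is why systems only spill values past a threshold rather than always.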


    EXTERNAL VALUE STORAGE

Some systems allow you to store a large value in an external file. Treated as a BLOB type.
→ Oracle: BFILE data type
→ Microsoft: FILESTREAM data type
The DBMS cannot manipulate the contents of an external file.
→ No durability protections.
→ No transaction protections.

Tuple: Header | a | b | c | d | e → External File: Data


[Slide shows the first page of: Russell Sears, Catharine van Ingen, Jim Gray. "To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem?" Microsoft Research Technical Report MSR-TR-2006-45, April 2006. Comparing NTFS and SQL Server 2005, the study finds BLOBs smaller than ~256KB are handled more efficiently by the database and BLOBs larger than ~1MB by the filesystem, with storage fragmentation being the main determinant of the break-even point.]

    SYSTEM CATALOGS

A DBMS stores meta-data about databases in its internal catalogs.
→ Tables, columns, indexes, views
→ Users, permissions
→ Internal statistics
Almost every DBMS stores the database's catalog inside itself (i.e., as tables).
→ Wrap object abstraction around tuples.
→ Specialized code for "bootstrapping" catalog tables.

    SYSTEM CATALOGS

You can query the DBMS's internal INFORMATION_SCHEMA catalog to get info about the database.
→ ANSI standard set of read-only views that provide info about all the tables, views, columns, and procedures in a database.
DBMSs also have non-standard shortcuts to retrieve this information.

    ACCESSING TABLE SCHEMA

    List all the tables in the current database:

    ACCESSING TABLE SCHEMA

List all the columns in the student table:
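The queries themselves did not survive extraction from these two slides. Standard INFORMATION_SCHEMA queries for each look like the following (illustrative; the exact columns available vary by DBMS):

```sql
-- List all the tables in the current database:
SELECT * FROM INFORMATION_SCHEMA.TABLES;

-- List all the columns in the student table:
SELECT * FROM INFORMATION_SCHEMA.COLUMNS
 WHERE table_name = 'student';
```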

    SCHEMA CHANGES

ADD COLUMN:

→ NSM: Copy tuples into a new region in memory.
→ DSM: Just create the new column segment on disk.

DROP COLUMN:

→ NSM #1: Copy tuples into a new region of memory.
→ NSM #2: Mark the column as "deprecated" and clean up later.
→ DSM: Just drop the column and free memory.

CHANGE COLUMN:

→ Check whether the conversion is allowed to happen. Depends on default values.

    INDEXES

CREATE INDEX:

→ Scan the entire table and populate the index.
→ Have to record changes made by txns that modified the table while another txn was building the index.
→ When the scan completes, lock the table and resolve changes that were missed after the scan started.

DROP INDEX:

→ Just drop the index logically from the catalog.
→ It only becomes "invisible" when the txn that dropped it commits. All existing txns will still have to update it.

    CONCLUSION

Log-structured storage is an alternative approach to the tuple-oriented architecture.
→ Ideal for write-heavy workloads because it maximizes sequential disk I/O.
    The storage manager is not entirely independent from the rest of the DBMS.

    NEXT CLASS

    Breaking your preconceived notion that a DBMS stores everything as rows...