Carnegie Mellon University
Database Systems
Database Storage: Files & Pages
ADMINISTRIVIA
Project #0 is due January 26th @ 11
Homework #1 is due January 29th @ 11
Project #1 will be released on January 22nd
LAST CLASS
We now understand what a database looks like at a logical level and how to write queries to read/write data (e.g., using SQL).
We will next learn how to build software that manages a database (i.e., a DBMS).
COURSE OUTLINE
Relational Databases Storage Query Execution Concurrency Control Database Recovery Distributed Databases Potpourri
COURSE OUTLINE
Relational DatabasesStorageQuery ExecutionConcurrency ControlDatabase RecoveryDistributed DatabasesPotpourri
Table (html):
| Query Planning |
| Operator Execution |
| Access Methods |
| Buffer Pool Manager |
| Disk Manager |
COURSE OUTLINE
Relational DatabasesStorageQuery ExecutionConcurrency ControlDatabase RecoveryDistributed DatabasesPotpourri
COURSE OUTLINE
Relational Databases
Storage
Query Execution
Concurrency Control
Database Recovery
Distributed Databases
Potpourri
Query Planning
Operator Execution
Access Methods
Buffer Pool Manager
Disk Manager
TODAY'S AGENDA
BackgroundFile StoragePage LayoutTuple Layout
DISK-BASED ARCHITECTURE
The DBMS assumes that the primary storage location of the database is on non- volatile disk.
The DBMS's components manage the movement of data between non- volatile and volatile storage.
STORAGE HIERARCHY
STORAGE HIERARCHY
STORAGE HIERARCHY
STORAGE HIERARCHY
STORAGE HIERARCHY
STORAGE HIERARCHY
STORAGE HIE
Intel kills the remnants of Optane memory
The speed- boosting storage tech was already on the ropes.
By Michael Crider Staff Writer, PCWorld |JUL29,20226
PDT
If you haven't built a super- high- end workstation in a while, you might not have heard of Intel's Optane memory caching tech. Optane also powered ultra- fast SSDs for consumers and businesses alike. Not that it matters much now. After a disastrous second- quarter earnings call in which it missed expected revenue by billions of dollars, the company announced its plans to end its Optane memory business entirely.
STORAGE HIERARCHY
STORAGE HIERARCHY
ACCESS TIMES
Latency Numbers Every Programmer Should Know
Table (html):
| 1 ns L1 Cache Ref |
| 4 ns L2 Cache Ref |
| 100 ns DRAM |
| 16,000 ns SSD |
| 2,000,000 ns HDD |
| ~50,000,000 ns Network Storage |
| 1,000,000,000 ns Tape Archives |
ACCESS TIMES
Latency Numbers Every Programmer Should Know
Table (html):
| 1 ns | L1 Cache Ref | 1 sec |
| 4 ns | L2 Cache Ref | 4 sec |
| 100 ns | DRAM | 100 sec |
| 16,000 ns | SSD | 4.4 hours |
| 2,000,000 ns | HDD | 3.3 weeks |
| ~50,000,000 ns | Network Storage | 1.5 years |
| 1,000,000,000 ns | Tape Archives | 31.7 years |
SEQUENTIAL VS. RANDOM ACCESS
Random access on non- volatile storage is almost always much slower than sequential access.
DBMS will want to maximize sequential access. $\rightarrow$ Algorithms try to reduce number of writes to random pages so that data is stored in contiguous blocks. $\rightarrow$ Allocating multiple pages at the same time is called an extent.
SYSTEM DESIGN GOALS
Allow the DBMS to manage databases that exceed the amount of memory available.
Reading/writing to disk is expensive, so it must be managed carefully to avoid large stalls and performance degradation.
Random access on disk is usually much slower than sequential access, so the DBMS will want to maximize sequential access.
DISK-ORIENTED DBMS
DISK-ORIENTED DBMS
DISK-ORIENTED DBMS
DISK-ORIENTED DBMS
DISK-ORIENTED DBMS
DISK-ORIENTED DBMS
DISK-ORIENTED DBMS
DISK-ORIENTED DBMS
DISK-ORIENTED DBMS
DISK-ORIENTED DBMS
DISK-ORIENTED DBMS
DISK-ORIENTED DBMS
DISK-ORIENTED DBMS
DISK-ORIENTED DBMS
DATABASE STORAGE
Problem #1: How the DBMS represents the database in files on disk.
← Today
Problem #2: How the DBMS manages its memory and moves data back- and- forth from disk.
FILE STORAGE
The DBMS stores a database as one or more files on disk typically in a proprietary format.
$\rightarrow$ The OS does not know anything about the contents of these files. $\rightarrow$ We will discuss portable file formats next week...
Early systems in the 1980s used custom filesystems on raw block storage.
$\rightarrow$ Some enterprise DBMSs still support this. $\rightarrow$ Most newer DBMSs do not do this.
FILE STORAGE
STORAGE MANAGER
The storage manager is responsible for maintaining a database's files.
$\rightarrow$ Some do their own scheduling for reads and writes to improve spatial and temporal locality of pages.
It organizes the files as a collection of pages.
$\rightarrow$ Tracks data read/written to pages. $\rightarrow$ Tracks the available space.
A DBMS typically does not maintain multiple copies of a page on disk.
$\rightarrow$ Assume this happens above/below storage manager.
DATABASE PAGES
A page is a fixed- size block of data. $\longrightarrow$ It can contain tuples, meta- data, indexes, log records... $\longrightarrow$ Most systems do not mix page types. $\longrightarrow$ Some systems require a page to be self- contained.
Each page is given a unique identifier (page ID). $\longrightarrow$ A page ID could be unique per DBMS instance, per database, or per table. $\longrightarrow$ The DBMS uses an indirection layer to map page IDs to physical locations.
DATABASE PAGES
There are three different notions of "pages" in a DBMS:
$\longrightarrow$ Hardware Page (usually 4KB) $\longrightarrow$ OS Page (usually 4KB, x64 2MB/1GB) $\longrightarrow$ Database Page (512B- 32KB)
A hardware page is the largest block of data that the storage device can guarantee failsafe writes.
DBMSs that specialize in read- only workloads have larger page sizes.
DATABASE PAGES
There are three different notions of "pages" in a DBMS:
$\longrightarrow$ Hardware Page (usually 4KB) $\longrightarrow$ OS Page (usually 4KB, x64 2MB/1GB) $\longrightarrow$ Database Page (512B- 32KB)
A hardware page is the largest block of data that the storage device can guarantee failsafe writes.
DBMSs that specialize in read- only workloads have larger page sizes.
DATABASE PAGES
There are three different notions of "pages" in a DBMS:
$\longrightarrow$ Hardware Page (usually 4KB) $\longrightarrow$ OS Page (usually 4KB, x64 2MB/1GB) $\longrightarrow$ Database Page (512B- 32KB)
A hardware page is the largest block of data that the storage device can guarantee failsafe writes.
DBMSs that specialize in read- only workloads have larger page sizes.
Default DB Page Sizes
PAGE STORAGE ARCHITECTURE
Different DBMSs manage pages in files on disk in different ways.
$\longrightarrow$ Heap File Organization $\longrightarrow$ Tree File Organization $\longrightarrow$ Sequential / Sorted File Organization (ISAM) $\longrightarrow$ Hashing File Organization
At this point in the hierarchy, we do not need to know anything about what is inside of the pages.
PAGE STORAGE ARCHITECTURE
Different DBMSs manage pages in files on disk in different ways.
$\longrightarrow$ Heap File Organization $\longrightarrow$ Tree File Organization $\longrightarrow$ Sequential / Sorted File Organization (ISAM) $\longrightarrow$ Hashing File Organization
At this point in the hierarchy, we do not need to know anything about what is inside of the pages.
HEAP FILE
A heap file is an unordered collection of pages with tuples that are stored in random order. $\rightarrow$ Create / Get / Write / Delete Page $\rightarrow$ Must also support iterating over all pages.
Need additional meta- data to track location of files and free space availability.
HEAP FILE
A heap file is an unordered collection of pages with tuples that are stored in random order. $\rightarrow$ Create / Get / Write / Delete Page $\rightarrow$ Must also support iterating over all pages.
Need additional meta- data to track location of files and free space availability.
HEAP FILE
A heap file is an unordered collection of pages with tuples that are stored in random order. $\rightarrow$ Create / Get / Write / Delete Page $\rightarrow$ Must also support iterating over all pages.
Need additional meta- data to track location of files and free space availability.
HEAP FILE
A heap file is an unordered collection of pages with tuples that are stored in random order. $\rightarrow$ Create / Get / Write / Delete Page $\rightarrow$ Must also support iterating over all pages.
Need additional meta- data to track location of files and free space availability.
HEAP FILE
A heap file is an unordered collection of pages with tuples that are stored in random order. $\rightarrow$ Create / Get / Write / Delete Page $\rightarrow$ Must also support iterating over all pages.
Need additional meta- data to track location of files and free space availability.
Get Page #23
HEAP FILE
A heap file is an unordered collection of pages with tuples that are stored in random order. $\rightarrow$ Create / Get / Write / Delete Page $\rightarrow$ Must also support iterating over all pages.
Need additional meta- data to track location of files and free space availability.
HEAP FILE
A heap file is an unordered collection of pages with tuples that are stored in random order. $\rightarrow$ Create / Get / Write / Delete Page $\rightarrow$ Must also support iterating over all pages.
Need additional meta- data to track location of files and free space availability.
HEAP FILE: PAGE DIRECTORY
The DBMS maintains special pages that tracks the location of data pages in the database files.
$\longrightarrow$ One entry per database object. $\longrightarrow$ Must make sure that the directory pages are in sync with the data pages.
DBMS also keeps meta- data about pages' contents:
$\longrightarrow$ Amount of free space per page. $\longrightarrow$ List of free / empty pages. $\longrightarrow$ Page type (data vs. meta- data).
HEAP FILE: PAGE DIRECTORY
The DBMS maintains special pages that tracks the location of data pages in the database files.
$\longrightarrow$ One entry per database object. $\longrightarrow$ Must make sure that the directory pages are in sync with the data pages.
DBMS also keeps meta- data about pages' contents:
$\longrightarrow$ Amount of free space per page. $\longrightarrow$ List of free / empty pages. $\longrightarrow$ Page type (data vs. meta- data).
TODAY'S AGENDA
File Storage Page Layout Tuple Layout
PAGE HEADER
Every page contains a header of meta- data about the page's contents.
$\longrightarrow$ Page Size $\longrightarrow$ Checksum $\longrightarrow$ DBMS Version $\longrightarrow$ Transaction Visibility $\longrightarrow$ Compression / Encoding Meta- data $\longrightarrow$ Schema Information $\longrightarrow$ Data Summary / Sketches
Some systems require pages to be self- contained (e.g., Oracle).
Page
Table (html):
PAGE LAYOUT
For any page storage architecture, we now need to decide how to organize the data inside of the page. $\rightarrow$ We are still assuming that we are only storing tuples in a row- oriented storage model.
Approach #1: Tuple- oriented Storage Approach #2: Log- structured Storage Approach #3: Index- organized Storage
PAGE LAYOUT
For any page storage architecture, we now need to decide how to organize the data inside of the page. $\rightarrow$ We are still assuming that we are only storing tuples in a row- oriented storage model.
Approach #1: Tuple- oriented StorageApproach #2: Log- structured StorageApproach #3: Index- organized Storage
PAGE LAYOUT
For any page storage architecture, we now need to decide how to organize the data inside of the page. $\rightarrow$ We are still assuming that we are only storing tuples in a row- oriented storage model.
Approach #1: Tuple- oriented Storage Approach #2: Log- structured Storage Approach #3: Index- organized Storage
PAGE LAYOUT
For any page storage architecture, we now need to decide how to organize the data inside of the page. $\rightarrow$ We are still assuming that we are only storing tuples in a row- oriented storage model.
Approach #1: Tuple- oriented Storage
Today
Approach #2: Log- structured Storage
Approach #3: Index- organized Storage
PAGE LAYOUT
For any page storage architecture, we now need to decide how to organize the data inside of the page. $\rightarrow$ We are still assuming that we are only storing tuples in a row- oriented storage model.
Approach #1: Tuple- oriented Storage
Approach #2: Log- structured Storage Approach #3: Index- organized Storage
Lecture #4
TUPLE-ORIENTED STORAGE
How to store tuples in a page?
Strawman Idea: Keep track of the number of tuples in a page and then just append a new tuple to the end.
Page
Table (html):
TUPLE-ORIENTED STORAGE
How to store tuples in a page?
Strawman Idea: Keep track of the number of tuples in a page and then just append a new tuple to the end.
Page
Table (html):
| Num Tuples = 3 |
| Tuple #1 |
| Tuple #2 |
| Tuple #3 |
|
TUPLE-ORIENTED STORAGE
How to store tuples in a page?
Strawman Idea: Keep track of the number of tuples in a page and then just append a new tuple to the end. $\rightarrow$ What happens if we delete a tuple?
Page
Table (html):
| Num Tuples = 3 |
| Tuple #1 |
| Tuple #2 |
| Tuple #3 |
|
TUPLE-ORIENTED STORAGE
How to store tuples in a page?
Strawman Idea: Keep track of the number of tuples in a page and then just append a new tuple to the end. $\rightarrow$ What happens if we delete a tuple?
Page
Table (html):
| Num Tuples = 2 |
| Tuple #1 |
|
| Tuple #3 |
|
TUPLE-ORIENTED STORAGE
How to store tuples in a page?
Strawman Idea: Keep track of the number of tuples in a page and then just append a new tuple to the end. $\rightarrow$ What happens if we delete a tuple?
Page
Table (html):
| Num Tuples = 3 |
| Tuple #1 |
| Tuple #4 |
| Tuple #3 |
|
TUPLE-ORIENTED STORAGE
How to store tuples in a page?
Strawman Idea: Keep track of the number of tuples in a page and then just append a new tuple to the end.
$\rightarrow$ What happens if we delete a tuple? $\rightarrow$ What happens if we have a variable- length attribute?
Page
Table (html):
| Num Tuples = 3 |
| Tuple #1 |
| Tuple #4 |
| Tuple #3 |
|
SLOTTED PAGES
The most common layout scheme is called slotted pages.
The slot array maps "slots" to the tuples' starting position offsets.
The header keeps track of: $\rightarrow$ The # of used slots $\rightarrow$ The offset of the starting location of the last slot used.
Table (html):
| Header |
|
| Tuple #4 | Tuple #3 |
| Tuple #2 | Tuple #1 |
SLOTTED PAGES
The most common layout scheme is called slotted pages.
The slot array maps "slots" to the tuples' starting position offsets.
The header keeps track of: $\longrightarrow$ The # of used slots $\longrightarrow$ The offset of the starting location of the last slot used.
SLOTTED PAGES
The most common layout scheme is called slotted pages.
The slot array maps "slots" to the tuples' starting position offsets.
The header keeps track of: $\longrightarrow$ The # of used slots $\longrightarrow$ The offset of the starting location of the last slot used.
Fixed- and Var- length Tuple Data
SLOTTED PAGES
The most common layout scheme is called slotted pages.
The slot array maps "slots" to the tuples' starting position offsets.
The header keeps track of: $\longrightarrow$ The # of used slots $\longrightarrow$ The offset of the starting location of the last slot used.
Fixed- and Var- length Tuple Data
SLOTTED PAGES
The most common layout scheme is called slotted pages.
The slot array maps "slots" to the tuples' starting position offsets.
The header keeps track of: $\longrightarrow$ The # of used slots $\longrightarrow$ The offset of the starting location of the last slot used.
Fixed- and Var- length Tuple Data
SLOTTED PAGES
The most common layout scheme is called slotted pages.
The slot array maps "slots" to the tuples' starting position offsets.
The header keeps track of: $\longrightarrow$ The # of used slots $\longrightarrow$ The offset of the starting location of the last slot used.
Fixed- and Var- length Tuple Data
SLOTTED PAGES
The most common layout scheme is called slotted pages.
The slot array maps "slots" to the tuples' starting position offsets.
The header keeps track of: $\longrightarrow$ The # of used slots $\longrightarrow$ The offset of the starting location of the last slot used.
Fixed- and Var- length Tuple Data
SLOTTED PAGES
The most common layout scheme is called slotted pages.
The slot array maps "slots" to the tuples' starting position offsets.
The header keeps track of: $\longrightarrow$ The # of used slots $\longrightarrow$ The offset of the starting location of the last slot used.
Fixed- and Var- length Tuple Data
SLOTTED PAGES
The most common layout scheme is called slotted pages.
The slot array maps "slots" to the tuples' starting position offsets.
The header keeps track of: $\longrightarrow$ The # of used slots $\longrightarrow$ The offset of the starting location of the last slot used.
Fixed- and Var- length Tuple Data
SLOTTED PAGES
The most common layout scheme is called slotted pages.
The slot array maps "slots" to the tuples' starting position offsets.
The header keeps track of: $\longrightarrow$ The # of used slots $\longrightarrow$ The offset of the starting location of the last slot used.
Fixed- and Var- length Tuple Data
SLOTTED PAGES
The most common layout scheme is called slotted pages.
The slot array maps "slots" to the tuples' starting position offsets.
The header keeps track of: $\longrightarrow$ The # of used slots $\longrightarrow$ The offset of the starting location of the last slot used.
Fixed- and Var- length Tuple Data
RECORD IDS
The DBMS assigns each logical tuple a unique record identifier that represents its physical location in the database.
$\longrightarrow$ File Id, Page Id, Slot # $\longrightarrow$ Most DBMSs do not store ids in tuple. $\longrightarrow$ SQLite uses ROWID as the true primary key and stores them as a hidden attribute.
Applications should never rely on these IDs to mean anything.
RECORD IDS
The DBMS assigns each logical tuple a unique record identifier that represents its physical location in the database.
$\longrightarrow$ File Id, Page Id, Slot # $\longrightarrow$ Most DBMSs do not store ids in tuple. $\longrightarrow$ SQLite uses ROWID as the true primary key and stores them as a hidden attribute.
Applications should never rely on these IDs to mean anything.
PostgreSQL CTID (6- bytes)
Microsoft SQLServer %%physloc%% (8- bytes)
ORACLE ROWID (10- bytes)
TODAY'S AGENDA
File Storage Page Layout Tuple Layout
TUPLE LAYOUT
A tuple is essentially a sequence of bytes. $\rightarrow$ These bytes do not have to be contiguous.
It is the job of the DBMS to interpret those bytes into attribute types and values.
TUPLE HEADER
Each tuple is prefixed with a header that contains meta- data about it. $\rightarrow$ Visibility info (concurrency control) $\rightarrow$ Bit Map for NULL values.
We do not need to store meta- data about the schema.
Tuple
Table (html):
TUPLE DATA
Attributes are typically stored in the order that you specify them when you create the table.
This is done for software engineering reasons (i.e., simplicity).
However, it might be more efficient to lay them out differently.
Tuple
Table (html):
CREATE TABLE foo ( a INT PRIMARY KEY, b INT NOT NULL, c INT, d DOUBLE, e FLOAT);
DENORMALIZED TUPLE DATA
DBMS can physically denormalize (e.g., "pre- join") related tuples and store them together in the same page. $\rightarrow$ Potentially reduces the amount of I/O for common workload patterns. $\rightarrow$ Can make updates more expensive.
CREATE TABLE foo ( ) a INT PRIMARY KEY, b INT NOT NULL, ) ; CREATE TABLE bar ( ) c INT PRIMARY KEY, a INT REFERENCES foo (a)
DENORMALIZED TUPLE DATA
DBMS can physically denormalize (e.g., "pre- join") related tuples and store them together in the same page. $\rightarrow$ Potentially reduces the amount of I/O for common workload patterns. $\rightarrow$ Can make updates more expensive.
DENORMALIZED TUPLE DATA
DBMS can physically denormalize (e.g., "pre- join") related tuples and store them together in the same page. $\longrightarrow$ Potentially reduces the amount of I/O for common workload patterns. $\longrightarrow$ Can make updates more expensive.
foo
Table (html):
bar
Table (html):
DENORMALIZED TUPLE DATA
DBMS can physically denormalize (e.g., "pre- join") related tuples and store them together in the same page. $\longrightarrow$ Potentially reduces the amount of I/O for common workload patterns. $\longrightarrow$ Can make updates more expensive.
SELECT * FROM foo JOIN bar ON foo.a = bar.a;
foo
Table (html):
bar
Table (html):
DENORMALIZED TUPLE DATA
DBMS can physically denormalize (e.g., "pre- join") related tuples and store them together in the same page. $\longrightarrow$ Potentially reduces the amount of I/O for common workload patterns. $\longrightarrow$ Can make updates more expensive.
foo
SELECT * FROM foo JOIN bar ON foo.a = bar.a;
DENORMALIZED TUPLE DATA
DBMS can physically denormalize (e.g., "pre- join") related tuples and store them together in the same page. $\rightarrow$ Potentially reduces the amount of I/O for common workload patterns. $\rightarrow$ Can make updates more expensive.
SELECT * FROM foo JOIN bar ON foo.a = bar.a;
DENORMALIZED TUPLE DATA
DBMS can physically denormalize (e.g., "pre- join") related tuples and store them together in the same page. $\longrightarrow$ Potentially reduces the amount of I/O for common workload patterns. $\longrightarrow$ Can make updates more expensive.
Not a new idea.
$\longrightarrow$ IBM System R did this in the 1970s. $\longrightarrow$ Several NoSQL DBMSs do this without calling it physical denormalization.
DENORMALIZED TUPLE DATA
DBMS can physically denormalize (e.g., "pre- join") related tuples and store them together in the same page. $\longrightarrow$ Potentially reduces the amount of I/O for common workload patterns. $\longrightarrow$ Can make updates more expensive.
Not a new idea.
$\longrightarrow$ IBM System R did this in the 1970s. $\longrightarrow$ Several NoSQL DBMSs do this without calling it physical denormalization.
CONCLUSION
Database is organized in pages. Different ways to track pages. Different ways to store pages. Different ways to store tuples.
NEXT CLASS
Log- Structured Storage Index- Organized Storage Value Representation Catalogs