Carnegie Mellon University
Database Systems
Storage Models & Data Compression
ADMINISTRIVIA
Project #1 is due on February 9th @ 11
Project recitation on Monday, February 3rd, from 5-6pm in GHC 4303.
Homework #2 is due February 9th @ 11
LAST CLASS
We discussed storage architecture alternatives to the tuple-oriented scheme:
$\rightarrow$ Log-structured storage
$\rightarrow$ Index-organized storage
These approaches are ideal for write-heavy (INSERT/UPDATE/DELETE) workloads.
But for some workloads, the most important concern is read (SELECT) performance...
TODAY'S AGENDA
Database Workloads
Storage Models
Data Compression
DATABASE WORKLOADS
On-Line Transaction Processing (OLTP)
$\rightarrow$ Fast operations that only read/update a small amount of data each time.
On-Line Analytical Processing (OLAP)
$\rightarrow$ Complex queries that read a lot of data to compute aggregates.
Hybrid Transaction + Analytical Processing $\rightarrow$ OLTP + OLAP together on the same database instance
DATABASE WORKLOADS
[Figure: Workload Focus]
WIKIPEDIA EXAMPLE
OBSERVATION
The relational model does not specify that the DBMS must store all of a tuple's attributes together in a single page.
This may not actually be the best layout for some workloads...
OLTP
On-line Transaction Processing:
$\rightarrow$ Simple queries that read/update a small amount of data that is related to a single entity in the database.
This is usually the kind of application that people build first.
SELECT P.*, R.* FROM pages AS P
 INNER JOIN revisions AS R
    ON P.latest = R.revID
 WHERE P.pageID = ?
UPDATE useracct SET lastLogin = NOW(), hostname = ? WHERE userID = ?
INSERT INTO revisions VALUES (?, ?, ?, ?)
OLAP
On-line Analytical Processing:
$\rightarrow$ Complex queries that read large portions of the database spanning multiple entities.
You execute these workloads on the data you have collected from your OLTP application(s).
SELECT COUNT(U.lastLogin),
       EXTRACT(month FROM U.lastLogin) AS month
  FROM useracct AS U
 WHERE U.hostname LIKE '%.gov'
 GROUP BY EXTRACT(month FROM U.lastLogin)
STORAGE MODELS
A DBMS's storage model specifies how it physically organizes tuples on disk and in memory.
$\rightarrow$ Can have different performance characteristics based on the target workload (OLTP vs. OLAP).
$\rightarrow$ Influences the design choices of the rest of the DBMS.
Choice #1: N-ary Storage Model (NSM)
Choice #2: Decomposition Storage Model (DSM)
Choice #3: Hybrid Storage Model (PAX)
N-ARY STORAGE MODEL (NSM)
The DBMS stores (almost) all attributes for a single tuple contiguously in a single page.
$\rightarrow$ Also commonly known as a "row store".
Ideal for OLTP workloads where queries are more likely to access individual entities and perform write-heavy operations.
NSM database page sizes are typically some constant multiple of 4 KB hardware pages.
See Lecture #03
NSM: PHYSICAL ORGANIZATION
A disk-oriented NSM system stores a tuple's fixed-length and variable-length attributes contiguously in a single slotted page.
The tuple's record id (page#, slot#) is how the DBMS uniquely identifies a physical tuple.
[ImageCaption: Slot Array]
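To make the record-id scheme concrete, here is a minimal sketch of a slotted page in Python. The layout (a slot array growing from the front, tuple data growing from the back) and all names are illustrative assumptions, not any particular DBMS's on-disk format.

```python
PAGE_SIZE = 4096

class SlottedPage:
    """Toy NSM slotted page: slot array at the front, tuple data at the back."""

    def __init__(self, page_id: int):
        self.page_id = page_id
        self.data = bytearray(PAGE_SIZE)
        self.slots = []               # slot array: (offset, length) per tuple
        self.free_end = PAGE_SIZE     # tuples are appended from the back

    def insert(self, tuple_bytes: bytes):
        """Store a serialized tuple; return its record id (page#, slot#)."""
        assert self.free_end - len(tuple_bytes) > 0, "page is full"
        self.free_end -= len(tuple_bytes)
        self.data[self.free_end:self.free_end + len(tuple_bytes)] = tuple_bytes
        self.slots.append((self.free_end, len(tuple_bytes)))
        return (self.page_id, len(self.slots) - 1)

    def get(self, slot: int) -> bytes:
        """Follow the slot array entry to the tuple's bytes."""
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

page = SlottedPage(page_id=7)
rid = page.insert(b"1273|andy|...")     # rid == (7, 0)
assert page.get(rid[1]) == b"1273|andy|..."
```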
NSM: OLTP EXAMPLE
SELECT * FROM useracct WHERE userName = ? AND userPass = ?
NSM: OLAP EXAMPLE
SELECT COUNT(U.lastLogin),
       EXTRACT(month FROM U.lastLogin) AS month
  FROM useracct AS U
 WHERE U.hostname LIKE '%.gov'
 GROUP BY EXTRACT(month FROM U.lastLogin)
NSM: SUMMARY
Advantages
$\rightarrow$ Fast inserts, updates, and deletes.
$\rightarrow$ Good for queries that need the entire tuple (OLTP).
$\rightarrow$ Can use index-oriented physical storage for clustering.
Disadvantages
$\rightarrow$ Not good for scanning large portions of the table and/or a subset of the attributes.
$\rightarrow$ Terrible memory locality in access patterns.
$\rightarrow$ Not ideal for compression because of multiple value domains within a single page.
DECOMPOSITION STORAGE MODEL (DSM)
Store a single attribute for all tuples contiguously in a block of data.
$\rightarrow$ Also known as a "column store".
Ideal for OLAP workloads where read-only queries perform large scans over a subset of the table's attributes.
DBMS is responsible for combining/splitting a tuple's attributes when reading/writing.
A DECOMPOSITION STORAGE MODEL
[Paper excerpt: George P. Copeland and Setrag N. Khoshafian, "A Decomposition Storage Model", SIGMOD 1985. The paper compares a fully decomposed storage model (binary relations containing a surrogate and one attribute each) against the conventional n-ary storage model along simplicity and generality, storage performance, update performance, and retrieval performance.]
DSM: PHYSICAL ORGANIZATION
Store attributes and meta-data (e.g., nulls) in separate arrays of fixed-length values.
$\rightarrow$ Most systems identify unique physical tuples using offsets into these arrays.
$\rightarrow$ Need to handle variable-length values...
Maintain separate pages per attribute with a dedicated header area for meta-data about the entire column.
| #1 | header | null bitmap |
| a0 | a1 | a2 | a3 | a4 | a5 |
DSM: OLAP EXAMPLE
| header | userID | userName | userPass | hostname | lastLogin |
| header | userID | userName | userPass | hostname | lastLogin |
| header | userID | userName | userPass | hostname | lastLogin |
| header | userID | userName | userPass | hostname | lastLogin |
SELECT COUNT(U.lastLogin),
       EXTRACT(month FROM U.lastLogin) AS month
  FROM useracct AS U
 WHERE U.hostname LIKE '%.gov'
 GROUP BY EXTRACT(month FROM U.lastLogin)
DSM: TUPLE IDENTIFICATION
Choice #1: Fixed-length Offsets
→ Each value is the same length for an attribute.
Choice #2: Embedded Tuple Ids
→ Each value is stored with its tuple id in a column.
Offsets
Embedded Ids
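A toy illustration of the two choices, using hypothetical in-memory columns rather than any real system's layout; real systems store these as packed arrays on pages.

```python
# Choice #1: Fixed-length offsets -- the i-th value in every column belongs
# to the i-th tuple, so the offset alone identifies the tuple.
user_id  = [101, 102, 103]
hostname = ["a.gov", "b.com", "c.gov"]
tuple_2  = (user_id[2], hostname[2])            # -> (103, "c.gov")

# Choice #2: Embedded tuple ids -- each value carries its tuple id, so the
# columns can be stored sparsely or out of order, at the cost of extra space.
user_id_e  = [(0, 101), (1, 102), (2, 103)]
hostname_e = [(2, "c.gov"), (0, "a.gov"), (1, "b.com")]
tuple_2e = (dict(user_id_e)[2], dict(hostname_e)[2])

assert tuple_2 == tuple_2e
```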
DSM: VARIABLE-LENGTH DATA
Padding variable-length fields to ensure they are fixed-length is wasteful, especially for large attributes.
A better approach is to use dictionary compression to convert repetitive variable-length data into fixed-length values (typically 32-bit integers).
$\rightarrow$ More on this later in this lecture...
DECOMPOSITION STORAGE MODEL (DSM)
Advantages
$\rightarrow$ Reduces the amount of wasted I/O per query because the DBMS only reads the data that it needs.
$\rightarrow$ Faster query processing because of increased locality and cached data reuse (Lecture #13).
$\rightarrow$ Better data compression.
Disadvantages
$\rightarrow$ Slow for point queries, inserts, updates, and deletes because of tuple splitting/stitching/reorganization.
OBSERVATION
OLAP queries almost never access a single column of a table by itself.
$\rightarrow$ At some point during query execution, the DBMS must get other columns and stitch the original tuple back together.
But we still need to store data in a columnar format to get the storage + execution benefits.
We need a columnar scheme that still stores attributes separately but keeps the data for each tuple physically close together...
PAX STORAGE MODEL
Partition Attributes Across (PAX) is a hybrid storage model that vertically partitions attributes within a database page.
$\rightarrow$ Examples: Parquet, ORC, and Arrow.
The goal is to get the benefit of faster processing on columnar storage while retaining the spatial locality benefits of row storage.
Weaving Relations for Cache Performance
[Paper excerpt: Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, and Marios Skounakis, "Weaving Relations for Cache Performance", VLDB 2001. The paper introduces PAX, which keeps all parts of a record on the same page but groups each attribute's values into minipages; reconstructing a record requires only a mini-join among minipages that never looks beyond the page. Experiments on top of the Shore storage manager show that, compared to NSM, PAX incurs 50-75% fewer second-level cache misses on selection queries and TPC-H workloads.]
PAX: PHYSICAL ORGANIZATION
Horizontally partition data into row groups. Then vertically partition their attributes into column chunks.
Global meta-data directory contains offsets to the file's row groups.
$\rightarrow$ This is stored in the footer if the file is immutable (Parquet, ORC).
Each row group contains its own meta-data header about its contents.
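A rough sketch of this two-level partitioning under simplifying assumptions (tuples as Python dicts, tiny row groups, no encodings or headers); real formats like Parquet layer metadata and compression on top of this idea.

```python
ROW_GROUP_SIZE = 2  # tuples per row group (tiny, for illustration only)

def to_pax(tuples):
    """Horizontally partition tuples into row groups, then vertically
    partition each row group's attributes into column chunks."""
    row_groups = []
    for i in range(0, len(tuples), ROW_GROUP_SIZE):
        group = tuples[i:i + ROW_GROUP_SIZE]
        chunks = {attr: [t[attr] for t in group] for attr in group[0]}
        row_groups.append({"num_rows": len(group), "columns": chunks})
    return row_groups

rows = [{"userID": 1, "hostname": "a.gov"},
        {"userID": 2, "hostname": "b.com"},
        {"userID": 3, "hostname": "c.gov"}]
for rg in to_pax(rows):
    print(rg["num_rows"], rg["columns"])
# Row group 0 keeps userID=[1,2] and hostname=["a.gov","b.com"] together:
# scanning one column is sequential, yet a full tuple stays in one group.
```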
PAX: PHYSICAL ORGANIZATION
[Figure: Parquet data organization. Row groups (default 128 MB) contain column chunks, which contain pages (default 1 MB); per-page metadata includes min/max, count, and repetition/definition levels, followed by the encoded values. File meta-data is stored at the end of the file.]
OBSERVATION
I/O is the main bottleneck if the DBMS fetches data from disk during query execution.
The DBMS can compress pages to increase the utility of the data moved per I/O operation.
Key trade-off is speed vs. compression ratio.
$\rightarrow$ Compressing the database reduces DRAM requirements.
$\rightarrow$ It may decrease CPU costs during query execution.
DATABASE COMPRESSION
Goal #1: Must produce fixed-length values.
$\rightarrow$ Only exception is var-length data stored in a separate pool.
Goal #2: Postpone decompression for as long as possible during query execution.
$\rightarrow$ Also known as late materialization.
Goal #3: Must be a lossless scheme.
$\rightarrow$ People (typically) don't like losing data.
$\rightarrow$ Any lossy compression must be performed by the application.
COMPRESSION GRANULARITY
Choice #1: Block-level
$\rightarrow$ Compress a block of tuples for the same table.
Choice #2: Tuple-level
$\rightarrow$ Compress the contents of the entire tuple (NSM-only).
Choice #3: Attribute-level
$\rightarrow$ Compress a single attribute within one tuple (overflow).
$\rightarrow$ Can target multiple attributes for the same tuple.
Choice #4: Column-level
$\rightarrow$ Compress multiple values for one or more attributes stored for multiple tuples (DSM-only).
NAIVE COMPRESSION
Compress data using a general-purpose algorithm. Scope of compression is only based on the data provided as input.
$\rightarrow$ LZO (1996), LZ4 (2011), Snappy (2011), Oracle OZIP (2014), Zstd (2015)
Considerations:
$\rightarrow$ Computational overhead.
$\rightarrow$ Compress vs. decompress speed.
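A quick illustration of naive block-level compression using zlib from the Python standard library; the "page" here is just illustrative bytes, and the exact sizes will vary.

```python
import zlib

page = b"Andy|99999|" * 200                  # repetitive row data compresses well
compressed = zlib.compress(page, level=6)    # higher levels trade speed for ratio
print(len(page), "->", len(compressed))      # e.g., 2200 -> a few dozen bytes

# The DBMS must fully decompress before it can read or modify any tuple:
assert zlib.decompress(compressed) == page
```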
MYSQL INNODB COMPRESSION
Buffer Pool
Database File
Compressed pages ([1,2,4,8] KB)
NAIVE COMPRESSION
The DBMS must decompress data first before it can be read and (potentially) modified. $\rightarrow$ This limits the "scope" of the compression scheme.
These schemes also do not consider the high- level meaning or semantics of the data.
OBSERVATION
Ideally, we want the DBMS to operate on compressed data without decompressing it first.
| NAME | SALARY |
| Andy | 99999 |
| Jignesh | 88888 |
SELECT * FROM users WHERE name = 'Andy'

Database Magic!

SELECT * FROM users WHERE name = XX
COLUMNAR COMPRESSION
Run-length Encoding
Bit-Packing Encoding
Bitmap Encoding
Delta / Frame-of-Reference Encoding
Incremental Encoding
Dictionary Encoding
RUN-LENGTH ENCODING
Compress runs of the same value in a single column into triplets:
$\rightarrow$ The value of the attribute.
$\rightarrow$ The start position in the column segment.
$\rightarrow$ The # of elements in the run.
Requires the columns to be sorted intelligently to maximize compression opportunities.
Original Data (isDead column): Y, Y, Y, N, Y, N, Y, Y

Compressed Data (RLE triplets):
| (Y,0,3) | (N,3,1) | (Y,4,1) | (N,5,1) | (Y,6,2) |
RLE Triplet = (Value, Offset, Length)

SELECT isDead, COUNT(*)
  FROM users
 GROUP BY isDead

This query can be answered from the triplets alone by summing run lengths per value.

Sorting the column on isDead first collapses the runs to (Y,0,6) and (N,6,2), so the sorted data compresses better.
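A minimal run-length encoder/decoder producing the (value, offset, length) triplets shown above; the aggregate at the end shows how a GROUP BY/COUNT can be computed from the runs without decompressing.

```python
def rle_encode(column):
    """Collapse each run into a (value, offset, length) triplet."""
    runs, start = [], 0
    for i in range(1, len(column) + 1):
        if i == len(column) or column[i] != column[start]:
            runs.append((column[start], start, i - start))
            start = i
    return runs

def rle_decode(runs):
    out = []
    for value, _offset, length in runs:
        out.extend([value] * length)
    return out

col = ["Y", "Y", "Y", "N", "Y", "N", "Y", "Y"]
runs = rle_encode(col)   # [('Y',0,3), ('N',3,1), ('Y',4,1), ('N',5,1), ('Y',6,2)]
assert rle_decode(runs) == col

# GROUP BY isDead / COUNT(*) directly on the compressed representation:
counts = {}
for value, _offset, length in runs:
    counts[value] = counts.get(value, 0) + length
print(counts)            # {'Y': 6, 'N': 2}
```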
BIT PACKING
If the values for an integer attribute are smaller than the range of its given data type size, then reduce the number of bits to represent each value.
Use bit- shifting tricks to operate on multiple values in a single word.
Original Data
| int32 |
| 13 |
| 191 |
| 56 |
| 92 |
| 81 |
| 120 |
| 231 |
| 172 |
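A sketch of the idea using struct from the Python standard library: every int32 value above fits in 8 bits, so each can be stored in a single byte.

```python
import struct

values = [13, 191, 56, 92, 81, 120, 231, 172]   # all fit in an unsigned byte

original = struct.pack("8i", *values)   # 32 bytes (8 x 32 bits)
packed   = struct.pack("8B", *values)   # 8 bytes  (8 x 8 bits)
print(len(original) * 8, "->", len(packed) * 8)  # 256 -> 64 bits

# Unpacking restores the original values exactly:
assert list(struct.unpack("8B", packed)) == values
```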
PATCHING / MOSTLY ENCODING
A variation of bit packing for when an attribute's values are "mostly" less than the largest size: store them with a smaller data type.
$\rightarrow$ The remaining values that cannot be compressed are stored in their raw form.
Original Data
| int32 |
| 13 |
| 191 |
| 99999999 |
| 92 |
| 81 |
| 120 |
| 231 |
| 172 |
Original: 8 × 32-bits = 256 bits
Compressed Data
| mostly8 |
| 13 |
| 191 |
| XXX |
| 92 |
| 81 |
| 120 |
| 231 |
| 172 |
Exception table: | offset | value |
                 | 3      | 99999999 |
(XXX marks the slot whose real value lives in the exception table.)
Compressed: (8 × 8- bits) + 16- bits + 32- bits = 112 bits
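A sketch of a "mostly8" encoder under illustrative assumptions: one byte per value, a sentinel (0xFF) marking patched slots, and an exception list of (offset, value) pairs. Real systems choose layouts and sentinels differently.

```python
SENTINEL = 0xFF  # reserved byte that marks a patched slot

def mostly8_encode(values):
    """Pack byte-sized values; record larger ones as (offset, value) exceptions."""
    packed, exceptions = [], []
    for i, v in enumerate(values):
        if 0 <= v < SENTINEL:
            packed.append(v)
        else:
            packed.append(SENTINEL)
            exceptions.append((i, v))
    return bytes(packed), exceptions

def mostly8_decode(packed, exceptions):
    out = list(packed)
    for offset, value in exceptions:
        out[offset] = value
    return out

vals = [13, 191, 56, 99999999, 92, 81, 120, 231, 172]
packed, exc = mostly8_encode(vals)
print(len(packed), exc)                    # 9 bytes, [(3, 99999999)]
assert mostly8_decode(packed, exc) == vals
```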
BITMAP ENCODING
Store a separate bitmap for each unique value of an attribute, where an offset in the vector corresponds to a tuple.
$\rightarrow$ The $i^{th}$ position in the bitmap corresponds to the $i^{th}$ tuple in the table.
$\rightarrow$ Typically segmented into chunks to avoid allocating large blocks of contiguous memory.
Only practical if the value cardinality is low. Some DBMSs provide bitmap indexes.
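A toy bitmap encoder where each distinct value gets one Python integer as its bit vector; bit i corresponds to the i-th tuple.

```python
def bitmap_encode(column):
    """Build one bitmap per distinct value; bit i is set iff tuple i has it."""
    bitmaps = {}
    for i, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << i)
    return bitmaps

col = ["Y", "Y", "N", "Y", "N"]
bitmaps = bitmap_encode(col)
print(format(bitmaps["Y"], "05b"))   # 01011 (bits 0, 1, 3 set)
print(format(bitmaps["N"], "05b"))   # 10100 (bits 2, 4 set)

# Predicates become bitwise operations, e.g. COUNT(*) WHERE value = 'Y':
assert bin(bitmaps["Y"]).count("1") == 3
```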
BITMAP ENCODING: EXAMPLE

CREATE TABLE customer (
  id INT PRIMARY KEY,
  name VARCHAR(32),
  email VARCHAR(64),
  address VARCHAR(64),
  zip_code INT
);

Assume we have 10 million tuples, and there are 43,000 zip codes in the US.
$\rightarrow$ Plain int32 column: 10,000,000 × 32 bits = 40 MB
$\rightarrow$ Bitmaps: 10,000,000 × 43,000 bits = 53.75 GB

Every time the application inserts a new tuple, the DBMS must extend 43,000 different bitmaps.

There are compressed data structures for sparse data sets:
$\rightarrow$ Roaring Bitmaps (used in ClickHouse, InfluxDB, Weaviate)
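A quick back-of-the-envelope check of the arithmetic on this slide (decimal MB/GB).

```python
tuples, zips = 10_000_000, 43_000

plain_bits  = tuples * 32      # zip_code stored as a 32-bit integer
bitmap_bits = tuples * zips    # one bit per (tuple, zip code) pair

print(plain_bits / 8 / 1e6, "MB")   # 40.0 MB
print(bitmap_bits / 8 / 1e9, "GB")  # 53.75 GB
```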
DELTA ENCODING
Record the difference between values that follow each other in the same column.
$\rightarrow$ Store the base value in-line or in a separate look-up table.
$\rightarrow$ Combine with RLE to get even better compression ratios.
Original Data
| time64 | temp |
| 12:00 | 99.5 |
| 12:01 | 99.4 |
| 12:02 | 99.5 |
| 12:03 | 99.6 |
| 12:04 | 99.4 |
Frame-of-Reference Variant: Use the global minimum value as the base for all deltas.
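A small sketch of both variants, with the temperature column above scaled to tenths of a degree so the deltas stay integral; all names are illustrative.

```python
def delta_encode(values):
    """Store the base in-line; each later entry is the difference from its predecessor."""
    return [values[0]] + [values[i] - values[i - 1] for i in range(1, len(values))]

def delta_decode(encoded):
    out = [encoded[0]]
    for d in encoded[1:]:
        out.append(out[-1] + d)
    return out

temps = [995, 994, 995, 996, 994]    # 99.5, 99.4, ... in tenths of a degree
deltas = delta_encode(temps)         # [995, -1, 1, 1, -2]
assert delta_decode(deltas) == temps

# Frame-of-reference variant: use the global minimum as the base, so every
# stored value is a small non-negative offset that bit-packs well.
base = min(temps)
frame = [v - base for v in temps]    # [1, 0, 1, 2, 0]
assert [v + base for v in frame] == temps
```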
DICTIONARY COMPRESSION
Replace frequent values with smaller fixed-length codes and then maintain a mapping (dictionary) from the codes to the original values.
$\rightarrow$ Typically, one code per attribute value.
$\rightarrow$ Most widely used native compression scheme in DBMSs.
The ideal dictionary scheme supports fast encoding and decoding for both point and range queries.
$\rightarrow$ Encode/Locate: For a given uncompressed value, convert it into its compressed form.
$\rightarrow$ Decode/Extract: For a given compressed value, convert it back into its original form.
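A sketch of order-preserving dictionary encoding: assigning codes in sorted order keeps the original collation, so a prefix predicate like LIKE 'And%' becomes a range scan over the codes. Everything here is an illustrative in-memory toy.

```python
def build_dictionary(column):
    """Sorted distinct values -> codes that preserve the original collation."""
    values = sorted(set(column))
    encode = {v: code for code, v in enumerate(values)}
    decode = {code: v for v, code in encode.items()}
    return encode, decode

names = ["Andrea", "Mr. Pickles", "Andy", "Jignesh", "Mr. Pickles"]
encode, decode = build_dictionary(names)
codes = [encode[n] for n in names]            # [0, 3, 1, 2, 3]
assert [decode[c] for c in codes] == names

# name LIKE 'And%' rewrites to a code-range predicate because the
# dictionary is sorted: find the code range covering the prefix.
lo = min(c for v, c in encode.items() if v.startswith("And"))
hi = max(c for v, c in encode.items() if v.startswith("And"))
matches = [i for i, c in enumerate(codes) if lo <= c <= hi]
print(matches)                                # tuples 0 and 2 (Andrea, Andy)
```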
DICTIONARY: ORDER-PRESERVING
The encoded values need to support the same collation as the original values.
Original Data
Table (html):
| name |
| Andrea |
| Mr. Pickles |
| Andy |
| Jignesh |
| Mr. Pickles |
Compressed Data (Dictionary Sorted)

SELECT * FROM users
 WHERE name LIKE 'And%'

With a sorted dictionary, the predicate rewrites to a range scan over the codes:

SELECT * FROM users
 WHERE name BETWEEN 10 AND 20
ORDER-PRESERVING ENCODING

SELECT name FROM users
 WHERE name LIKE 'And%'
$\rightarrow$ Still must perform a scan on the column.

SELECT DISTINCT name FROM users
 WHERE name LIKE 'And%'
$\rightarrow$ Only needs to access the dictionary.
CONCLUSION
It is important to choose the right storage model for the target workload:
$\rightarrow$ OLTP = Row Store
$\rightarrow$ OLAP = Column Store
DBMSs can combine different approaches for even better compression.
Dictionary encoding is probably the most useful scheme because it does not require pre- sorting.