    lecture-06-slides.pdf
    Carnegie Mellon University

    Database Systems

    Storage Models & Data Compression

    ADMINISTRIVIA

    Project #1 is due on February 9th @ 11
    Project recitation on Monday, February 3rd, from 5-6pm in GHC 4303.
    Homework #2 is due February 9th @ 11

    LAST CLASS

    We discussed storage architecture alternatives to the tuple-oriented scheme:
    → Log-structured storage
    → Index-organized storage
    These approaches are ideal for write-heavy (INSERT/UPDATE/DELETE) workloads.
    But the most important query for some workloads may be read (SELECT) performance...

    TODAY'S AGENDA

    Database Workloads
    Storage Models
    Data Compression

    DATABASE WORKLOADS

    On-Line Transaction Processing (OLTP)

    $\rightarrow$ Fast operations that only read/update a small amount of data each time.

    On-Line Analytical Processing (OLAP)

    $\rightarrow$ Complex queries that read a lot of data to compute aggregates.
    Hybrid Transaction + Analytical Processing (HTAP)
    → OLTP + OLAP together on the same database instance

    DATABASE WORKLOADS

    Workload Focus

    WIKIPEDIA EXAMPLE

    OBSERVATION

    The relational model does not specify that the DBMS must store all a tuple's attributes together in a single page.
    This may not actually be the best layout for some workloads...

    OLTP

    On-line Transaction Processing:

    $\rightarrow$ Simple queries that read/update a small amount of data that is related to a single entity in the database.
    This is usually the kind of application that people build first.
    SELECT P.*, R.*
      FROM pages AS P
     INNER JOIN revisions AS R
        ON P.latest = R.revID
     WHERE P.pageID = ?
    UPDATE useracct SET lastLogin = NOW(), hostname = ? WHERE userID = ?
    INSERT INTO revisions VALUES (?, ?, ?, ?)

    OLAP

    On-line Analytical Processing:

    $\rightarrow$ Complex queries that read large portions of the database spanning multiple entities.
    You execute these workloads on the data you have collected from your OLTP application(s).
    SELECT COUNT(U.lastLogin),
           EXTRACT(month FROM U.lastLogin) AS month
      FROM useracct AS U
     WHERE U.hostname LIKE '%.gov'
     GROUP BY EXTRACT(month FROM U.lastLogin)

    STORAGE MODELS

    A DBMS's storage model specifies how it physically organizes tuples on disk and in memory.
    → Can have different performance characteristics based on the target workload (OLTP vs. OLAP).
    → Influences the design choices of the rest of the DBMS.
    Choice #1: N-ary Storage Model (NSM)
    Choice #2: Decomposition Storage Model (DSM)
    Choice #3: Hybrid Storage Model (PAX)

    N-ARY STORAGE MODEL (NSM)

    The DBMS stores (almost) all attributes for a single tuple contiguously in a single page.
    → Also commonly known as a "row store."
    Ideal for OLTP workloads where queries are more likely to access individual entities and execute write-heavy workloads.
    NSM database page sizes are typically some constant multiple of 4 KB hardware pages.
    See Lecture #03

    NSM: PHYSICAL ORGANIZATION

    A disk-oriented NSM system stores a tuple's fixed-length and variable-length attributes contiguously in a single slotted page.
    The tuple's record id (page#, slot#) is how the DBMS uniquely identifies a physical tuple.

    [ImageCaption: Slot Array]
    NSM: OLTP EXAMPLE

    SELECT * FROM useracct WHERE userName = ? AND userPass = ?


    NSM: OLAP EXAMPLE

    SELECT COUNT(U.lastLogin),
           EXTRACT(month FROM U.lastLogin) AS month
      FROM useracct AS U
     WHERE U.hostname LIKE '%.gov'
     GROUP BY EXTRACT(month FROM U.lastLogin)


    NSM: SUMMARY

    Advantages

    → Fast inserts, updates, and deletes.
    → Good for queries that need the entire tuple (OLTP).
    → Can use index-oriented physical storage for clustering.

    Disadvantages

    → Not good for scanning large portions of the table and/or a subset of the attributes.
    → Terrible memory locality in access patterns.
    → Not ideal for compression because of multiple value domains within a single page.

    DECOMPOSITION STORAGE MODEL (DSM)

    Store a single attribute for all tuples contiguously in a block of data.
    → Also known as a "column store."
    Ideal for OLAP workloads where read-only queries perform large scans over a subset of the table's attributes.
    The DBMS is responsible for combining/splitting a tuple's attributes when reading/writing.
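    The combining/splitting described above can be sketched in a few lines. This is a toy illustration, not from the lecture; all class and attribute names are invented.

```python
# Toy DSM sketch: split tuples into per-attribute arrays on insert,
# stitch them back together by offset on read.
class ToyColumnStore:
    def __init__(self, attrs):
        self.attrs = attrs
        self.columns = {a: [] for a in attrs}   # one array per attribute

    def insert(self, tuple_values):
        # "Splitting": each attribute value goes into its own column.
        for attr, value in zip(self.attrs, tuple_values):
            self.columns[attr].append(value)

    def get(self, offset):
        # "Stitching": rebuild the tuple from the same offset in every column.
        return tuple(self.columns[a][offset] for a in self.attrs)

store = ToyColumnStore(["userID", "hostname"])
store.insert((1, "cs.cmu.edu"))
store.insert((2, "irs.gov"))
```

    Note how a scan of only `hostname` never touches the `userID` array, which is exactly the I/O benefit DSM targets.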

    A DECOMPOSITION STORAGE MODEL

    George P. Copeland & Setrag N. Khoshafian
    Microelectronics and Computer Technology Corporation, Austin, Texas

    [Embedded paper excerpt: "A Decomposition Storage Model" (SIGMOD 1985). The paper compares a fully decomposed storage model (DSM) — binary relations each containing a surrogate and one attribute — against conventional n-ary storage models. It argues the consensus in favor of n-ary storage holds only when one or two dimensions of a database system are considered, and compares the two models on simplicity and generality, storage performance, update performance, and retrieval performance.]

    DSM: PHYSICAL ORGANIZATION

    Store attributes and meta-data (e.g., nulls) in separate arrays of fixed-length values.
    → Most systems identify unique physical tuples using offsets into these arrays.
    → Need to handle variable-length values...
    Maintain separate pages per attribute with a dedicated header area for metadata about the entire column.


    DSM: OLAP EXAMPLE

    SELECT COUNT(U.lastLogin),
           EXTRACT(month FROM U.lastLogin) AS month
      FROM useracct AS U
     WHERE U.hostname LIKE '%.gov'
     GROUP BY EXTRACT(month FROM U.lastLogin)

    DSM: TUPLE IDENTIFICATION

    Choice #1: Fixed-length Offsets

    → Each value is the same length for an attribute.

    Choice #2: Embedded Tuple Ids

    → Each value is stored with its tuple id in a column.

    [Figure: Offsets vs. Embedded Ids]

    DSM: VARIABLE-LENGTH DATA

    Padding variable-length fields to ensure they are fixed-length is wasteful, especially for large attributes.
    A better approach is to use dictionary compression to convert repetitive variable-length data into fixed-length values (typically 32-bit integers).
    → More on this later in this lecture...
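    A minimal sketch of that idea, with invented function names: each distinct string gets a small integer code, so the column itself becomes a fixed-length integer array.

```python
# Dictionary compression sketch: map repetitive variable-length strings
# to fixed-length integer codes (a real system would use 32-bit ints).
def dict_encode(values):
    codes, dictionary = [], {}
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)   # assign the next code
        codes.append(dictionary[v])
    return codes, dictionary

hostnames = ["cs.cmu.edu", "irs.gov", "cs.cmu.edu", "irs.gov", "irs.gov"]
codes, dictionary = dict_encode(hostnames)
# codes is a fixed-length int array; dictionary maps string -> code.
```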

    DECOMPOSITION STORAGE MODEL (DSM)

    Advantages

    → Reduces the amount of wasted I/O per query because the DBMS only reads the data that it needs.
    → Faster query processing because of increased locality and cached data reuse (Lecture #13).
    → Better data compression.

    Disadvantages

    $\rightarrow$ Slow for point queries, inserts, updates, and deletes because of tuple splitting/stitching/reorganization.

    OBSERVATION

    OLAP queries almost never access a single column in a table by itself.
    → At some point during query execution, the DBMS must get other columns and stitch the original tuple back together.
    But we still need to store data in a columnar format to get the storage + execution benefits.
    We need a columnar scheme that still stores attributes separately but keeps the data for each tuple physically close to each other...

    PAX STORAGE MODEL

    Partition Attributes Across (PAX) is a hybrid storage model that vertically partitions attributes within a database page.
    → Examples: Parquet, ORC, and Arrow.
    The goal is to get the benefit of faster processing on columnar storage while retaining the spatial locality benefits of row storage.

    Weaving Relations for Cache Performance

    Anastassia Ailamaki (Carnegie Mellon University), David J. DeWitt, Mark D. Hill, Marios Skounakis (Univ. of Wisconsin-Madison)

    [Embedded paper excerpt: "Weaving Relations for Cache Performance" (VLDB 2001), which introduces PAX. PAX keeps all parts of a record on the same page but groups each attribute's values into "minipages," so each cache miss brings in a run of a single attribute's values. Reconstructing a record is a mini-join among minipages that never looks beyond the page. Evaluated against NSM and DSM with selection queries on numeric data and TPC-H queries on the Shore storage manager, PAX incurred 50-75% fewer second-level cache misses than NSM.]

    PAX: PHYSICAL ORGANIZATION

    Horizontally partition data into row groups. Then vertically partition their attributes into column chunks.
    Global meta-data directory contains offsets to the file's row groups.
    → This is stored in the footer if the file is immutable (Parquet, ORC).
    Each row group contains its own meta-data header about its contents.
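    The two-level partitioning above can be sketched as follows; the helper name and group size are illustrative, not part of any real file format.

```python
# PAX-style layout sketch: horizontally partition tuples into row groups,
# then vertically partition each group into per-attribute column chunks.
def pax_layout(tuples, attrs, group_size):
    row_groups = []
    for start in range(0, len(tuples), group_size):
        group = tuples[start:start + group_size]
        # Vertical partition: one chunk (list) per attribute in this group.
        chunks = {a: [t[i] for t in group] for i, a in enumerate(attrs)}
        row_groups.append(chunks)
    return row_groups

rows = [(1, "Y"), (2, "Y"), (3, "N"), (4, "Y")]
groups = pax_layout(rows, ["id", "isDead"], group_size=2)
```

    A scan of one attribute reads only its chunk in each row group, while all attributes of a given tuple still live in the same row group.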


    PAX: PHYSICAL ORGANIZATION

    Parquet data organization:
    → Row groups (default 128 MB)
    → Column chunks
    → Pages (default 1 MB)
    Per-page metadata: min, max, count, repetition/definition levels, encoded values.
    File meta-data is stored in the footer.

    OBSERVATION

    I/O is the main bottleneck if the DBMS fetches data from disk during query execution.
    The DBMS can compress pages to increase the utility of the data moved per I/O operation.
    Key trade-off is speed vs. compression ratio:
    → Compressing the database reduces DRAM requirements.
    → It may decrease CPU costs during query execution.

    DATABASE COMPRESSION

    Goal #1: Must produce fixed-length values.
    → Only exception is var-length data stored in a separate pool.
    Goal #2: Postpone decompression for as long as possible during query execution.
    → Also known as late materialization.
    Goal #3: Must be a lossless scheme.
    → People (typically) don't like losing data.
    → Any lossy compression must be performed by the application.

    COMPRESSION GRANULARITY

    Choice #1: Block-level

    → Compress a block of tuples for the same table.

    Choice #2: Tuple-level

    → Compress the contents of the entire tuple (NSM- only).

    Choice #3: Attribute-level

    → Compress a single attribute within one tuple (overflow).
    → Can target multiple attributes for the same tuple.

    Choice #4: Column-level

    → Compress multiple values for one or more attributes stored for multiple tuples (DSM- only).

    NAIVE COMPRESSION

    Compress data using a general-purpose algorithm. Scope of compression is only based on the data provided as input.
    → LZO (1996), LZ4 (2011), Snappy (2011), Oracle OZIP (2014), Zstd (2015)
    Considerations:
    → Computational overhead
    → Compress vs. decompress speed
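    The speed-vs-ratio consideration is easy to see with a general-purpose codec from the standard library (zlib here stands in for the codecs listed above; the "page" contents are made up):

```python
import zlib

# Naive block-level compression: the codec sees only the raw bytes it is
# given, with no knowledge of tuple boundaries or value semantics.
page = b"AAAA-BBBB-" * 1000                 # a compressible "page" of bytes

fast = zlib.compress(page, level=1)         # cheaper CPU, usually worse ratio
tight = zlib.compress(page, level=9)        # more CPU, usually better ratio

# Lossless: both round-trip to the exact original bytes.
assert zlib.decompress(fast) == page
assert zlib.decompress(tight) == page
```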

    MYSQL INNODB COMPRESSION

    Buffer Pool

    Database File

    [1,2,4,8] KB

    NAIVE COMPRESSION

    The DBMS must decompress data first before it can be read and (potentially) modified. $\rightarrow$ This limits the "scope" of the compression scheme.
    These schemes also do not consider the high- level meaning or semantics of the data.

    OBSERVATION

    Ideally, we want the DBMS to operate on compressed data without decompressing it first.

    SELECT * FROM users WHERE name = 'Andy'

    Original Data
    NAME    | SALARY
    Andy    | 99999
    Jignesh | 88888

    Database Magic!

    SELECT * FROM users WHERE name = XX

    Compressed Data
    NAME | SALARY
    XX   | AA
    YY   | BB
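    The "magic" is usually just dictionary encoding: compress the query's constant once, then compare codes. A toy sketch (table and column names are illustrative, not a real DBMS API):

```python
# Evaluate an equality predicate on dictionary-encoded data without
# decompressing any strings.
dictionary = {"Andy": 0, "Jignesh": 1}      # built when the data was loaded
name_codes = [0, 1, 0, 0, 1]                # compressed NAME column
salaries   = [99999, 88888, 99999, 99999, 88888]

# SELECT salary FROM users WHERE name = 'Andy'
target = dictionary["Andy"]                 # compress the constant once
result = [s for c, s in zip(name_codes, salaries) if c == target]
```

    Only the single constant `'Andy'` is ever translated; the column stays compressed for the whole scan, which is the essence of late materialization.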


    COLUMNAR COMPRESSION

    Run-length Encoding
    Bit-Packing Encoding
    Bitmap Encoding
    Delta / Frame-of-Reference Encoding
    Incremental Encoding
    Dictionary Encoding

    RUN-LENGTH ENCODING

    Compress runs of the same value in a single column into triplets:
    → The value of the attribute.
    → The start position in the column segment.
    → The # of elements in the run.
    Requires the columns to be sorted intelligently to maximize compression opportunities.

    RUN-LENGTH ENCODING

    Original Data
    id | isDead
    1  | Y
    2  | Y
    3  | Y
    4  | N
    6  | Y
    7  | N
    8  | Y
    9  | Y

    Compressed Data (RLE triplet = Value, Offset, Length)
    (Y,0,3) (N,3,1) (Y,4,1) (N,5,1) (Y,6,2)

    SELECT isDead, COUNT(*) FROM users GROUP BY isDead

    Sorted Data
    id | isDead
    1  | Y
    2  | Y
    3  | Y
    6  | Y
    8  | Y
    9  | Y
    4  | N
    7  | N

    Compressed Data (sorted): (Y,0,6) (N,6,2)

    BIT PACKING

    If the values for an integer attribute are smaller than the range of its given data type, reduce the number of bits used to represent each value.
    Use bit-shifting tricks to operate on multiple values in a single word.

    Original Data
    int32: 13, 191, 56, 92, 81, 120, 231, 172
    (All values fit in 8 bits: 8 × 32 bits = 256 bits → 8 × 8 bits = 64 bits.)

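    A sketch of the bit-shifting trick (helper names invented): pack the eight 8-bit-sized values above into two 32-bit words, and extract one value without unpacking the rest.

```python
# Bit packing: store four 8-bit values per 32-bit word using shifts.
values = [13, 191, 56, 92, 81, 120, 231, 172]   # all < 2**8
assert all(v < 256 for v in values)

def pack8(vals):
    words = []
    for start in range(0, len(vals), 4):        # 4 bytes per 32-bit word
        word = 0
        for j, v in enumerate(vals[start:start + 4]):
            word |= v << (8 * j)                # shift each value into place
        words.append(word)
    return words

def unpack8(words, i):
    # Extract the i-th value: pick the word, shift, and mask off 8 bits.
    return (words[i // 4] >> (8 * (i % 4))) & 0xFF

words = pack8(values)                           # 2 words instead of 8
```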

    PATCHING / MOSTLY ENCODING

    A variation of bit packing for when an attribute's values are "mostly" smaller than the largest size: store them with a smaller data type.
    → The remaining values that cannot be compressed are stored in their raw form.

    Original Data
    int32: 13, 191, 99999999, 92, 81, 120, 231, 172

    Compressed Data
    mostly8: 13, 191, XXX, 92, 81, 120, 231, 172
    Exception (offset → value): 3 → 99999999

    Original: 8 × 32-bits = 256 bits
    Compressed: (8 × 8-bits) + 16-bits + 32-bits = 112 bits
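    A toy sketch of the mostly-8 layout (names and the `None` placeholder are assumptions, not a real DBMS format): small values go in the packed array, outliers in a raw exception list keyed by offset.

```python
# "Mostly" encoding sketch: 8-bit array plus raw exceptions for outliers.
def mostly8_encode(values):
    small, exceptions = [], {}
    for i, v in enumerate(values):
        if v < 256:
            small.append(v)               # fits in 8 bits
        else:
            small.append(None)            # placeholder (the slide's "XXX")
            exceptions[i] = v             # kept in raw 32-bit form
    return small, exceptions

data = [13, 191, 99999999, 92, 81, 120, 231, 172]
small, exceptions = mostly8_encode(data)

def mostly8_get(small, exceptions, i):
    # A lookup first checks the exception list, then the packed array.
    return exceptions.get(i, small[i])
```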

    BITMAP ENCODING

    Store a separate bitmap for each unique value for an attribute where an offset in the vector corresponds to a tuple.
    → The i-th position in the bitmap corresponds to the i-th tuple in the table.
    → Typically segmented into chunks to avoid allocating large blocks of contiguous memory.
    Only practical if the value cardinality is low. Some DBMSs provide bitmap indexes.

    BITMAP ENCODING

    Original Data
    id | isDead
    1  | Y
    2  | Y
    3  | Y
    4  | N
    6  | Y
    7  | N
    8  | Y
    9  | Y

    Compressed Data (one bitmap per distinct value; bit i ↔ tuple i)
    Y: 1 1 1 0 1 0 1 1
    N: 0 0 0 1 0 1 0 0

    BITMAP ENCODING: EXAMPLE

    Assume we have 10 million tuples and 43,000 zip codes in the US.
    $\rightarrow$ Raw zip_code column: 10000000 × 32-bits = 40 MB
    $\rightarrow$ Bitmaps: 10000000 × 43000 bits = 53.75 GB
    Every time the application inserts a new tuple, the DBMS must extend 43,000 different bitmaps.

    CREATE TABLE customer (
      id INT PRIMARY KEY,
      name VARCHAR(32),
      email VARCHAR(64),
      address VARCHAR(64),
      zip_code INT
    );

    There are compressed data structures for sparse data sets:
    $\rightarrow$ Roaring Bitmaps (used by ClickHouse, InfluxDB, Weaviate)
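The arithmetic behind the slide's sizes can be checked directly (using decimal MB/GB, i.e. 10^6 and 10^9 bytes):

```python
tuples = 10_000_000
zip_codes = 43_000

raw_bits = tuples * 32           # one 32-bit zip code per tuple
bitmap_bits = tuples * zip_codes # one bit per tuple, per distinct value

print(raw_bits / 8 / 10**6)      # 40.0 (MB)
print(bitmap_bits / 8 / 10**9)   # 53.75 (GB)
```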

    DELTA ENCODING

    Record the difference between consecutive values in the same column.
    $\rightarrow$ Store the base value in-line or in a separate look-up table.

    Original Data

    time64 | temp
    12:00  | 99.5
    12:01  | 99.4
    12:02  | 99.5
    12:03  | 99.6
    12:04  | 99.4
    DELTA ENCODING

    Record the difference between consecutive values in the same column.
    $\rightarrow$ Store the base value in-line or in a separate look-up table.
    $\rightarrow$ Combine with RLE to get even better compression ratios.
    $\rightarrow$ Frame-of-Reference variant: use the global minimum value as the base.

    DICTIONARY COMPRESSION

    Replace frequent values with smaller fixed-length codes and maintain a mapping (dictionary) from the codes to the original values.
    $\rightarrow$ Typically one code per attribute value.
    $\rightarrow$ Most widely used native compression scheme in DBMSs.

    The ideal dictionary scheme supports fast encoding and decoding for both point and range queries.
    $\rightarrow$ Encode/Locate: for a given uncompressed value, convert it into its compressed form.
    $\rightarrow$ Decode/Extract: for a given compressed value, convert it back into its original form.
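A minimal sketch of the scheme: codes are assigned in first-seen order (no ordering guarantees yet; the order-preserving variant follows), and the two-way mapping travels with the encoded column.

```python
def dict_encode(column):
    """Replace each value with a small integer code; return the codes plus
    the code -> value mapping needed to decode."""
    encode = {}                      # value -> code
    codes = []
    for v in column:
        if v not in encode:
            encode[v] = len(encode)  # next unused code, in first-seen order
        codes.append(encode[v])
    decode = {c: v for v, c in encode.items()}
    return codes, decode

def dict_decode(codes, decode):
    """Map every code back to its original value."""
    return [decode[c] for c in codes]
```

Repeated values like 'Mr. Pickles' are stored once in the dictionary and cost only one small code per occurrence in the column.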

    DICTIONARY: ORDER-PRESERVING

    The encoded values need to support the same collation as the original values.

    Original Data

    name
    Andrea
    Mr. Pickles
    Andy
    Jignesh
    Mr. Pickles

    DICTIONARY: ORDER-PRESERVING

    The encoded values need to support the same collation as the original values.
    With a sorted dictionary, a predicate on the strings rewrites into a range predicate on the codes:

    SELECT * FROM users WHERE name LIKE 'And%'
    $\rightarrow$ SELECT * FROM users WHERE name BETWEEN 10 AND 20

    ORDER-PRESERVING ENCODING

    SELECT name FROM users WHERE name LIKE 'And%'
    $\rightarrow$ Still must perform a scan on the column to find the matching tuples.

    SELECT DISTINCT name FROM users WHERE name LIKE 'And%'
    $\rightarrow$ Only needs to access the dictionary, since the distinct matching values are exactly the dictionary entries in the code range.
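A sketch of why this works, under stated assumptions: codes are assigned in sorted order, and the prefix upper bound is formed with a crude '\uffff' sentinel (a simplification; real systems compute the successor string properly).

```python
import bisect

def build_sorted_dictionary(column):
    """Assign codes in sorted order so code comparisons mirror string collation."""
    values = sorted(set(column))              # dictionary, sorted
    code_of = {v: i for i, v in enumerate(values)}
    return values, [code_of[v] for v in column]

def prefix_code_range(values, prefix):
    """Translate LIKE 'prefix%' into a half-open code range [lo, hi)."""
    lo = bisect.bisect_left(values, prefix)
    hi = bisect.bisect_left(values, prefix + "\uffff")  # crude upper bound
    return lo, hi
```

`SELECT DISTINCT` over the prefix is answered from `values[lo:hi]` alone; the plain `SELECT` still scans the code column, but compares small integers instead of strings.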

    CONCLUSION

    It is important to choose the right storage model for the target workload:
    $\rightarrow$ OLTP = Row Store
    $\rightarrow$ OLAP = Column Store
    DBMSs can combine different approaches for even better compression.
    Dictionary encoding is probably the most useful scheme because it does not require pre-sorting.