GithubDocuments
    1. Marketplace
    2. CMU Database - 15445 - 2025 Spring
    Collections
    CMU Database - 15445 - 2025 Spring
    cmu_database
    This collection of documents, "CMU Database - 15445 - 2025 Spring," provides a comprehensive overview of database systems, primarily focusing on the design, implementation, and ...
    DocumentsKnowledge Graph
    lecture-19-slides.pdf
    Carnegie Mellon University
    Database Systems
    Multi- Version
    Concurrency Control

    ADMINISTRIVIA

    Project #4 is out; due Sunday April 20th @ 11
    $\rightarrow$ Recitation: Friday, April 11th in GHC 4303 from 3
    - 4
    PM
    Final Exam is on Monday, April 28, 2025, from 05
    - 08
    $\rightarrow$ Early exam will not be offered. Do not make travel plans.
    No OH on April 7th (I have to teach another class)

    UPCOMING DATABASE TALKS

    StarRocks (DB Seminar)

    $\longrightarrow$ Monday, March 31 @ 4
    $\longrightarrow$ StarRocks Query Optimizer $\longrightarrow$ Speaker: Kaisen Kang $\longrightarrow$ https://cmu.zoom.us/j/93441451665

    StarRocks

    Oracle (DB Seminar)
    $\longrightarrow$ Tuesday, April 1 @ noon $\longrightarrow$ Evolving Transactions in Oracle 23ai: New Advances in Concurrency and Consistency $\longrightarrow$ Speakers: Kishy Kumar and Akshay S. Kulkarni $\longrightarrow$ https://cmu.zoom.us/my/jignesh
    ORACLE

    LAST CLASS

    We discussed concurrency control protocols for generating conflict serializable schedules without needing to know what queries a txn will execute.
    The two- phase locking (2PL) protocol requires txns to acquire locks on database objects before they are allowed to access them.

    MULTI-VERSION CONCURRENCY CONTROL

    The DBMS maintains multiple physical versions of a single logical object in the database:
    $\longrightarrow$ When a txn writes to an object, the DBMS creates a new version of that object. $\longrightarrow$ When a txn reads an object, it reads the newest version that existed when the txn started.

    MVCC HISTORY

    Protocol was first proposed in 1978 MIT PhD dissertation.
    First implementations was Rdb/VMS and InterBase at DEC in early 1980s. $\longrightarrow$ Both were by Jim Starkey, co- founder of Nuodb. $\longrightarrow$ DEC Rdb/VMS is now "Oracle Rdb". $\longrightarrow$ InterBase was open- sourced as Firebird.
    Rdb/VMS

    MVCC HISTORY

    Protocol was first proposed in 1978 MIT PhD dissertation.
    First implementations was Rdb/VMS and InterBase at DEC in early 1980s. $\longrightarrow$ Both were by Jim Starkey, co- founder of. NuoDB. $\longrightarrow$ DEC Rdb/VMS is now "Oracle Rdb". $\longrightarrow$ InterBase was open- sourced as Firebird.

    MULTI-VERSION CONCURRENCY CONTROL

    Writers do not block readers. Readers do not block writers.
    Read- only txns can read a consistent snapshot without acquiring locks.
    $\longrightarrow$ Use timestamps to determine visibility. $\longrightarrow$ MVCC naturally supports Snapshot Isolation (SI).
    Multi- versioning without garbage collection allows the DBMS to support time- travel queries.

    MVCC - EXAMPLE #1

    Schedule

    Database

    MVCC - EXAMPLE #1

    MVCC - EXAMPLE #1

    Schedule

    Database

    MVCC - EXAMPLE #1

    Schedule

    Database

    MVCC - EXAMPLE #1

    Schedule

    Database

    Table (html):
    begin-tsend-tsvalue
    A00-123

    MVCC - EXAMPLE #1

    Schedule

    Database

    Table (html):
    begin-tsend-tsvalue
    A00-123

    MVCC - EXAMPLE #1

    Schedule

    Database

    MVCC - EXAMPLE #1

    Schedule

    Database

    Table (html):
    begin-tsend-tsvalue
    A00-123

    MVCC - EXAMPLE #1

    Schedule

    Database

    Table (html):
    begin-tsend-tsvalue
    A00-123

    MVCC - EXAMPLE #1

    Schedule

    Database

    MVCC - EXAMPLE #1

    Schedule

    Database

    MVCC - EXAMPLE #1

    Schedule

    Database

    Txn Status Table

    Table (html):
    txnidtimestampstatus
    T11Active
    T22Active

    MVCC - EXAMPLE #1

    Schedule

    Database

    Table (html):
    begin-tsend-tsvalue
    A002123
    A12-456

    Txn Status Table

    Table (html):
    txnidtimestampstatus
    T11Active
    T22Active

    MVCC - EXAMPLE #1

    MVCC - EXAMPLE #1

    Schedule

    Database

    Table (html):
    begin-tsend-tsvalue
    A002123
    A12-456

    Txn Status Table

    Table (html):
    txnidtimestampstatus
    T11Active
    T22Active

    MVCC - EXAMPLE #2

    Schedule

    Database

    Table (html):
    begin-tsend-tsvalue
    A00-123

    Txn Status Table

    Table (html):
    txnidtimestampstatus
    T11Active

    MVCC - EXAMPLE #2

    Schedule

    Database

    Table (html):
    begin-tsend-tsvalue
    A00-123

    Txn Status Table

    Table (html):
    txnidtimestampstatus
    T11Active

    MVCC - EXAMPLE #2

    Schedule

    Database

    Table (html):
    begin-tsend-tsvalue
    A00-123

    Txn Status Table

    Table (html):
    txnidtimestampstatus
    T11Active

    MVCC - EXAMPLE #2

    Schedule

    Database

    Table (html):
    begin-tsend-tsvalue
    A00-123

    Txn Status Table

    Table (html):
    txnidtimestampstatus
    T11Active

    MVCC - EXAMPLE #2

    Schedule

    Database

    Table (html):
    begin-tsend-tsvalue
    A00-123
    A11-456

    Txn Status Table

    Table (html):
    txnidtimestampstatus
    T11Active

    MVCC - EXAMPLE #2

    Schedule

    Database

    Table (html):
    begin-tsend-tsvalue
    A00-123
    A11-456

    Txn Status Table

    Table (html):
    txnidtimestampstatus
    T11Active

    MVCC - EXAMPLE #2

    Schedule

    Database

    Table (html):
    begin-tsend-tsvalue
    A001123
    A11-456

    Txn Status Table

    Table (html):
    txnidtimestampstatus
    T11Active

    MVCC - EXAMPLE #2

    Schedule

    Database

    Table (html):
    begin-tsend-tsvalue
    A001123
    A11-456

    Txn Status Table

    Table (html):
    txnidtimestampstatus
    T11Active

    MVCC - EXAMPLE #2

    MVCC - EXAMPLE #2

    Schedule

    Database

    Table (html):
    begin-tsend-tsvalue
    A001123
    A11-456

    Txn Status Table

    Table (html):
    txnidtimestampstatus
    T11Active
    T22Active

    MVCC - EXAMPLE #2

    Schedule

    MVCC - EXAMPLE #2

    Schedule

    Database

    Table (html):
    begin-tsend-tsvalue
    A001123
    A11-456

    Txn Status Table

    Table (html):
    txnidtimestampstatus
    T11Active
    T22Active

    MVCC - EXAMPLE #2

    MVCC - EXAMPLE #2

    Schedule

    Database

    Table (html):
    begin-tsend-tsvalue
    A001123
    A11-456

    Txn Status Table

    Table (html):
    txnidtimestampstatus
    T11Active
    T22Active

    MVCC - EXAMPLE #2

    Schedule

    Database

    Table (html):
    begin-tsend-tsvalue
    A001123
    A11-456

    Txn Status Table

    Table (html):
    txnid timestampstatus
    T11Committed
    T22Active

    MVCC - EXAMPLE #2

    SNAPSHOT ISOLATION (SI)

    When a txn starts, it sees a consistent snapshot of the database that existed when that the txn started. $\rightarrow$ No torn writes from active txns. $\rightarrow$ If two txns update the same object, then first writer wins.
    SI is susceptible to the Write Skew Anomaly.

    WRITE SKEW ANOMALY

    WRITE SKEW ANOMALY

    Txn #1 Change all white marbles to black.
    Txn #2 Change all black marbles to white.

    WRITE SKEW ANOMALY

    WRITE SKEW ANOMALY

    WRITE SKEW ANOMALY

    WRITE SKEW ANOMALY

    WRITE SKEW ANOMALY

    WRITE SKEW ANOMALY

    MULTI-VERSION CONCURRENCY CONTROL

    MVCC is more than just a concurrency control protocol. It completely affects how the DBMS manages transactions and the database.

    MULTI-VERSION CONCURRENCY CONTROL

    MVCC is more than just a concurrency control protocol. It completely affects how the DBMS manages transactions and the database.

    MVCC DESIGN DECISIONS

    Concurrence Control ProtocolVersion StorageGarbage CollectionIndex ManagementDeletes

    CONCURRENCY CONTROL PROTOCOL

    Approach #1: Timestamp Ordering

    → Assign txns timestamps that determine serial order.

    Approach #2: Optimistic Concurrency Control

    → Three- phase protocol from last class. → Use private workspace for new versions.

    Approach #3: Two-Phase Locking

    → TXns acquire appropriate lock on physical version before they can read/write a logical tuple.

    VERSION STORAGE

    The DBMS uses the tuples' pointer field to create a version chain per logical tuple.
    → This allows the DBMS to find the version that is visible to a particular txn at runtime. → Indexes always point to the "head" of the chain.
    Different storage schemes determine where/what to store for each version.

    VERSION STORAGE

    Approach #1: Append- Only Storage $\rightarrow$ New versions are appended to the same table space.
    Approach #2: Time- Travel Storage $\rightarrow$ Old versions are copied to separate table space.
    Approach #3: Delta Storage $\rightarrow$ The original values of the modified attributes are copied into a separate delta record space.

    VERSION STORAGE

    Approach #1: Append-Only Storage

    → New versions are appended to the same table space.
    Do This!

    Approach #2: Time-Travel Storage

    → Old versions are copied to separate table space.

    Approach #3: Delta Storage

    → The original values of the modified attributes are copied into a separate delta record space.

    APPEND-ONLY STORAGE

    All the physical versions of a logical tuple are stored in the same table space. The versions are inter- mixed.
    On every update, append a new version of the tuple into an empty space in the table.

    Main Table

    Table (html):
    valuepointer
    A0$111
    A1$2220
    B1$100

    APPEND-ONLY STORAGE

    All the physical versions of a logical tuple are stored in the same table space. The versions are inter- mixed.
    On every update, append a new version of the tuple into an empty space in the table.

    Main Table

    APPEND-ONLY STORAGE

    All the physical versions of a logical tuple are stored in the same table space. The versions are inter- mixed.
    On every update, append a new version of the tuple into an empty space in the table.

    Main Table

    APPEND-ONLY STORAGE

    All the physical versions of a logical tuple are stored in the same table space. The versions are inter- mixed.
    On every update, append a new version of the tuple into an empty space in the table.

    Main Table

    Table (html):
    valuepointer
    A0$111
    A1$2220
    B1$100
    A2$3330

    APPEND-ONLY STORAGE

    All the physical versions of a logical tuple are stored in the same table space. The versions are inter- mixed.
    On every update, append a new version of the tuple into an empty space in the table.

    Main Table

    APPEND-ONLY STORAGE

    All the physical versions of a logical tuple are stored in the same table space. The versions are inter- mixed.
    On every update, append a new version of the tuple into an empty space in the table.

    Main Table

    VERSION CHAIN ORDERING

    Approach #1: Oldest-to-Newest (O2N)

    $\rightarrow$ Append new version to end of the chain. $\rightarrow$ Must traverse chain on look- ups.

    Approach #2: Newest-to-Oldest (N2O)

    $\rightarrow$ Must update index pointers for every new version. $\rightarrow$ Do not have to traverse chain on look- ups. $\rightarrow$ This is typically the better approach because most txns only care about the newest version.

    TIME-TRAVEL STORAGE

    Main Table

    Time-Travel Table

    Table (html):
    valuepointer
    A2$222
    B1$10

    TIME-TRAVEL STORAGE

    Main Table

    Time-Travel Table

    On every update, copy the current version to the time- travel table. Update pointers.

    TIME-TRAVEL STORAGE

    Main Table

    Time-Travel Table

    On every update, copy the current version to the time- travel table. Update pointers.

    TIME-TRAVEL STORAGE

    Main Table

    Time-Travel Table

    On every update, copy the current version to the time- travel table. Update pointers.

    TIME-TRAVEL STORAGE

    Main Table

    Time-Travel Table

    TIME-TRAVEL STORAGE

    Main Table

    Time-Travel Table

    TIME-TRAVEL STORAGE

    Main Table

    Time-Travel Table

    TIME-TRAVEL STORAGE

    Main Table

    Time-Travel Table

    DELTA STORAGE

    Main Table

    Delta Storage Segment

    Table (html):
    valuepointer
    A1$111
    B1$10

    DELTA STORAGE

    Main Table

    Delta Storage Segment

    Table (html):
    valuepointer
    A1$111
    B1$10
    On every update, copy only the column values that were modified to the delta storage and overwrite the master version.

    DELTA STORAGE

    Main Table

    Delta Storage Segment

    On every update, copy only the column values that were modified to the delta storage and overwrite the master version.

    DELTA STORAGE

    Main Table

    On every update, copy only the column values that were modified to the delta storage and overwrite the master version.

    Delta Storage Segment

    Table (html):
    deltapointer
    A1(VALUE→$111)0

    DELTA STORAGE

    Main Table

    Delta Storage Segment

    On every update, copy only the column values that were modified to the delta storage and overwrite the master version.

    DELTA STORAGE

    Main Table

    Delta Storage Segment

    On every update, copy only the column values that were modified to the delta storage and overwrite the master version.

    DELTA STORAGE

    Main Table

    Delta Storage Segment

    On every update, copy only the column values that were modified to the delta storage and overwrite the master version.

    DELTA STORAGE

    Main Table

    Delta Storage Segment

    On every update, copy only the column values that were modified to the delta storage and overwrite the master version.

    DELTA STORAGE

    Main Table

    Delta Storage Segment

    GARBAGE COLLECTION

    The DBMS needs to remove reclaimable physical versions from the database over time. $\rightarrow$ No active txn in the DBMS can "see" that version (SI). $\rightarrow$ The version was created by an aborted txn.
    Two additional design decisions: $\rightarrow$ How to look for expired versions? $\rightarrow$ How to decide when it is safe to reclaim memory?

    GARBAGE COLLECTION

    The DBMS needs to remove reclaimable physical versions from the database over time. $\rightarrow$ No active txn in the DBMS can "see" that version (SI). $\rightarrow$ The version was created by an aborted txn.
    Two additional design decisions: $\rightarrow$ How to look for expired versions? $\rightarrow$ How to decide when it is safe to reclaim memory?

    GARBAGE COLLECTION

    Approach #1: Tuple-level

    → Find old versions by examining tuples directly. → Background Vacuuming vs. Cooperative Cleaning

    Approach #2: Transaction-level

    → Txns keep track of their old versions so the DBMS does not have to scan tuples to determine visibility.

    TUPLE-LEVEL GC

    Txn #1
    Tid=12
    Txn #2
    Tid=25
    Table (html):
    begin-tsend-ts
    A10019
    B10019
    B1011020

    TUPLE-LEVEL GC

    Txn #1

    Tid=12
    Txn #2
    Tid=25
    Background Vacuuming:
    Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.
    Table (html):
    begin-tsend-ts
    A10019
    B10019
    B1011020

    TUPLE-LEVEL GC

    Txn #1
    Tid=12
    Txn #2
    Tid=25
    Vacuum
    Background Vacuuming:
    Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.
    Table (html):
    begin-tsend-ts
    A10019
    B10019
    B1011020

    TUPLE-LEVEL GC

    Background Vacuuming:
    Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.
    Table (html):
    begin-tsend-ts
    A10019
    B10019
    B1011020

    TUPLE-LEVEL GC

    Background Vacuuming:
    Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.

    TUPLE-LEVEL GC

    Background Vacuuming:
    Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.

    TUPLE-LEVEL GC

    Background Vacuuming:
    Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.

    TUPLE-LEVEL GC

    Background Vacuuming:
    Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.
    Table (html):
    Dirty Block BitMapbegin-tsend-ts
    B1011020

    TUPLE-LEVEL GC

    Background Vacuuming:
    Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.
    Table (html):
    begin-tsend-ts
    B1011020

    TUPLE-LEVEL GC

    Background Vacuuming:
    Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.
    Table (html):
    begin-tsend-ts
    B1011020

    TUPLE-LEVEL GC

    Txn #1

    Tid=12
    Txn #2
    Tid=25
    Background Vacuuming:
    Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.

    TUPLE-LEVEL GC

    Txn #1
    Tid=12
    Txn #2
    Tid=25
    Background Vacuuming:
    Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.
    Cooperative Cleaning:
    Worker threads identify reclaimable versions as they traverse version chain. Only works with O2N.

    TUPLE-LEVEL GC

    Txn #1
    Tid=12
    Txn #2
    Tid=25
    Background Vacuuming:
    Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.
    Cooperative Cleaning:
    Worker threads identify reclaimable versions as they traverse version chain. Only works with O2N.

    TUPLE-LEVEL GC

    Background Vacuuming:
    Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.
    Cooperative Cleaning:
    Worker threads identify reclaimable versions as they traverse version chain. Only works with O2N.

    TUPLE-LEVEL GC

    Background Vacuuming:
    Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.
    Cooperative Cleaning:
    Worker threads identify reclaimable versions as they traverse version chain. Only works with O2N.

    TUPLE-LEVEL GC

    Background Vacuuming:
    Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.
    Cooperative Cleaning:
    Worker threads identify reclaimable versions as they traverse version chain. Only works with O2N.

    TUPLE-LEVEL GC

    Background Vacuuming:
    Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.
    Cooperative Cleaning:
    Worker threads identify reclaimable versions as they traverse version chain. Only works with O2N.

    TUPLE-LEVEL GC

    Background Vacuuming:
    Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.
    Cooperative Cleaning:
    Worker threads identify reclaimable versions as they traverse version chain. Only works with O2N.

    TUPLE-LEVEL GC

    Background Vacuuming:
    Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.
    Cooperative Cleaning:
    Worker threads identify reclaimable versions as they traverse version chain. Only works with O2N.

    TUPLE-LEVEL GC

    Txn #1
    Tid=12
    Txn #2
    Tid=25
    Background Vacuuming:
    Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.
    Cooperative Cleaning:
    Worker threads identify reclaimable versions as they traverse version chain. Only works with O2N.

    TRANSACTION-LEVEL GC

    Each txn keeps track of its read/write set. On commit/abort, the txn provides this information to a centralized vacuum worker.
    The DBMS periodically determines when all versions created by a finished txn are no longer visible.

    TRANSACTION-LEVEL GC

    Txn #1

    BEGIN @ 10
    Table (html):
    begin-tsend-tsvalue
    A21∞-
    B68∞-

    TRANSACTION-LEVEL GC

    Txn #1
    BEGIN @ 10
    Table (html):
    begin-tsend-tsvalue
    A21∞-
    B68∞-

    TRANSACTION-LEVEL GC

    Txn #1
    BEGIN @ 10
    Table (html):
    begin-tsend-tsvalue
    A21∞-
    B68∞-

    TRANSACTION-LEVEL GC

    Txn #1
    BEGIN @ 10
    Table (html):
    begin-tsend-tsvalue
    A2110-
    B68∞-
    A310∞-

    TRANSACTION-LEVEL GC

    Txn #1
    BEGIN @ 10
    Old Versions
    A2
    Table (html):
    begin-tsend-tsvalue
    A2110-
    B68∞-
    A310∞-

    TRANSACTION-LEVEL GC

    TRANSACTION-LEVEL GC

    Txn #1
    BEGIN @ 10
    Old Versions
    A2
    Table (html):
    begin-tsend-tsvalue
    A2110-
    B68∞-
    A310∞-

    TRANSACTION-LEVEL GC

    Txn #1
    BEGIN @ 10
    Old Versions
    A2
    Table (html):
    begin-tsend-tsvalue
    A2110-
    B6810-
    A310∞-
    B710∞-

    TRANSACTION-LEVEL GC

    TRANSACTION-LEVEL GC

    Txn #1

    BEGIN @ 10 COMMIT @ 15
    Old Versions
    Table (html):
    begin-tsend-tsvalue
    A2110-
    B6810-
    A310∞-
    B710∞-

    TRANSACTION-LEVEL GC

    Txn #1

    BEGIN @ 10 COMMIT @ 15
    Old Versions
    [ImageCaption: Vacuum]
    Table (html):
    begin-tsend-tsvalue
    A2110-
    B6810-
    A310∞-
    B710∞-

    INDEX MANAGEMENT

    Primary key indexes point to version chain head. $\rightarrow$ How often the DBMS must update the pkey index depends on whether the system creates new versions when a tuple is updated. $\rightarrow$ If a txn updates a tuple's pkey attribute(s), then this is treated as a DELETE followed by an INSERT.
    Secondary indexes are more complicated...
    Primary key in → How often th on whether ti is updated. → If a txn updat treated as a D
    Secondary in

    ARCHITECTURE WHY UBER ENGINEERING SWITCHED FROM POSTGRES TO MYSQL

    JULY 26, 2016 BY EVAN KLITZKE

    SECONDARY INDEXES

    Approach #1: Logical Pointers

    → Use a fixed identifier per tuple that does not change. $\rightarrow$ Requires an extra indirection layer. $\rightarrow$ Primary Key vs. Tuple Id

    Approach #2: Physical Pointers

    → Use the physical address to the version chain head.

    INDEX POINTERS: APPEND-ONLY

    PRIMARY INDEX

    SECONDARY INDEX

    INDEX POINTERS: APPEND-ONLY

    PRIMARY INDEX

    SECONDARY INDEX

    INDEX POINTERS: APPEND-ONLY

    GET(A)

    PRIMARY INDEX
    SECONDARY INDEX

    INDEX POINTERS: APPEND-ONLY

    INDEX POINTERS: APPEND-ONLY

    PRIMARY INDEX

    SECONDARY INDEX

    INDEX POINTERS: APPEND-ONLY

    GET(A)
    PRIMARY INDEX
    SECONDARY INDEX

    INDEX POINTERS: APPEND-ONLY

    INDEX POINTERS: APPEND-ONLY

    INDEX POINTERS: APPEND-ONLY

    INDEX POINTERS: APPEND-ONLY

    GET(A)

    PRIMARY INDEX

    SECONDARY INDEX

    INDEX POINTERS: APPEND-ONLY

    INDEX POINTERS: APPEND-ONLY

    GET(A)
    PRIMARY INDEX
    SECONDARY INDEX

    MVCC INDEXES

    MVCC DBMS indexes (usually) do not store version information about tuples with their keys. $\rightarrow$ Exception: Index- organized tables (e.g., MySQL)
    Every index must support duplicate keys from different snapshots:
    $\rightarrow$ The same key may point to different logical tuples in different snapshots.

    MVCC DUPLICATE KEY PROBLEM

    MVCC DUPLICATE KEY PROBLEM

    Txn #1

    BEGIN @ 10

    MVCC DUPLICATE KEY PROBLEM

    Txn #1

    BEGIN @ 10

    Txn #2

    BEGIN @ 20

    MVCC DUPLICATE KEY PROBLEM

    Txn #1

    BEGIN @ 10

    Txn #2

    BEGIN @ 20

    MVCC DUPLICATE KEY PROBLEM

    Txn #1

    BEGIN @ 10

    Txn #2

    BEGIN @ 20

    MVCC DUPLICATE KEY PROBLEM

    Txn #1

    BEGIN @ 10

    Txn #2

    BEGIN @ 20 COMMIT @ 25

    MVCC DUPLICATE KEY PROBLEM

    Txn #1

    BEGIN @ 10

    Index

    Txn #2

    BEGIN @ 20 COMMIT @ 25

    MVCC DUPLICATE KEY PROBLEM

    Txn #1

    BEGIN @ 10

    Txn #2

    BEGIN @ 20 COMMIT @ 25

    Txn #3

    BEGIN @ 30

    MVCC DUPLICATE KEY PROBLEM

    Txn #1

    BEGIN @ 10
    Index

    Txn #2

    BEGIN @ 20 COMMIT @ 25

    Txn #3

    BEGIN @ 30

    MVCC DUPLICATE KEY PROBLEM

    Txn #1

    BEGIN @ 10
    Index

    Txn #2

    BEGIN @ 20 COMMIT @ 25

    Txn #3

    BEGIN @ 30

    MVCC INDEXES

    Each index's underlying data structure must support the storage of non- unique keys.
    Use additional execution logic to perform conditional inserts for pkey / unique indexes. $\rightarrow$ Atomically check whether the key exists and then insert.
    Workers may get back multiple entries for a single fetch. They then must follow the pointers to find the proper physical version.

    MVCC DELETES

    The DBMS physically deletes a tuple from the database only when all versions of a logically deleted tuple are not visible.
    $\rightarrow$ If a tuple is deleted, then there cannot be a new version of that tuple after the newest version. $\rightarrow$ No write- write conflicts / first- writer wins
    We need a way to denote that tuple has been logically delete at some point in time.

    MVCC DELETES

    Approach #1: Deleted Flag

    $\rightarrow$ Maintain a flag to indicate that the logical tuple has been deleted after the newest physical version. $\rightarrow$ Can either be in tuple header or a separate column.

    Approach #2: Tombstone Tuple

    $\rightarrow$ Create an empty physical version to indicate that a logical tuple is deleted. $\rightarrow$ Use a separate pool for tombstone tuples with only a special bit pattern in version chain pointer to reduce the storage overhead.

    MVCC IMPLEMENTATIONS

    Table (html):
    ProtocolVersion StorageGarbage CollectionIndexes
    OracleMV2PLDeltaVacuumLogical
    PostgresMV-2PL/MV-TOAppend-OnlyVacuumPhysical
    MySQL-InnoDBMV-2PLDeltaVacuumLogical
    HYRISEMV-OCCAppend-Only-Physical
    HekatonMV-OCCAppend-OnlyCooperativePhysical
    MemSQL (2015)MV-OCCAppend-OnlyVacuumPhysical
    SAP HANAMV-2PLTime-travelHybridLogical
    NuoDBMV-2PLAppend-OnlyVacuumLogical
    HyPerMV-OCCDeltaTxn-levelLogical
    CockroachDBMV-2PLDelta (LSM)CompactionLogical

    CONCLUSION

    MVCC is the widely used scheme in DBMSs. Even systems that do not support multi- statement txns (e.g., NoSQL) use it.

    NEXT CLASS

    Logging and recovery!