CMU Database - 15445 - 2025 Spring

cmu_database

This collection of documents, "CMU Database - 15445 - 2025 Spring," provides a comprehensive overview of database systems, primarily focusing on the design, implementation, and ...

Documents Knowledge Graph

lecture-19-slides.pdf

Carnegie Mellon University

Database Systems

Multi- Version

Concurrency Control

ADMINISTRIVIA

Project #4 is out; due Sunday April 20th @ 11

$\rightarrow$ Recitation: Friday, April 11th in GHC 4303 from 3

- 4

Final Exam is on Monday, April 28, 2025, from 05

- 08

$\rightarrow$ Early exam will not be offered. Do not make travel plans.

No OH on April 7th (I have to teach another class)

UPCOMING DATABASE TALKS

StarRocks (DB Seminar)

$\longrightarrow$ Monday, March 31 @ 4

$\longrightarrow$ StarRocks Query Optimizer $\longrightarrow$ Speaker: Kaisen Kang $\longrightarrow$ https://cmu.zoom.us/j/93441451665

StarRocks

Oracle (DB Seminar)

$\longrightarrow$ Tuesday, April 1 @ noon $\longrightarrow$ Evolving Transactions in Oracle 23ai: New Advances in Concurrency and Consistency $\longrightarrow$ Speakers: Kishy Kumar and Akshay S. Kulkarni $\longrightarrow$ https://cmu.zoom.us/my/jignesh

ORACLE

LAST CLASS

We discussed concurrency control protocols for generating conflict serializable schedules without needing to know what queries a txn will execute.

The two- phase locking (2PL) protocol requires txns to acquire locks on database objects before they are allowed to access them.

MULTI-VERSION CONCURRENCY CONTROL

The DBMS maintains multiple physical versions of a single logical object in the database:

$\longrightarrow$ When a txn writes to an object, the DBMS creates a new version of that object. $\longrightarrow$ When a txn reads an object, it reads the newest version that existed when the txn started.

MVCC HISTORY

Protocol was first proposed in 1978 MIT PhD dissertation.

First implementations was Rdb/VMS and InterBase at DEC in early 1980s. $\longrightarrow$ Both were by Jim Starkey, co- founder of Nuodb. $\longrightarrow$ DEC Rdb/VMS is now "Oracle Rdb". $\longrightarrow$ InterBase was open- sourced as Firebird.

Rdb/VMS

MVCC HISTORY

Protocol was first proposed in 1978 MIT PhD dissertation.

First implementations was Rdb/VMS and InterBase at DEC in early 1980s. $\longrightarrow$ Both were by Jim Starkey, co- founder of. NuoDB. $\longrightarrow$ DEC Rdb/VMS is now "Oracle Rdb". $\longrightarrow$ InterBase was open- sourced as Firebird.

MULTI-VERSION CONCURRENCY CONTROL

Writers do not block readers. Readers do not block writers.

Read- only txns can read a consistent snapshot without acquiring locks.

$\longrightarrow$ Use timestamps to determine visibility. $\longrightarrow$ MVCC naturally supports Snapshot Isolation (SI).

Multi- versioning without garbage collection allows the DBMS to support time- travel queries.

MVCC - EXAMPLE #1

Schedule

Database

MVCC - EXAMPLE #1

Schedule

Database

MVCC - EXAMPLE #1

Schedule

Database

MVCC - EXAMPLE #1

Schedule

Database

Table (html):

	begin-ts	end-ts	value
A0	0	-	123

MVCC - EXAMPLE #1

Schedule

Database

Table (html):

	begin-ts	end-ts	value
A0	0	-	123

MVCC - EXAMPLE #1

Schedule

Database

MVCC - EXAMPLE #1

Schedule

Database

Table (html):

	begin-ts	end-ts	value
A0	0	-	123

MVCC - EXAMPLE #1

Schedule

Database

Table (html):

	begin-ts	end-ts	value
A0	0	-	123

MVCC - EXAMPLE #1

Schedule

Database

MVCC - EXAMPLE #1

Schedule

Database

MVCC - EXAMPLE #1

Schedule

Database

Txn Status Table

Table (html):

txnid	timestamp	status
T1	1	Active
T2	2	Active

MVCC - EXAMPLE #1

Schedule

Database

Table (html):

	begin-ts	end-ts	value
A0	0	2	123
A1	2	-	456

Txn Status Table

Table (html):

txnid	timestamp	status
T1	1	Active
T2	2	Active

MVCC - EXAMPLE #1

Schedule

Database

Table (html):

	begin-ts	end-ts	value
A0	0	2	123
A1	2	-	456

Txn Status Table

Table (html):

txnid	timestamp	status
T1	1	Active
T2	2	Active

MVCC - EXAMPLE #2

Schedule

Database

Table (html):

	begin-ts	end-ts	value
A0	0	-	123

Txn Status Table

Table (html):

txnid	timestamp	status
T1	1	Active

MVCC - EXAMPLE #2

Schedule

Database

Table (html):

	begin-ts	end-ts	value
A0	0	-	123

Txn Status Table

Table (html):

txnid	timestamp	status
T1	1	Active

MVCC - EXAMPLE #2

Schedule

Database

Table (html):

	begin-ts	end-ts	value
A0	0	-	123

Txn Status Table

Table (html):

txnid	timestamp	status
T1	1	Active

MVCC - EXAMPLE #2

Schedule

Database

Table (html):

	begin-ts	end-ts	value
A0	0	-	123

Txn Status Table

Table (html):

txnid	timestamp	status
T1	1	Active

MVCC - EXAMPLE #2

Schedule

Database

Table (html):

	begin-ts	end-ts	value
A0	0	-	123
A1	1	-	456

Txn Status Table

Table (html):

txnid	timestamp	status
T1	1	Active

MVCC - EXAMPLE #2

Schedule

Database

Table (html):

	begin-ts	end-ts	value
A0	0	-	123
A1	1	-	456

Txn Status Table

Table (html):

txnid	timestamp	status
T1	1	Active

MVCC - EXAMPLE #2

Schedule

Database

Table (html):

	begin-ts	end-ts	value
A0	0	1	123
A1	1	-	456

Txn Status Table

Table (html):

txnid	timestamp	status
T1	1	Active

MVCC - EXAMPLE #2

Schedule

Database

Table (html):

	begin-ts	end-ts	value
A0	0	1	123
A1	1	-	456

Txn Status Table

Table (html):

txnid	timestamp	status
T1	1	Active

MVCC - EXAMPLE #2

Schedule

Database

Table (html):

	begin-ts	end-ts	value
A0	0	1	123
A1	1	-	456

Txn Status Table

Table (html):

txnid	timestamp	status
T1	1	Active
T2	2	Active

MVCC - EXAMPLE #2

Schedule

MVCC - EXAMPLE #2

Schedule

Database

Table (html):

	begin-ts	end-ts	value
A0	0	1	123
A1	1	-	456

Txn Status Table

Table (html):

txnid	timestamp	status
T1	1	Active
T2	2	Active

MVCC - EXAMPLE #2

Schedule

Database

Table (html):

	begin-ts	end-ts	value
A0	0	1	123
A1	1	-	456

Txn Status Table

Table (html):

txnid	timestamp	status
T1	1	Active
T2	2	Active

MVCC - EXAMPLE #2

Schedule

Database

Table (html):

	begin-ts	end-ts	value
A0	0	1	123
A1	1	-	456

Txn Status Table

Table (html):

txnid timestamp	status
T1	1	Committed
T2	2	Active

MVCC - EXAMPLE #2

SNAPSHOT ISOLATION (SI)

When a txn starts, it sees a consistent snapshot of the database that existed when that the txn started. $\rightarrow$ No torn writes from active txns. $\rightarrow$ If two txns update the same object, then first writer wins.

SI is susceptible to the Write Skew Anomaly.

WRITE SKEW ANOMALY

Txn #1 Change all white marbles to black.

Txn #2 Change all black marbles to white.

WRITE SKEW ANOMALY

MULTI-VERSION CONCURRENCY CONTROL

MVCC is more than just a concurrency control protocol. It completely affects how the DBMS manages transactions and the database.

MULTI-VERSION CONCURRENCY CONTROL

MVCC is more than just a concurrency control protocol. It completely affects how the DBMS manages transactions and the database.

MVCC DESIGN DECISIONS

Concurrence Control ProtocolVersion StorageGarbage CollectionIndex ManagementDeletes

CONCURRENCY CONTROL PROTOCOL

Approach #1: Timestamp Ordering

→ Assign txns timestamps that determine serial order.

Approach #2: Optimistic Concurrency Control

→ Three- phase protocol from last class. → Use private workspace for new versions.

Approach #3: Two-Phase Locking

→ TXns acquire appropriate lock on physical version before they can read/write a logical tuple.

VERSION STORAGE

The DBMS uses the tuples' pointer field to create a version chain per logical tuple.

→ This allows the DBMS to find the version that is visible to a particular txn at runtime. → Indexes always point to the "head" of the chain.

Different storage schemes determine where/what to store for each version.

VERSION STORAGE

Approach #1: Append- Only Storage $\rightarrow$ New versions are appended to the same table space.

Approach #2: Time- Travel Storage $\rightarrow$ Old versions are copied to separate table space.

Approach #3: Delta Storage $\rightarrow$ The original values of the modified attributes are copied into a separate delta record space.

VERSION STORAGE

Approach #1: Append-Only Storage

→ New versions are appended to the same table space.

Do This!

Approach #2: Time-Travel Storage

→ Old versions are copied to separate table space.

Approach #3: Delta Storage

→ The original values of the modified attributes are copied into a separate delta record space.

APPEND-ONLY STORAGE

All the physical versions of a logical tuple are stored in the same table space. The versions are inter- mixed.

On every update, append a new version of the tuple into an empty space in the table.

Main Table

Table (html):

	value	pointer
A0	$111
A1	$222	0
B1	$10	0

APPEND-ONLY STORAGE

All the physical versions of a logical tuple are stored in the same table space. The versions are inter- mixed.

On every update, append a new version of the tuple into an empty space in the table.

Main Table

APPEND-ONLY STORAGE

All the physical versions of a logical tuple are stored in the same table space. The versions are inter- mixed.

On every update, append a new version of the tuple into an empty space in the table.

Main Table

APPEND-ONLY STORAGE

All the physical versions of a logical tuple are stored in the same table space. The versions are inter- mixed.

On every update, append a new version of the tuple into an empty space in the table.

Main Table

Table (html):

	value	pointer
A0	$111
A1	$222	0
B1	$10	0
A2	$333	0

APPEND-ONLY STORAGE

All the physical versions of a logical tuple are stored in the same table space. The versions are inter- mixed.

On every update, append a new version of the tuple into an empty space in the table.

Main Table

APPEND-ONLY STORAGE

All the physical versions of a logical tuple are stored in the same table space. The versions are inter- mixed.

On every update, append a new version of the tuple into an empty space in the table.

Main Table

VERSION CHAIN ORDERING

Approach #1: Oldest-to-Newest (O2N)

$\rightarrow$ Append new version to end of the chain. $\rightarrow$ Must traverse chain on look- ups.

Approach #2: Newest-to-Oldest (N2O)

$\rightarrow$ Must update index pointers for every new version. $\rightarrow$ Do not have to traverse chain on look- ups. $\rightarrow$ This is typically the better approach because most txns only care about the newest version.

TIME-TRAVEL STORAGE

Main Table

Time-Travel Table

Table (html):

	value	pointer
A2	$222
B1	$10

TIME-TRAVEL STORAGE

Main Table

Time-Travel Table

On every update, copy the current version to the time- travel table. Update pointers.

TIME-TRAVEL STORAGE

Main Table

Time-Travel Table

On every update, copy the current version to the time- travel table. Update pointers.

TIME-TRAVEL STORAGE

Main Table

Time-Travel Table

On every update, copy the current version to the time- travel table. Update pointers.

TIME-TRAVEL STORAGE

Main Table

Time-Travel Table

TIME-TRAVEL STORAGE

Main Table

Time-Travel Table

TIME-TRAVEL STORAGE

Main Table

Time-Travel Table

TIME-TRAVEL STORAGE

Main Table

Time-Travel Table

DELTA STORAGE

Main Table

Delta Storage Segment

Table (html):

	value	pointer
A1	$111
B1	$10

DELTA STORAGE

Main Table

Delta Storage Segment

Table (html):

	value	pointer
A1	$111
B1	$10

On every update, copy only the column values that were modified to the delta storage and overwrite the master version.

DELTA STORAGE

Main Table

Delta Storage Segment

On every update, copy only the column values that were modified to the delta storage and overwrite the master version.

DELTA STORAGE

Main Table

On every update, copy only the column values that were modified to the delta storage and overwrite the master version.

Delta Storage Segment

Table (html):

	delta	pointer
A1	(VALUE→$111)	0

DELTA STORAGE

Main Table

Delta Storage Segment

On every update, copy only the column values that were modified to the delta storage and overwrite the master version.

DELTA STORAGE

Main Table

Delta Storage Segment

On every update, copy only the column values that were modified to the delta storage and overwrite the master version.

DELTA STORAGE

Main Table

Delta Storage Segment

On every update, copy only the column values that were modified to the delta storage and overwrite the master version.

DELTA STORAGE

Main Table

Delta Storage Segment

On every update, copy only the column values that were modified to the delta storage and overwrite the master version.

DELTA STORAGE

Main Table

Delta Storage Segment

GARBAGE COLLECTION

The DBMS needs to remove reclaimable physical versions from the database over time. $\rightarrow$ No active txn in the DBMS can "see" that version (SI). $\rightarrow$ The version was created by an aborted txn.

Two additional design decisions: $\rightarrow$ How to look for expired versions? $\rightarrow$ How to decide when it is safe to reclaim memory?

GARBAGE COLLECTION

Two additional design decisions: $\rightarrow$ How to look for expired versions? $\rightarrow$ How to decide when it is safe to reclaim memory?

GARBAGE COLLECTION

Approach #1: Tuple-level

→ Find old versions by examining tuples directly. → Background Vacuuming vs. Cooperative Cleaning

Approach #2: Transaction-level

→ Txns keep track of their old versions so the DBMS does not have to scan tuples to determine visibility.

TUPLE-LEVEL GC

Txn #1

Tid=12

Txn #2

Tid=25

Table (html):

	begin-ts	end-ts
A100	1	9
B100	1	9
B101	10	20

TUPLE-LEVEL GC

Txn #1

Tid=12

Txn #2

Tid=25

Background Vacuuming:

Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.

Table (html):

	begin-ts	end-ts
A100	1	9
B100	1	9
B101	10	20

TUPLE-LEVEL GC

Txn #1

Tid=12

Txn #2

Tid=25

Vacuum

Background Vacuuming:

Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.

Table (html):

	begin-ts	end-ts
A100	1	9
B100	1	9
B101	10	20

TUPLE-LEVEL GC

Background Vacuuming:

Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.

Table (html):

	begin-ts	end-ts
A100	1	9
B100	1	9
B101	10	20

TUPLE-LEVEL GC

Background Vacuuming:

Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.

TUPLE-LEVEL GC

Background Vacuuming:

Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.

TUPLE-LEVEL GC

Background Vacuuming:

Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.

TUPLE-LEVEL GC

Background Vacuuming:

Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.

Table (html):

Dirty Block BitMap	begin-ts	end-ts


B101	10	20

TUPLE-LEVEL GC

Background Vacuuming:

Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.

Table (html):

	begin-ts	end-ts


B101	10	20

TUPLE-LEVEL GC

Background Vacuuming:

Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.

Table (html):

	begin-ts	end-ts


B101	10	20

TUPLE-LEVEL GC

Txn #1

Tid=12

Txn #2

Tid=25

Background Vacuuming:

Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.

TUPLE-LEVEL GC

Txn #1

Tid=12

Txn #2

Tid=25

Background Vacuuming:

Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.

Cooperative Cleaning:

Worker threads identify reclaimable versions as they traverse version chain. Only works with O2N.

TUPLE-LEVEL GC

Txn #1

Tid=12

Txn #2

Tid=25

Background Vacuuming:

Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.

Cooperative Cleaning:

Worker threads identify reclaimable versions as they traverse version chain. Only works with O2N.

TUPLE-LEVEL GC

Background Vacuuming:

Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.

Cooperative Cleaning:

Worker threads identify reclaimable versions as they traverse version chain. Only works with O2N.

TUPLE-LEVEL GC

Background Vacuuming:

Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.

Cooperative Cleaning:

Worker threads identify reclaimable versions as they traverse version chain. Only works with O2N.

TUPLE-LEVEL GC

Background Vacuuming:

Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.

Cooperative Cleaning:

Worker threads identify reclaimable versions as they traverse version chain. Only works with O2N.

TUPLE-LEVEL GC

Background Vacuuming:

Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.

Cooperative Cleaning:

Worker threads identify reclaimable versions as they traverse version chain. Only works with O2N.

TUPLE-LEVEL GC

Background Vacuuming:

Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.

Cooperative Cleaning:

Worker threads identify reclaimable versions as they traverse version chain. Only works with O2N.

TUPLE-LEVEL GC

Background Vacuuming:

Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.

Cooperative Cleaning:

Worker threads identify reclaimable versions as they traverse version chain. Only works with O2N.

TUPLE-LEVEL GC

Txn #1

Tid=12

Txn #2

Tid=25

Background Vacuuming:

Separate thread(s) periodically scan the table and look for reclaimable versions. Works with any storage.

Cooperative Cleaning:

Worker threads identify reclaimable versions as they traverse version chain. Only works with O2N.

TRANSACTION-LEVEL GC

Each txn keeps track of its read/write set. On commit/abort, the txn provides this information to a centralized vacuum worker.

The DBMS periodically determines when all versions created by a finished txn are no longer visible.

TRANSACTION-LEVEL GC

Txn #1

BEGIN @ 10

Table (html):

begin-ts	end-ts	value
A2	1	∞	-
B6	8	∞	-

TRANSACTION-LEVEL GC

Txn #1

BEGIN @ 10

Table (html):

	begin-ts	end-ts	value
A2	1	∞	-
B6	8	∞	-

TRANSACTION-LEVEL GC

Txn #1

BEGIN @ 10

Table (html):

begin-ts	end-ts	value
A2	1	∞	-
B6	8	∞	-

TRANSACTION-LEVEL GC

Txn #1

BEGIN @ 10

Table (html):

	begin-ts	end-ts	value
A2	1	10	-
B6	8	∞	-
A3	10	∞	-

TRANSACTION-LEVEL GC

Txn #1

BEGIN @ 10

Old Versions

Table (html):

	begin-ts	end-ts	value
A2	1	10	-
B6	8	∞	-
A3	10	∞	-

TRANSACTION-LEVEL GC

Txn #1

BEGIN @ 10

Old Versions

Table (html):

	begin-ts	end-ts	value
A2	1	10	-
B6	8	∞	-
A3	10	∞	-

TRANSACTION-LEVEL GC

Txn #1

BEGIN @ 10

Old Versions

Table (html):

	begin-ts	end-ts	value
A2	1	10	-
B6	8	10	-
A3	10	∞	-
B7	10	∞	-

TRANSACTION-LEVEL GC

Txn #1

BEGIN @ 10 COMMIT @ 15

Old Versions

Table (html):

	begin-ts	end-ts	value
A2	1	10	-
B6	8	10	-
A3	10	∞	-
B7	10	∞	-

TRANSACTION-LEVEL GC

Txn #1

BEGIN @ 10 COMMIT @ 15

Old Versions

[ImageCaption: Vacuum]

Table (html):

	begin-ts	end-ts	value
A2	1	10	-
B6	8	10	-
A3	10	∞	-
B7	10	∞	-

INDEX MANAGEMENT

Primary key indexes point to version chain head. $\rightarrow$ How often the DBMS must update the pkey index depends on whether the system creates new versions when a tuple is updated. $\rightarrow$ If a txn updates a tuple's pkey attribute(s), then this is treated as a DELETE followed by an INSERT.

Secondary indexes are more complicated...

Primary key in → How often th on whether ti is updated. → If a txn updat treated as a D

Secondary in

ARCHITECTURE WHY UBER ENGINEERING SWITCHED FROM POSTGRES TO MYSQL

JULY 26, 2016 BY EVAN KLITZKE

SECONDARY INDEXES

Approach #1: Logical Pointers

→ Use a fixed identifier per tuple that does not change. $\rightarrow$ Requires an extra indirection layer. $\rightarrow$ Primary Key vs. Tuple Id

Approach #2: Physical Pointers

→ Use the physical address to the version chain head.

INDEX POINTERS: APPEND-ONLY

PRIMARY INDEX

SECONDARY INDEX

INDEX POINTERS: APPEND-ONLY

PRIMARY INDEX

SECONDARY INDEX

INDEX POINTERS: APPEND-ONLY

GET(A)

PRIMARY INDEX

SECONDARY INDEX

INDEX POINTERS: APPEND-ONLY

PRIMARY INDEX

SECONDARY INDEX

INDEX POINTERS: APPEND-ONLY

GET(A)

PRIMARY INDEX

SECONDARY INDEX

INDEX POINTERS: APPEND-ONLY

GET(A)

PRIMARY INDEX

SECONDARY INDEX

INDEX POINTERS: APPEND-ONLY

GET(A)

PRIMARY INDEX

SECONDARY INDEX

MVCC INDEXES

MVCC DBMS indexes (usually) do not store version information about tuples with their keys. $\rightarrow$ Exception: Index- organized tables (e.g., MySQL)

Every index must support duplicate keys from different snapshots:

$\rightarrow$ The same key may point to different logical tuples in different snapshots.

MVCC DUPLICATE KEY PROBLEM

Txn #1

BEGIN @ 10

MVCC DUPLICATE KEY PROBLEM

Txn #1

BEGIN @ 10

Txn #2

BEGIN @ 20

MVCC DUPLICATE KEY PROBLEM

Txn #1

BEGIN @ 10

Txn #2

BEGIN @ 20

MVCC DUPLICATE KEY PROBLEM

Txn #1

BEGIN @ 10

Txn #2

BEGIN @ 20

MVCC DUPLICATE KEY PROBLEM

Txn #1

BEGIN @ 10

Txn #2

BEGIN @ 20 COMMIT @ 25

MVCC DUPLICATE KEY PROBLEM

Txn #1

BEGIN @ 10

Index

Txn #2

BEGIN @ 20 COMMIT @ 25

MVCC DUPLICATE KEY PROBLEM

Txn #1

BEGIN @ 10

Txn #2

BEGIN @ 20 COMMIT @ 25

Txn #3

BEGIN @ 30

MVCC DUPLICATE KEY PROBLEM

Txn #1

BEGIN @ 10

Index

Txn #2

BEGIN @ 20 COMMIT @ 25

Txn #3

BEGIN @ 30

MVCC DUPLICATE KEY PROBLEM

Txn #1

BEGIN @ 10

Index

Txn #2

BEGIN @ 20 COMMIT @ 25

Txn #3

BEGIN @ 30

MVCC INDEXES

Each index's underlying data structure must support the storage of non- unique keys.

Use additional execution logic to perform conditional inserts for pkey / unique indexes. $\rightarrow$ Atomically check whether the key exists and then insert.

Workers may get back multiple entries for a single fetch. They then must follow the pointers to find the proper physical version.

MVCC DELETES

The DBMS physically deletes a tuple from the database only when all versions of a logically deleted tuple are not visible.

$\rightarrow$ If a tuple is deleted, then there cannot be a new version of that tuple after the newest version. $\rightarrow$ No write- write conflicts / first- writer wins

We need a way to denote that tuple has been logically delete at some point in time.

MVCC DELETES

Approach #1: Deleted Flag

$\rightarrow$ Maintain a flag to indicate that the logical tuple has been deleted after the newest physical version. $\rightarrow$ Can either be in tuple header or a separate column.

Approach #2: Tombstone Tuple

$\rightarrow$ Create an empty physical version to indicate that a logical tuple is deleted. $\rightarrow$ Use a separate pool for tombstone tuples with only a special bit pattern in version chain pointer to reduce the storage overhead.

MVCC IMPLEMENTATIONS

Table (html):

	Protocol	Version Storage	Garbage Collection	Indexes
Oracle	MV2PL	Delta	Vacuum	Logical
Postgres	MV-2PL/MV-TO	Append-Only	Vacuum	Physical
MySQL-InnoDB	MV-2PL	Delta	Vacuum	Logical
HYRISE	MV-OCC	Append-Only	-	Physical
Hekaton	MV-OCC	Append-Only	Cooperative	Physical
MemSQL (2015)	MV-OCC	Append-Only	Vacuum	Physical
SAP HANA	MV-2PL	Time-travel	Hybrid	Logical
NuoDB	MV-2PL	Append-Only	Vacuum	Logical
HyPer	MV-OCC	Delta	Txn-level	Logical
CockroachDB	MV-2PL	Delta (LSM)	Compaction	Logical

CONCLUSION

MVCC is the widely used scheme in DBMSs. Even systems that do not support multi- statement txns (e.g., NoSQL) use it.

NEXT CLASS

Logging and recovery!