    lecture-16-slides.pdf
    Carnegie Mellon University
    Database Systems
    Concurrency Control Theory

    ADMINISTRIVIA

    Project #3 is due Sunday March 30th @ 11
    $\rightarrow$ Recitation: slides, recording

    COURSE STATUS

    A DBMS's concurrency control and recovery components permeate the design of its entire architecture.
    Query Planning
    Concurrency Control
    Operator Execution
    Access Methods
    Recovery
    Buffer Pool Manager
    Disk Manager

    MOTIVATION EXAMPLE #1

    Application Logic

    Read(A);
    Check(A > $25);
    Pay($25);
    A = A - $25;
    Write(A);

    MOTIVATION EXAMPLE #2

    Application Logic

    Read(A);
    Check(A > $25);
    Pay($25);
    A = A - $25;
    Write(A);

    STRAWMAN SYSTEM

    Execute each txn one-by-one (i.e., in serial order) as they arrive at the DBMS.
    $\rightarrow$ One and only one txn can be running at a time in the DBMS.
    Before a txn starts, copy the entire database to a new file and make all changes to that file.
    $\rightarrow$ If the txn completes successfully, overwrite the original file with the new one.
    $\rightarrow$ If the txn fails, just remove the dirty copy.

    PROBLEM STATEMENT

    A (potentially) better approach is to allow concurrent execution of independent transactions.
    Why do we want that?
    $\rightarrow$ Better utilization/throughput $\rightarrow$ Reduced response times to users.
    But we also would like:
    $\rightarrow$ Correctness $\rightarrow$ Fairness

    PROBLEM STATEMENT

    Arbitrary interleaving of operations can lead to: $\rightarrow$ Temporary Inconsistency (ok, unavoidable) $\rightarrow$ Permanent Inconsistency (bad!)
    The DBMS is only concerned about what data is read/written from/to the database. $\rightarrow$ Changes to the "outside world" are beyond the scope of the DBMS.
    We need formal correctness criteria to determine whether an interleaving is valid.

    FORMAL DEFINITIONS

    Database: A fixed set of named data objects (e.g., A, B, C, ...).
    $\rightarrow$ We do not need to define what these objects are now. $\rightarrow$ We will discuss how to handle inserts/deletes next week.
    Transaction: A sequence of read and write operations (e.g., R(A), W(B), ...)
    $\rightarrow$ DBMS's abstract view of a user program.
    $\rightarrow$ A new txn starts with the BEGIN command.
    $\rightarrow$ The txn stops with either COMMIT or ROLLBACK.

    CORRECTNESS CRITERIA: ACID

    Atomicity

    All actions in txn happen, or none happen. "All or nothing..."
    Consistency
    If each txn is consistent and the DB starts consistent, then it ends up consistent. "It looks correct to me..."
    Isolation
    Execution of one txn is isolated from that of other txns. "All by myself..."
    Durability
    If a txn commits, its effects persist. "I will survive..."

    TODAY'S AGENDA

    Atomicity
    Consistency
    Isolation
    Durability
    DB Flash Talk: ClickHouse

    ATOMICITY OF TRANSACTIONS

    Two possible outcomes of executing a txn:
    $\rightarrow$ Commit after completing all its actions. $\rightarrow$ Abort (or be aborted by the DBMS) after executing some actions.
    DBMS guarantees that txn are atomic.
    $\rightarrow$ From user's point of view: txn always either executes all its actions or executes no actions at all.


    MECHANISMS FOR ENSURING ATOMICITY

    Approach #1: Logging

    $\rightarrow$ DBMS logs all actions so that it can undo the actions of aborted transactions. $\rightarrow$ Maintain undo records both in memory and on disk. $\rightarrow$ Think of this like the black box in airplanes...
    Logging is used by almost every DBMS. $\rightarrow$ Audit Trail $\rightarrow$ Efficiency Reasons
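The undo idea can be sketched as follows (a minimal in-memory illustration; real systems write structured log records to both memory and disk, as noted above):

```python
class MiniDB:
    """Toy key-value store that records an undo entry before every write,
    so an aborted txn can roll back to the before-images."""
    _MISSING = object()                    # marks "key did not exist before"

    def __init__(self):
        self.data = {}
        self.undo_log = []                 # (key, before_image) records

    def begin(self):
        self.undo_log.clear()

    def write(self, key, value):
        # Log the before-image FIRST, then apply the change.
        self.undo_log.append((key, self.data.get(key, self._MISSING)))
        self.data[key] = value

    def commit(self):
        self.undo_log.clear()              # changes become permanent

    def abort(self):
        # Undo in reverse order, restoring each before-image.
        for key, before in reversed(self.undo_log):
            if before is self._MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = before
        self.undo_log.clear()
```

Undoing in reverse order matters when a txn writes the same object twice: only the oldest before-image should survive the rollback.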

    MECHANISMS FOR ENSURING ATOMICITY

    Approach #2: Shadow Paging

    $\rightarrow$ DBMS makes copies of pages and txns make changes to those copies. Only when the txn commits is the page made visible to others. $\rightarrow$ Originally from IBM System R.
    Few systems do this: $\rightarrow$ CouchDB $\rightarrow$ Tokyo Cabinet $\rightarrow$ LMDB (OpenLDAP)

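A minimal sketch of the shadow-paging idea, assuming a page table represented as a Python dict (the class and method names are invented for this illustration):

```python
class ShadowPagedDB:
    """Toy copy-on-write sketch: a txn builds a shadow page table and
    commit is a single atomic pointer swap (no undo needed on abort)."""
    def __init__(self, pages):
        self.current = dict(pages)         # committed page table: id -> page
        self.shadow = None

    def begin(self):
        self.shadow = dict(self.current)   # new table; pages shared for now

    def write(self, page_id, page):
        self.shadow[page_id] = page        # point the shadow at a new copy

    def read(self, page_id):
        return self.current[page_id]       # others see only committed state

    def commit(self):
        self.current = self.shadow         # atomic switch of the page table
        self.shadow = None

    def abort(self):
        self.shadow = None                 # just discard the dirty copies
```

The appeal is that abort is free (drop the copies) and commit is one pointer update; the cost is copying pages and the resulting write amplification, which is part of why few systems use it.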

    CONSISTENCY

    The database accurately models the real world.
    $\rightarrow$ SQL has methods to specify integrity constraints (e.g., key definitions, CHECK and ADD CONSTRAINT) and the DBMS will enforce them.
    $\rightarrow$ Application must define these constraints.
    $\rightarrow$ DBMS ensures that all ICs are true before and after the transaction ends.


    A note on Eventual Consistency.

    $\rightarrow$ A committed transaction may see inconsistent results (e.g., may not see the updates of an older committed txn). $\rightarrow$ Difficult for developers to reason about such semantics. $\rightarrow$ The trend is to move away from such models.


    ISOLATION OF TRANSACTIONS

    Users submit txns, and each txn executes as if it were running by itself. $\rightarrow$ Easier programming model to reason about.
    But the DBMS achieves concurrency by interleaving the actions (reads/writes of DB objects) of txns.
    We need a way to interleave txns but still make it appear as if they ran one-at-a-time.

    MECHANISMS FOR ENSURING ISOLATION

    A concurrency control protocol is how the DBMS decides the proper interleaving of operations from multiple transactions.
    Two categories of protocols:
    $\rightarrow$ Pessimistic: Don't let problems arise in the first place.
    $\rightarrow$ Optimistic: Assume conflicts are rare; deal with them after they happen.

    EXAMPLE

    Assume at first that A and B each have $1000.
    $T_1$ transfers $100 from A's account to B's account.
    $T_2$ credits both accounts with 6% interest.

    What are the possible outcomes of running $T_1$ and $T_2$?
    Many! But A+B should be:
    $\rightarrow$ $2000 * 1.06 = $2120
    There is no guarantee that $T_1$ will execute before $T_2$ or vice versa, if both are submitted together.
    But the net effect must be equivalent to these two transactions running serially in some order.

    Legal outcomes:
    $\rightarrow$ A = 954, B = 1166 (A+B = $2120)
    $\rightarrow$ A = 960, B = 1160 (A+B = $2120)
    The outcome depends on whether $T_1$ executes before $T_2$ or vice versa.
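The two serial orders can be checked with a short script (function names are invented for this sketch; results are rounded to whole dollars to sidestep floating-point noise):

```python
def t1(acct):
    # T1: transfer $100 from A to B
    acct["A"] -= 100
    acct["B"] += 100

def t2(acct):
    # T2: credit both accounts with 6% interest
    acct["A"] *= 1.06
    acct["B"] *= 1.06

def run_serial(order):
    """Run the txns one after the other, starting from A = B = $1000."""
    acct = {"A": 1000.0, "B": 1000.0}
    for txn in order:
        txn(acct)
    return {k: round(v) for k, v in acct.items()}

print(run_serial([t1, t2]))   # {'A': 954, 'B': 1166}
print(run_serial([t2, t1]))   # {'A': 960, 'B': 1160}
```

Both orders preserve the invariant A+B = $2120, which is exactly what a serializable interleaving must also guarantee.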

    SERIAL EXECUTION EXAMPLE

    Schedule
    INTERLEAVING TRANSACTIONS

    We interleave txns to maximize concurrency.
    $\rightarrow$ Slow disk/network I/O.
    $\rightarrow$ Multi-core CPUs.
    When one txn stalls because of a resource (e.g., page fault), another txn can continue executing and make forward progress.

    INTERLEAVING EXAMPLE (GOOD)

    Schedule

    T1: BEGIN
    T1: A=A-100
    T2: BEGIN
    T2: A=A*1.06
    T1: B=B+100
    T1: COMMIT
    T2: B=B*1.06
    T2: COMMIT

    A=954, B=1166
    INTERLEAVING EXAMPLE (BAD)

    Schedule

    T1: BEGIN
    T1: A=A-100
    T2: BEGIN
    T2: A=A*1.06
    T2: B=B*1.06
    T2: COMMIT
    T1: B=B+100
    T1: COMMIT

    A=954, B=1160
    A+B=$2114 (neither legal outcome: A=954, B=1166 or A=960, B=1160)

    DBMS View

    T1: BEGIN
    T1: R(A)
    T1: W(A)
    T2: BEGIN
    T2: R(A)
    T2: W(A)
    T2: R(B)
    T2: W(B)
    T2: COMMIT
    T1: R(B)
    T1: W(B)
    T1: COMMIT

    How do we judge whether a schedule is correct?
    If the schedule is equivalent to some serial execution.

    FORMAL PROPERTIES OF SCHEDULES

    Serial Schedule

    $\rightarrow$ A schedule that does not interleave the actions of different transactions.

    Equivalent Schedules

    $\rightarrow$ For any database state, the effect of executing the first schedule is identical to the effect of executing the second schedule.

    FORMAL PROPERTIES OF SCHEDULES

    Serializable Schedule

    $\rightarrow$ A schedule that is equivalent to some serial execution of the transactions. $\rightarrow$ If each transaction preserves consistency, every serializable schedule preserves consistency.
    Serializability is a less intuitive notion of correctness compared to txn initiation time or commit order, but it provides the DBMS with more flexibility in scheduling operations.
    $\rightarrow$ More flexibility means better parallelism.

    CONFLICTING OPERATIONS

    We need a formal notion of equivalence that can be implemented efficiently based on the notion of "conflicting" operations.
    Two operations conflict if:
    $\rightarrow$ They are by different transactions, $\rightarrow$ They are on the same object and one of them is a write.

    Interleaved Execution Anomalies

    $\rightarrow$ Unrepeatable Read (Read-Write)
    $\rightarrow$ Dirty Read (Write-Read)
    $\rightarrow$ Lost Update (Write-Write)
    $\rightarrow$ Phantom Reads (Scan-Write), Lecture #17
    $\rightarrow$ Write Skew (Read-Write), Lecture #19
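The definition translates directly into a predicate; the (txn, action, obj) tuple encoding of an operation is an assumption of this sketch:

```python
def conflicts(op1, op2):
    """op = (txn, action, obj), where action is "R" or "W".
    Two operations conflict iff they are by different transactions,
    touch the same object, and at least one of them is a write."""
    t1, a1, o1 = op1
    t2, a2, o2 = op2
    return t1 != t2 and o1 == o2 and "W" in (a1, a2)
```

For example, `conflicts(("T1", "R", "A"), ("T2", "W", "A"))` is true (a read-write conflict), while two reads of the same object never conflict.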

    READ-WRITE CONFLICTS

    Unrepeatable Read: Txn gets different values when reading the same object multiple times.


    WRITE-READ CONFLICTS

    Dirty Read: One txn reads data written by another txn that has not committed yet.


    WRITE-WRITE CONFLICTS

    Lost Update: One txn overwrites uncommitted data from another uncommitted txn.

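The lost-update interleaving can be replayed deterministically with plain variables standing in for each txn's private reads (no threads needed; rounding is only to keep the dollar amounts whole):

```python
db = {"A": 1000}

a_in_t1 = db["A"]                 # T1: R(A) -> 1000
a_in_t2 = db["A"]                 # T2: R(A) -> 1000 (T1 has not written yet)
db["A"] = a_in_t1 - 100           # T1: W(A) -> 900
db["A"] = round(a_in_t2 * 1.06)   # T2: W(A) -> 1060, clobbering T1's write
```

The final value is 1060: T2's write overwrites T1's uncommitted update, so the $100 withdrawal is silently lost (a serial order would have produced 954 or 960).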

    FORMAL PROPERTIES OF SCHEDULES

    Given these conflicts, we can now understand what it means for a schedule to be serializable.
    $\rightarrow$ This is a way to check whether schedules are correct.
    $\rightarrow$ This is not a way to generate a correct schedule.
    There are different levels of serializability:
    $\rightarrow$ Conflict Serializability (most DBMSs try to support this)
    $\rightarrow$ View Serializability (no DBMS can do this)


    CONFLICT SERIALIZABLE SCHEDULES

    Two schedules are conflict equivalent iff:
    $\rightarrow$ They involve the same actions of the same transactions. $\rightarrow$ Every pair of conflicting actions is ordered the same way.
    Schedule S is conflict serializable if:
    $\rightarrow$ S is conflict equivalent to some serial schedule. $\rightarrow$ Intuition: You can transform S into a serial schedule by swapping consecutive non-conflicting operations of different transactions.

    DEPENDENCY GRAPHS

    One node per txn.
    Edge from $T_i$ to $T_j$ if:
    $\rightarrow$ An operation $O_i$ of $T_i$ conflicts with an operation $O_j$ of $T_j$, and
    $\rightarrow$ $O_i$ appears earlier in the schedule than $O_j$.
    Also known as a precedence graph.
    A schedule is conflict serializable iff its dependency graph is acyclic.
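The dependency-graph test can be sketched directly, reusing the (txn, action, obj) operation encoding as an assumption of this illustration:

```python
def dependency_edges(schedule):
    """schedule: list of (txn, action, obj) ops in execution order.
    Add edge Ti -> Tj when an earlier op of Ti conflicts with a later
    op of Tj (same object, different txns, at least one write)."""
    edges = set()
    for i, (t1, a1, o1) in enumerate(schedule):
        for t2, a2, o2 in schedule[i + 1:]:
            if t1 != t2 and o1 == o2 and "W" in (a1, a2):
                edges.add((t1, t2))
    return edges

def is_conflict_serializable(schedule):
    """A schedule is conflict serializable iff its dependency graph
    is acyclic; here we check for a cycle with a simple DFS."""
    graph = {}
    for u, v in dependency_edges(schedule):
        graph.setdefault(u, set()).add(v)
    visiting, done = set(), set()

    def has_cycle(node):
        visiting.add(node)
        for nxt in graph.get(node, ()):
            if nxt in visiting or (nxt not in done and has_cycle(nxt)):
                return True
        visiting.remove(node)
        done.add(node)
        return False

    return not any(has_cycle(n) for n in list(graph) if n not in done)
```

Run against the bad interleaving from earlier (T1 and T2 each read/write A then B, interleaved), this reports a cycle T1 -> T2 -> T1 and hence non-serializability, while the serial version passes.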

    EXAMPLE #1

    Schedule

    Dependency Graph

    EXAMPLE #2 - THREE TRANSACTIONS

    Schedule

    Dependency Graph

    Is this equivalent to a serial execution?
    Yes: $(T_2, T_1, T_3)$
    $\rightarrow$ Notice that $T_3$ should go after $T_2$, although it starts before it!

    EXAMPLE #3 - INCONSISTENT ANALYSIS

    Schedule

    Dependency Graph

    Is it possible to modify only the application logic so that the schedule produces a "correct" result but is still not conflict serializable?

    VIEW SERIALIZABILITY

    Alternative (broader) notion of serializability.
    Schedules $S_1$ and $S_2$ are view equivalent if:
    $\rightarrow$ If $T_1$ reads the initial value of A in $S_1$, then $T_1$ also reads the initial value of A in $S_2$.
    $\rightarrow$ If $T_1$ reads the value of A written by $T_2$ in $S_1$, then $T_1$ also reads the value of A written by $T_2$ in $S_2$.
    $\rightarrow$ If $T_1$ writes the final value of A in $S_1$, then $T_1$ also writes the final value of A in $S_2$.

    VIEW SERIALIZABILITY

    Schedule

    Dependency Graph

    SERIALIZABILITY

    View Serializability allows for (slightly) more schedules than Conflict Serializability does. $\longrightarrow$ But it is difficult to enforce efficiently.
    Neither definition allows all schedules that you would consider "serializable." $\longrightarrow$ This is because they don't understand the meanings of the operations or the data (recall example #3). $\longrightarrow$ In practice, Conflict Serializability is what systems support because it can be enforced efficiently.

    UNIVERSE OF SCHEDULES

    All Schedules
    View Serializable
    Conflict Serializable
    Serial

    TRANSACTION DURABILITY

    All the changes of committed transactions should be persistent. $\longrightarrow$ No torn updates. $\longrightarrow$ No changes from failed transactions.
    The DBMS can use either logging or shadow paging to ensure that all changes are durable.

    CORRECTNESS CRITERIA: ACID

    Atomicity
    All actions in txn happen, or none happen. "All or nothing..."
    Consistency
    If each txn is consistent and the DB starts consistent, then it ends up consistent. "It looks correct to me..."
    Isolation
    Execution of one txn is isolated from that of other txns. "All by myself..."
    Durability
    If a txn commits, its effects persist. "I will survive..."


    CORRECTNESS CRITERIA: ACID

    Redo/Undo Mechanism

    Atomicity

    All actions in txn happen, or none happen. "All or nothing..."
    Integrity Constraints

    Consistency

    If each txn is consistent and the DB starts consistent, then it ends up consistent. "It looks correct to me..."
    Concurrency Control

    Isolation

    Execution of one txn is isolated from that of other txns. "All by myself..."
    Redo/Undo Mechanism

    Durability

    If a txn commits, its effects persist. "I will survive..."


    CONCLUSION

    Concurrency control and recovery are among the most important functions provided by a DBMS.
    Concurrency control is automatic
    $\longrightarrow$ System automatically inserts lock/unlock requests and schedules actions of different txns. $\longrightarrow$ Ensures that resulting execution is equivalent to executing the txns one after the other in some order.
    Just like with "NoSQL," there was a "who needs transactions?" phase. That has passed. $\longrightarrow$ SQL and transactions are good and necessary!


    Spanner: Google's Globally-Distributed Database

    James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, et al.
    Google, Inc.

    Abstract

    Spanner is Google's scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: nonblocking reads in the past, lock-free read-only transactions, and atomic schema changes, across all of Spanner.
    Published in the Proceedings of OSDI 2012


    NEXT CLASS

    Two-Phase Locking
    Isolation Levels