    lecture-13-slides.pdf
    Carnegie Mellon University
    Database Systems Query Execution I

    ADMINISTRIVIA

    Project #3 is due Sunday March 30th @ 11
    Mid-term exam grades now posted. $\longrightarrow$ Exam viewing for the next 3 OH, including today. $\longrightarrow$ The last OH for exam viewing is on March 24.

    UPCOMING DATABASE TALKS

    Google (DB Seminar) $\rightarrow$ Monday Oct 21st @ 4
    $\rightarrow$ https://cmu.zoom.us/j/93441451665


    See the full DB Seminar schedule at https://db.cs.cmu.edu/seminar2025/

    QUERY EXECUTION

    A query plan is a DAG of operators.
    A pipeline is a sequence of operators where tuples continuously flow between them without intermediate storage.
    A pipeline breaker is an operator that cannot finish until all its children emit all their tuples. $\rightarrow$ Joins (Build Side), Subqueries, Order By
    SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100

    ACCESS TIMES

    Latency Numbers Every Programmer Should Know

    1 ns                L1 Cache Ref
    4 ns                L2 Cache Ref
    100 ns              DRAM
    16,000 ns           SSD
    2,000,000 ns        HDD
    ~50,000,000 ns      Network Storage
    1,000,000,000 ns    Tape Archives

    TODAY'S AGENDA

    Processing Models
    Access Methods
    Modification Queries
    Expression Evaluation

    PROCESSING MODEL

    A DBMS's processing model defines how the system executes a query plan and moves data from one operator to the next.
    $\rightarrow$ Different trade- offs for workloads (OLTP vs. OLAP).
    Each processing model is comprised of two types of execution paths: $\rightarrow$ Control Flow: How the DBMS invokes an operator. $\rightarrow$ Data Flow: How an operator sends its results.
    The output of an operator can be either whole tuples (NSM) or subsets of columns (DSM).

    PROCESSING MODEL

    Approach #1: Iterator Model Most Common
    Approach #2: Materialization Model Rare
    Approach #3: Vectorized / Batch Model Common

    ITERATOR MODEL

    Each query plan operator implements a Next() function.
    $\rightarrow$ On each invocation, the operator returns either a single tuple or an EOF marker if there are no more tuples. $\rightarrow$ The operator implements a loop that calls Next() on its children to retrieve their tuples and then processes them.
    Each operator implementation also has Open() and Close() functions.
    $\rightarrow$ Analogous to constructors/destructors, but for operators.
    Also called Volcano or Pipeline Model.
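    The Open()/Next()/Close() protocol can be sketched as a tiny Volcano-style engine. The operator classes and the in-memory table below are illustrative, not from any real DBMS:

    ```python
    # Minimal sketch of the iterator (Volcano) model. Each operator exposes
    # Open()/Next()/Close(); Next() returns one tuple or None as an EOF marker.

    class Scan:
        def __init__(self, table):
            self.table = table          # list of tuples (as dicts)
        def Open(self):
            self.pos = 0                # cursor; Open() acts like a constructor
        def Next(self):
            if self.pos >= len(self.table):
                return None             # EOF marker
            t = self.table[self.pos]
            self.pos += 1
            return t
        def Close(self):
            pass

    class Filter:
        def __init__(self, child, pred):
            self.child, self.pred = child, pred
        def Open(self):
            self.child.Open()
        def Next(self):
            # Loop over the child's tuples until one satisfies the predicate.
            while (t := self.child.Next()) is not None:
                if self.pred(t):
                    return t
            return None
        def Close(self):
            self.child.Close()

    def run(root):
        # The DBMS pulls tuples from the root until EOF.
        root.Open()
        out = []
        while (t := root.Next()) is not None:
            out.append(t)
        root.Close()
        return out

    plan = Filter(Scan([{"id": 1, "value": 50}, {"id": 2, "value": 150}]),
                  lambda t: t["value"] > 100)
    result = run(plan)   # each tuple flows through the whole pipeline at a time
    ```

    Note how each Next() call on the root drives exactly one tuple through every operator before the next tuple is fetched — this is the pipelining the slides describe.
    
    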

    ITERATOR MODEL

    Control Flow → Data Flow →
    SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100

    ITERATOR MODEL

    Control Flow Data Flow

    1. for t in child.Next():
           emit(projection(t))
    2. for t_1 in left.Next():
           buildHashTable(t_1)
       for t_2 in right.Next():
           if probe(t_2): emit(t_1⋈t_2)
    3. for t in child.Next():
           if evalPred(t): emit(t)
    4. for t in R: emit(t)
    5. for t in S: emit(t)

    SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100

    ITERATOR MODEL

    The Iterator model is used in almost every DBMS. $\rightarrow$ Easy to implement / debug. $\rightarrow$ Output control works easily with this approach.
    Allows for pipelining where the DBMS tries to process each tuple through as many operators as possible before retrieving the next tuple.

    MATERIALIZATION MODEL

    Each operator processes its input all at once and then emits its output all at once.
    $\rightarrow$ The operator "materializes" its output as a single result. $\rightarrow$ The DBMS can push down hints (e.g., LIMIT) to avoid scanning too many tuples.
    The output can be either whole tuples (NSM) or subsets of columns (DSM). $\rightarrow$ Originally developed by MonetDB in the 1990s to process entire columns at a time instead of single tuples.

    MATERIALIZATION MODEL

    Control Flow Data Flow

    SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100

    Projection:
        out = []
        for t in child.Output():
            out.add(projection(t))
        return out

    Hash Join:
        out = []
        for t_1 in left.Output():
            buildHashTable(t_1)
        for t_2 in right.Output():
            if probe(t_2):
                out.add(t_1⋈t_2)
        return out

    Filter:
        out = []
        for t in child.Output():
            if evalPred(t):
                out.add(t)
        return out

    Scan R (and likewise Scan S):
        out = []
        for t in R:
            out.add(t)
        return out
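    The same all-at-once behavior can be sketched in Python; the operator functions and sample tables here are illustrative:

    ```python
    # Materialization model sketch: each operator consumes its entire input
    # and returns its entire output as one materialized result.

    def scan(table):
        return list(table)

    def filter_op(child_out, pred):
        return [t for t in child_out if pred(t)]

    def hash_join(left_out, right_out, key):
        # Build a hash table over the (already materialized) left input...
        ht = {}
        for t1 in left_out:
            ht.setdefault(t1[key], []).append(t1)
        # ...then probe it with every tuple of the right input.
        out = []
        for t2 in right_out:
            for t1 in ht.get(t2[key], []):
                out.append({**t1, **t2})
        return out

    # SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100
    R = [{"id": 1}, {"id": 2}]
    S = [{"id": 1, "value": 50, "cdate": "2025-01-01"},
         {"id": 2, "value": 200, "cdate": "2025-02-01"}]
    joined = hash_join(scan(R), filter_op(scan(S), lambda t: t["value"] > 100), "id")
    result = [{"id": t["id"], "cdate": t["cdate"]} for t in joined]
    ```

    Every intermediate list here is a fully allocated buffer, which is exactly the cost the slides warn about for OLAP queries.
    
    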

    MATERIALIZATION MODEL

    Better for OLTP workloads because queries only access a small number of tuples at a time. $\rightarrow$ Lower execution / coordination overhead. $\rightarrow$ Fewer function calls.
    Not ideal for OLAP queries with large intermediate results because the DBMS must allocate buffers to hold them.

    VECTORIZATION MODEL

    Like the Iterator Model where each operator implements a Next() function, but...
    Each operator emits a batch of tuples instead of a single tuple.
    $\rightarrow$ The operator's internal loop processes multiple tuples at a time. $\rightarrow$ The size of the batch can vary based on hardware or query properties. $\rightarrow$ Each batch will contain one or more columns, each with its own null bitmap.

    VECTORIZATION MODEL

    Control Flow Data Flow

    out = []
    for t in child.Next():
        out.add(projection(t))
        if |out| > n:
            emit(out)

    SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100
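    A minimal sketch of batch-at-a-time execution, assuming a toy batch size and illustrative operator classes:

    ```python
    # Vectorization model sketch: operators still expose Next(), but each
    # call returns a batch of tuples instead of a single tuple.

    BATCH_SIZE = 2   # illustrative; real systems size batches for the hardware

    class VecScan:
        def __init__(self, table):
            self.table, self.pos = table, 0
        def Next(self):
            if self.pos >= len(self.table):
                return None                           # EOF marker
            batch = self.table[self.pos:self.pos + BATCH_SIZE]
            self.pos += BATCH_SIZE
            return batch

    class VecFilter:
        def __init__(self, child, pred):
            self.child, self.pred = child, pred
        def Next(self):
            # A tight loop over a whole batch amortizes per-call overhead.
            while (batch := self.child.Next()) is not None:
                out = [t for t in batch if self.pred(t)]
                if out:
                    return out
            return None

    plan = VecFilter(VecScan([{"value": v} for v in (50, 150, 250, 75)]),
                     lambda t: t["value"] > 100)
    batches = []
    while (b := plan.Next()) is not None:
        batches.append(b)
    ```

    Compared with the tuple-at-a-time iterator, the number of Next() invocations drops by roughly the batch size, which is where the OLAP speedup comes from.
    
    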

    VECTORIZATION MODEL

    Ideal for OLAP queries because it greatly reduces the number of invocations per operator.
    Allows an out- of- order CPU to efficiently execute operators over batches of tuples. $\rightarrow$ Operators perform work in tight for- loops over arrays, which compilers know how to optimize / vectorize. $\rightarrow$ No data or control dependencies. $\rightarrow$ Hot instruction cache.

    OBSERVATION

    In the previous examples, the DBMS starts executing a query by invoking Next() at the root of the query plan and pulling data up from leaf operators.
    This is how most DBMSs implement their execution engine.

    PLAN PROCESSING DIRECTION

    Approach #1: Top-to-Bottom (Pull)

    $\rightarrow$ Start with the root and "pull" data up from its children. $\rightarrow$ Tuples are always passed between operators using function calls (unless it's a pipeline breaker).

    Approach #2: Bottom-to-Top (Push)

    $\rightarrow$ Start with leaf nodes and "push" data to their parents. $\rightarrow$ Can "fuse" operators together within a for- loop to minimize intermediate result staging.

    PUSH-BASED ITERATOR MODEL

    Control Flow → Data Flow →
    SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100

    PUSH-BASED ITERATOR MODEL

    Control Flow → Data Flow →

    Scheduler

    Pipeline #1:
        for t_1 in R:
            buildHashTable(t_1)
    Pipeline #2:
        for t_2 in S:
            if evalPred(t_2):
                if probeHashTable(t_2):
                    emit(projection(t_1, t_2))

    SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100
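    The two pipelines above can be sketched in push style, with the scans driving execution and the operators of each pipeline fused into one loop (the table contents are illustrative):

    ```python
    # Push-based sketch of the plan: the scheduler runs Pipeline #1 to
    # completion (it ends at a pipeline breaker, the join's build side),
    # then runs Pipeline #2.

    def execute(R, S, pred):
        ht = {}
        # Pipeline #1: scan R and push each tuple into the join's hash table.
        for t1 in R:
            ht.setdefault(t1["id"], []).append(t1)
        # Pipeline #2: scan S; filter, probe, and project fused in one loop,
        # so each tuple stays in registers/cache for the whole pipeline.
        out = []
        for t2 in S:
            if pred(t2):
                for t1 in ht.get(t2["id"], []):
                    out.append({"id": t1["id"], "cdate": t2["cdate"]})
        return out

    R = [{"id": 1}, {"id": 2}]
    S = [{"id": 2, "value": 200, "cdate": "2025-02-01"},
         {"id": 1, "value": 50, "cdate": "2025-01-01"}]
    result = execute(R, S, lambda t: t["value"] > 100)
    ```
    
    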

    PLAN PROCESSING DIRECTION

    Approach #1: Top-to-Bottom (Pull) Most Common
    $\rightarrow$ Easy to control output via LIMIT.
    $\rightarrow$ Parent operator blocks until its child returns with a tuple.
    $\rightarrow$ Additional overhead because operators' Next() functions are implemented as virtual functions.
    $\rightarrow$ Branching costs on each Next() invocation.

    Approach #2: Bottom-to-Top (Push) Rare
    $\rightarrow$ Allows for tighter control of caches/registers in pipelines.
    $\rightarrow$ May not have exact control of intermediate result sizes.
    $\rightarrow$ Difficult to implement some operators (Sort-Merge Join).

    ACCESS METHODS

    An access method is the way that the DBMS accesses the data stored in a table. $\longrightarrow$ Not defined in relational algebra.
    Three basic approaches: $\longrightarrow$ Sequential Scan. $\longrightarrow$ Index Scan (many variants). $\longrightarrow$ Multi- Index Scan.
    SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100

    SEQUENTIAL SCAN

    For each page in the table:
    $\rightarrow$ Retrieve it from the buffer pool. $\rightarrow$ Iterate over each tuple and check whether to include it.
    The DBMS maintains an internal cursor that tracks the last page / slot it examined.
    for page in table.pages:
        for t in page.tuples:
            if evalPred(t):
                // Do Something!

    SEQUENTIAL SCAN: OPTIMIZATIONS

    Data Encoding / Compression (Lecture #05)
    Prefetching / Scan Sharing / Buffer Bypass (Lecture #06)
    Task Parallelization / Multi-threading (Lecture #14)
    Clustering / Sorting (Lecture #08)
    Late Materialization (Lecture #12)
    Materialized Views / Result Caching
    Data Skipping
    Data Parallelization / Vectorization (Lecture #14)
    Code Specialization / Compilation

    DATA SKIPPING

    Approach #1: Approximate Queries (Lossy)

    $\rightarrow$ Execute queries on a sampled subset of the entire table to produce approximate results. $\rightarrow$ Examples: BlinkDB, Redshift, ComputeDB, XDB, Oracle, Snowflake, Google BigQuery, DataBricks

    Approach #2: Zone Maps (Lossless)

    $\rightarrow$ Pre- compute columnar aggregations per page that allow the DBMS to check whether queries need to access it. $\rightarrow$ Trade- off between page size vs. filter efficacy. $\rightarrow$ Examples: Oracle, Vertica, SingleStore, Netezza, Snowflake, Google BigQuery

    ZONE MAPS

    Pre- computed aggregates for the attribute values in a page. DBMS checks the zone map first to decide whether it wants to access the page.

    Original Data

    val: 100, 200, 300, 400, 400
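    A minimal sketch of building and consulting zone maps, assuming only MIN/MAX aggregates per page (the second page is illustrative):

    ```python
    # Zone map sketch: pre-compute per-page aggregates, then consult them
    # before deciding whether to read a page at all.

    def build_zone_map(page):
        vals = [t["val"] for t in page]
        return {"min": min(vals), "max": max(vals)}

    def pages_to_scan(pages, zone_maps, lo):
        # For WHERE val > lo: skip any page whose MAX cannot qualify.
        return [p for p, zm in zip(pages, zone_maps) if zm["max"] > lo]

    page1 = [{"val": v} for v in (100, 200, 300, 400, 400)]  # data from the slide
    page2 = [{"val": v} for v in (500, 700, 900)]            # illustrative page
    zms = [build_zone_map(p) for p in (page1, page2)]
    survivors = pages_to_scan([page1, page2], zms, 600)      # WHERE val > 600
    ```
    
    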

    ZONE MAPS

    SELECT * FROM table WHERE val > 600

    The zone map for the page above records MAX = 400, so the DBMS can skip the entire page for this query without reading any of its tuples.


    Small Materialized Aggregates: A Light Weight Index Structure for Data Warehousing

    Guido Moerkotte (moer@pi3.informatik.uni-mannheim.de), Lehrstuhl für praktische Informatik III, Universität Mannheim, Germany

    Abstract

    Small Materialized Aggregates (SMAs for short) are considered a highly flexible and versatile alternative for materialized data cubes. The basic idea is to compute many aggregate values for small to medium-sized buckets of tuples. These aggregates are then used to speed up query processing. We present the general idea and present an application of SMAs to the TPC-D benchmark. We show that exploiting SMAs for TPC-D Query 1 results in a speed up of two orders of magnitude. Then, we investigate the problem of query processing in the presence of SMAs. Last, we briefly discuss some further tuning possibilities for SMAs.

    INDEX SCAN

    The DBMS picks an index to find the tuples that the query needs.

    Lecture #15

    Which index to use depends on: $\rightarrow$ What attributes the index contains $\rightarrow$ What attributes the query references $\rightarrow$ The attribute's value domains $\rightarrow$ Predicate composition $\rightarrow$ Whether the index has unique or non- unique keys

    INDEX SCAN

    Suppose that we have a single table with 100 tuples and two indexes:
    $\longrightarrow$ Index #1: age $\longrightarrow$ Index #2: dept
    SELECT * FROM students WHERE age < 30 AND dept = 'CS' AND country = 'US'

    Scenario #1

    There are 99 people under the age of 30 but only 2 people in the CS department.

    Scenario #2

    There are 99 people in the CS department but only 2 people under the age of 30.

    MULTI-INDEX SCAN

    If there are multiple indexes available for a query, the DBMS does not have to pick only one:
    $\longrightarrow$ Compute sets of Record IDs using each matching index. $\longrightarrow$ Combine these sets based on the query's predicates (union vs. intersect). $\longrightarrow$ Retrieve the records and apply any remaining predicates.
    Examples:
    $\longrightarrow$ DB2 Multi- Index Scan $\longrightarrow$ PostgreSQL Bitmap Scan $\longrightarrow$ MySQL Index Merge

    MULTI-INDEX SCAN

    Given the following query on a database with an index #1 on age and an index #2 on dept:
    $\longrightarrow$ We can retrieve the Record IDs satisfying age<30 using index #1. $\longrightarrow$ Then retrieve the Record IDs satisfying dept='cs' using index #2. $\longrightarrow$ Take their intersection. $\longrightarrow$ Retrieve records and check country='us'.
    SELECT * FROM students WHERE age < 30 AND dept = 'cs' AND country = 'us'
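    The steps above can be sketched with Python sets standing in for Record ID lists; the table and index contents are illustrative:

    ```python
    # Multi-index scan sketch: compute a Record ID set per matching index,
    # intersect them, then fetch the records and apply the leftover predicate.

    table = {
        1: {"age": 25, "dept": "cs", "country": "us"},
        2: {"age": 25, "dept": "ee", "country": "us"},
        3: {"age": 40, "dept": "cs", "country": "us"},
        4: {"age": 22, "dept": "cs", "country": "ca"},
    }
    # Hypothetical secondary indexes, modeled as sets of matching Record IDs.
    idx_age_under_30 = {rid for rid, t in table.items() if t["age"] < 30}
    idx_dept_cs = {rid for rid, t in table.items() if t["dept"] == "cs"}

    # AND-ed predicates -> intersection of the two Record ID sets.
    candidates = idx_age_under_30 & idx_dept_cs
    # Fetch the records and check the remaining, unindexed predicate.
    result = sorted(rid for rid in candidates
                    if table[rid]["country"] == "us")
    ```

    An OR between the indexed predicates would use a set union instead, which is the same distinction PostgreSQL's bitmap scans make with BitmapAnd/BitmapOr nodes.
    
    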

    MULTI-INDEX SCAN

    Set intersection can be done efficiently with bitmaps or hash tables.
    SELECT * FROM students WHERE age < 30 AND dept = 'CS' AND country = 'US'


    MODIFICATION QUERIES

    Operators that modify the database (INSERT, UPDATE, DELETE) are responsible for modifying the target table and its indexes.
    $\rightarrow$ Constraint checks can either happen immediately inside of the operator or be deferred until later in the query/transaction.
    The output of these operators can either be Record IDs or tuple data (i.e., RETURNING).

    MODIFICATION QUERIES

    UPDATE/DELETE:

    $\rightarrow$ Child operators pass Record IDs for target tuples. $\rightarrow$ Must keep track of previously seen tuples.

    INSERT:

    $\rightarrow$ Choice #1: Materialize tuples inside of the operator. $\rightarrow$ Choice #2: Operator inserts any tuple passed in from child operators.

    UPDATE QUERY PROBLEM

    Control Flow → Data Flow →

    CREATE INDEX idx_salary ON people (salary);
    UPDATE people SET salary = salary + 100 WHERE salary < 1100

    Update operator:
        for t in child.Next():
            removeFromIndex(idx_salary, t.salary, t)
            updateTuple(t.salary = t.salary + 100)
            insertIntoIndex(idx_salary, t.salary, t)

    Index scan:
        for t in Index(people.salary):
            if t.salary < 1100:
                emit(t)

    HALLOWEEN PROBLEM

    Anomaly where an update operation changes the physical location of a tuple, which causes a scan operator to visit the tuple multiple times. $\rightarrow$ Can occur on clustered tables or index scans.
    First discovered by IBM researchers while working on System R on Halloween day in 1976.
    Solution: Track modified Record IDs per query.
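    The per-query tracking can be sketched as follows; the repeated index-scan loop is a deliberately naive stand-in for a scan that re-encounters tuples the update has moved forward in the index:

    ```python
    # Halloween problem sketch: an index scan ordered by salary can see a
    # tuple again after the update moves it forward in the index. Tracking
    # modified Record IDs ensures each tuple is updated at most once.

    def run_update(people, raise_amt, limit):
        modified = set()                      # Record IDs already updated
        changed = True
        while changed:                        # re-scan until a fixpoint,
            changed = False                   # mimicking re-encountered tuples
            for rid, t in sorted(people.items(), key=lambda kv: kv[1]["salary"]):
                if t["salary"] < limit and rid not in modified:
                    t["salary"] += raise_amt  # would also maintain idx_salary
                    modified.add(rid)         # skip this tuple if seen again
                    changed = True
        return people

    people = {1: {"salary": 900}, 2: {"salary": 1000}}
    run_update(people, 100, 1100)             # UPDATE ... WHERE salary < 1100
    ```

    Without the `modified` set, tuple 1 would keep qualifying on every re-scan and be raised repeatedly until it reached the limit.
    
    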

    EXPRESSION EVALUATION

    The DBMS represents a WHERE clause as an expression tree.
    The nodes in the tree represent different expression types:
    $\longrightarrow$ Comparisons $(= , < , > , ! =)$ $\longrightarrow$ Conjunction (AND), Disjunction (OR) $\longrightarrow$ Arithmetic Operators (+, - , *, /, %) $\longrightarrow$ Constant Values $\longrightarrow$ Tuple Attribute References $\longrightarrow$ Functions
    SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100;

    EXPRESSION EVALUATION

    PREPARE XXX AS SELECT * FROM S WHERE S.val = $1 + 9;
    EXECUTE XXX(991);
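    A sketch of how that predicate becomes an expression tree and is evaluated per tuple; the node classes and the execution-context dict are illustrative:

    ```python
    # Expression tree sketch for: WHERE S.val = $1 + 9, run as EXECUTE XXX(991).
    # Each node type evaluates itself given a tuple and an execution context.

    class Const:
        def __init__(self, v): self.v = v
        def eval(self, tup, ctx): return self.v

    class Param:
        def __init__(self, i): self.i = i
        def eval(self, tup, ctx): return ctx["params"][self.i]   # $1, $2, ...

    class Attr:
        def __init__(self, name): self.name = name
        def eval(self, tup, ctx): return tup[self.name]

    class Add:
        def __init__(self, l, r): self.l, self.r = l, r
        def eval(self, tup, ctx): return self.l.eval(tup, ctx) + self.r.eval(tup, ctx)

    class Eq:
        def __init__(self, l, r): self.l, self.r = l, r
        def eval(self, tup, ctx): return self.l.eval(tup, ctx) == self.r.eval(tup, ctx)

    # WHERE S.val = $1 + 9
    pred = Eq(Attr("val"), Add(Param(1), Const(9)))
    ctx = {"params": {1: 991}}                # execution context for EXECUTE XXX(991)
    rows = [{"val": 1000}, {"val": 999}]
    result = [t for t in rows if pred.eval(t, ctx)]
    ```

    Note that every tuple pays the full cost of the recursive tree walk — the per-node dispatch this does is exactly what the next slides call "terrible for the CPU".
    
    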

    EXPRESSION EVALUATION

    Evaluating predicates by traversing a tree is terrible for the CPU.
    $\longrightarrow$ The DBMS traverses the tree and for each node that it visits, it must figure out what the operator needs to do.
    A better approach is to evaluate the expression directly.
    An even better approach is to vectorize it and evaluate a batch of tuples at the same time...
    SELECT * FROM s WHERE s.val = 1;

    EXPRESSION EVALUATION: OPTIMIZATIONS

    Constant Folding:
    $\rightarrow$ Identify redundant / unnecessary operations that are wasteful.
    $\rightarrow$ Compute a sub-expression on a constant value once and reuse the result per tuple.
    WHERE UPPER(col1) = UPPER('wutang');

    Common Sub-Expression Elimination:
    $\rightarrow$ Identify repeated sub-expressions that can be shared across the expression tree.
    $\rightarrow$ Compute once and then reuse the result.
    WHERE STRPOS('x', col1) < 2 OR STRPOS('x', col1) > 8
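    Both rewrites can be sketched as operations over tuple-encoded expression trees; the encoding and function names are illustrative:

    ```python
    # Sketch of the two expression rewrites. Expressions are nested tuples:
    # ("const", v) | ("attr", name) | ("upper", e) | ("strpos", needle, haystack)

    def fold(expr):
        # Constant folding: UPPER over a constant becomes a constant, computed once.
        if expr[0] == "upper":
            arg = fold(expr[1])
            if arg[0] == "const":
                return ("const", arg[1].upper())
            return ("upper", arg)
        return expr

    def evaluate(expr, tup, cache):
        # Common sub-expression elimination via per-tuple memoization:
        # an identical sub-tree is computed once and its result reused.
        if expr in cache:
            return cache[expr]
        kind = expr[0]
        if kind == "const":
            v = expr[1]
        elif kind == "attr":
            v = tup[expr[1]]
        elif kind == "upper":
            v = evaluate(expr[1], tup, cache).upper()
        elif kind == "strpos":   # 1-based position of needle, 0 if absent
            v = evaluate(expr[2], tup, cache).find(evaluate(expr[1], tup, cache)) + 1
        cache[expr] = v
        return v

    # WHERE UPPER(col1) = UPPER('wutang')
    rhs = fold(("upper", ("const", "wutang")))        # folded once, reused per tuple
    match = evaluate(("upper", ("attr", "col1")), {"col1": "WuTang"}, {}) == rhs[1]

    # WHERE STRPOS('x', col1) < 2 OR STRPOS('x', col1) > 8
    sub = ("strpos", ("const", "x"), ("attr", "col1"))
    cache = {}
    hit = evaluate(sub, {"col1": "xylophone"}, cache) < 2 or \
          evaluate(sub, {"col1": "xylophone"}, cache) > 8   # second eval hits cache
    ```
    
    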


    CONCLUSION

    The same query plan can be executed in multiple different ways.
    (Most) DBMSs will want to use index scans as much as possible.
    Expression trees are flexible but slow. JIT compilation can (sometimes) speed them up.

    NEXT CLASS

    Parallel Query Execution