    lecture-13-slides.pdf
    Carnegie Mellon University
    Database Systems Query Execution I

    ADMINISTRIVIA

    Project #3 is due Sunday March 30th @ 11
    Mid-term exam grades now posted. $\longrightarrow$ Exam viewing for the next 3 OH, including today. $\longrightarrow$ The last OH for exam viewing is on March 24.

    UPCOMING DATABASE TALKS

    Google (DB Seminar) $\rightarrow$ Monday Oct 21st @ 4
    $\rightarrow$ https://cmu.zoom.us/j/93441451665


    See the full DB Seminar schedule at https://db.cs.cmu.edu/seminar2025/

    QUERY EXECUTION

    A query plan is a DAG of operators.
    A pipeline is a sequence of operators where tuples continuously flow between them without intermediate storage.
    A pipeline breaker is an operator that cannot finish until all its children emit all their tuples. $\rightarrow$ Joins (Build Side), Subqueries, Order By
    SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100

    ACCESS TIMES

    Latency Numbers Every Programmer Should Know

    1 ns                L1 Cache Ref
    4 ns                L2 Cache Ref
    100 ns              DRAM
    16,000 ns           SSD
    2,000,000 ns        HDD
    ~50,000,000 ns      Network Storage
    1,000,000,000 ns    Tape Archives

    TODAY'S AGENDA

    Processing Models
    Access Methods
    Modification Queries
    Expression Evaluation

    PROCESSING MODEL

    A DBMS's processing model defines how the system executes a query plan and moves data from one operator to the next.
    $\rightarrow$ Different trade- offs for workloads (OLTP vs. OLAP).
    Each processing model is comprised of two types of execution paths: $\rightarrow$ Control Flow: How the DBMS invokes an operator. $\rightarrow$ Data Flow: How an operator sends its results.
    The output of an operator can be either whole tuples (NSM) or subsets of columns (DSM).

    PROCESSING MODEL

    Approach #1: Iterator Model Most Common
    Approach #2: Materialization Model Rare
    Approach #3: Vectorized / Batch Model Common

    ITERATOR MODEL

    Each query plan operator implements a Next() function.
    $\rightarrow$ On each invocation, the operator returns either a single tuple or an EOF marker if there are no more tuples. $\rightarrow$ The operator implements a loop that calls Next() on its children to retrieve their tuples and then processes them.
    Each operator implementation also has Open() and Close() functions.
    $\rightarrow$ Analogous to constructors/destructors, but for operators.
    Also called Volcano or Pipeline Model.
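    The Open()/Next()/Close() protocol can be sketched as a tiny Volcano-style engine. The operator classes and the in-memory table below are illustrative, not from any real DBMS:

    ```python
    # Minimal sketch of the iterator (Volcano) model. Each operator exposes
    # Open()/Next()/Close(); Next() returns one tuple or None as an EOF marker.

    class Scan:
        def __init__(self, table):
            self.table = table          # list of tuples (as dicts)
        def Open(self):
            self.pos = 0                # cursor; Open() acts like a constructor
        def Next(self):
            if self.pos >= len(self.table):
                return None             # EOF marker
            t = self.table[self.pos]
            self.pos += 1
            return t
        def Close(self):
            pass

    class Filter:
        def __init__(self, child, pred):
            self.child, self.pred = child, pred
        def Open(self):
            self.child.Open()
        def Next(self):
            # Loop over the child's tuples until one satisfies the predicate.
            while (t := self.child.Next()) is not None:
                if self.pred(t):
                    return t
            return None
        def Close(self):
            self.child.Close()

    def run(root):
        # The DBMS pulls tuples from the root until EOF.
        root.Open()
        out = []
        while (t := root.Next()) is not None:
            out.append(t)
        root.Close()
        return out

    plan = Filter(Scan([{"id": 1, "value": 50}, {"id": 2, "value": 150}]),
                  lambda t: t["value"] > 100)
    result = run(plan)   # each tuple flows through the whole pipeline at a time
    ```

    Note how each Next() call on the root drives exactly one tuple through every operator before the next tuple is fetched — this is the pipelining the slides describe.
    
    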

    ITERATOR MODEL

    Control Flow → Data Flow →
    SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100

    ITERATOR MODEL

    Control Flow Data Flow

    1. for t in child.Next():
           emit(projection(t))
    2. for t_1 in left.Next():
           buildHashTable(t_1)
       for t_2 in right.Next():
           if probe(t_2): emit(t_1⋈t_2)
    3. for t in child.Next():
           if evalPred(t): emit(t)
    4. for t in R: emit(t)
    5. for t in S: emit(t)

    SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100

    ITERATOR MODEL

    The Iterator model is used in almost every DBMS. $\rightarrow$ Easy to implement / debug. $\rightarrow$ Output control works easily with this approach.
    Allows for pipelining where the DBMS tries to process each tuple through as many operators as possible before retrieving the next tuple.

    MATERIALIZATION MODEL

    Each operator processes its input all at once and then emits its output all at once.
    $\rightarrow$ The operator "materializes" its output as a single result. $\rightarrow$ The DBMS can push down hints (e.g., LIMIT) to avoid scanning too many tuples.
    The output can be either whole tuples (NSM) or subsets of columns (DSM). $\rightarrow$ Originally developed by MonetDB in the 1990s to process entire columns at a time instead of single tuples.

    MATERIALIZATION MODEL

    Control Flow Data Flow

    SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100

    Projection:
        out = []
        for t in child.Output():
            out.add(projection(t))
        return out

    Hash Join:
        out = []
        for t_1 in left.Output():
            buildHashTable(t_1)
        for t_2 in right.Output():
            if probe(t_2):
                out.add(t_1⋈t_2)
        return out

    Filter:
        out = []
        for t in child.Output():
            if evalPred(t):
                out.add(t)
        return out

    Scan R (and likewise Scan S):
        out = []
        for t in R:
            out.add(t)
        return out
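    The same all-at-once behavior can be sketched in Python; the operator functions and sample tables here are illustrative:

    ```python
    # Materialization model sketch: each operator consumes its entire input
    # and returns its entire output as one materialized result.

    def scan(table):
        return list(table)

    def filter_op(child_out, pred):
        return [t for t in child_out if pred(t)]

    def hash_join(left_out, right_out, key):
        # Build a hash table over the (already materialized) left input...
        ht = {}
        for t1 in left_out:
            ht.setdefault(t1[key], []).append(t1)
        # ...then probe it with every tuple of the right input.
        out = []
        for t2 in right_out:
            for t1 in ht.get(t2[key], []):
                out.append({**t1, **t2})
        return out

    # SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100
    R = [{"id": 1}, {"id": 2}]
    S = [{"id": 1, "value": 50, "cdate": "2025-01-01"},
         {"id": 2, "value": 200, "cdate": "2025-02-01"}]
    joined = hash_join(scan(R), filter_op(scan(S), lambda t: t["value"] > 100), "id")
    result = [{"id": t["id"], "cdate": t["cdate"]} for t in joined]
    ```

    Every intermediate list here is a fully allocated buffer, which is exactly the cost the slides warn about for OLAP queries.
    
    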

    MATERIALIZATION MODEL

    Better for OLTP workloads because queries only access a small number of tuples at a time. $\rightarrow$ Lower execution / coordination overhead. $\rightarrow$ Fewer function calls.
    Not ideal for OLAP queries with large intermediate results because the DBMS must allocate buffers to hold them.

    VECTORIZATION MODEL

    Like the Iterator Model where each operator implements a Next() function, but...
    Each operator emits a batch of tuples instead of a single tuple.
    $\rightarrow$ The operator's internal loop processes multiple tuples at a time. $\rightarrow$ The size of the batch can vary based on hardware or query properties. $\rightarrow$ Each batch will contain one or more columns, each with its own null bitmap.

    VECTORIZATION MODEL

    Control Flow Data Flow

    out = []
    for t in child.Next():
        out.add(projection(t))
        if |out| > n:
            emit(out)

    SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100
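    A minimal sketch of batch-at-a-time execution, assuming a toy batch size and illustrative operator classes:

    ```python
    # Vectorization model sketch: operators still expose Next(), but each
    # call returns a batch of tuples instead of a single tuple.

    BATCH_SIZE = 2   # illustrative; real systems size batches for the hardware

    class VecScan:
        def __init__(self, table):
            self.table, self.pos = table, 0
        def Next(self):
            if self.pos >= len(self.table):
                return None                           # EOF marker
            batch = self.table[self.pos:self.pos + BATCH_SIZE]
            self.pos += BATCH_SIZE
            return batch

    class VecFilter:
        def __init__(self, child, pred):
            self.child, self.pred = child, pred
        def Next(self):
            # A tight loop over a whole batch amortizes per-call overhead.
            while (batch := self.child.Next()) is not None:
                out = [t for t in batch if self.pred(t)]
                if out:
                    return out
            return None

    plan = VecFilter(VecScan([{"value": v} for v in (50, 150, 250, 75)]),
                     lambda t: t["value"] > 100)
    batches = []
    while (b := plan.Next()) is not None:
        batches.append(b)
    ```

    Compared with the tuple-at-a-time iterator, the number of Next() invocations drops by roughly the batch size, which is where the OLAP speedup comes from.
    
    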

    VECTORIZATION MODEL

    Ideal for OLAP queries because it greatly reduces the number of invocations per operator.
    Allows an out- of- order CPU to efficiently execute operators over batches of tuples. $\rightarrow$ Operators perform work in tight for- loops over arrays, which compilers know how to optimize / vectorize. $\rightarrow$ No data or control dependencies. $\rightarrow$ Hot instruction cache.

    OBSERVATION

    In the previous examples, the DBMS starts executing a query by invoking Next() at the root of the query plan and pulling data up from leaf operators.
    This is how most DBMSs implement their execution engine.

    PLAN PROCESSING DIRECTION

    Approach #1: Top-to-Bottom (Pull)

    $\rightarrow$ Start with the root and "pull" data up from its children. $\rightarrow$ Tuples are always passed between operators using function calls (unless it's a pipeline breaker).

    Approach #2: Bottom-to-Top (Push)

    $\rightarrow$ Start with leaf nodes and "push" data to their parents. $\rightarrow$ Can "fuse" operators together within a for- loop to minimize intermediate result staging.

    PUSH-BASED ITERATOR MODEL

    Control Flow → Data Flow →
    SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100

    PUSH-BASED ITERATOR MODEL

    Control Flow → Data Flow →

    Scheduler

    Pipeline #1:
        for t_1 in R:
            buildHashTable(t_1)
    Pipeline #2:
        for t_2 in S:
            if evalPred(t_2):
                if probeHashTable(t_2):
                    emit(projection(t_1, t_2))

    SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100
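    The two pipelines above can be sketched in push style, with the scans driving execution and the operators of each pipeline fused into one loop (the table contents are illustrative):

    ```python
    # Push-based sketch of the plan: the scheduler runs Pipeline #1 to
    # completion (it ends at a pipeline breaker, the join's build side),
    # then runs Pipeline #2.

    def execute(R, S, pred):
        ht = {}
        # Pipeline #1: scan R and push each tuple into the join's hash table.
        for t1 in R:
            ht.setdefault(t1["id"], []).append(t1)
        # Pipeline #2: scan S; filter, probe, and project fused in one loop,
        # so each tuple stays in registers/cache for the whole pipeline.
        out = []
        for t2 in S:
            if pred(t2):
                for t1 in ht.get(t2["id"], []):
                    out.append({"id": t1["id"], "cdate": t2["cdate"]})
        return out

    R = [{"id": 1}, {"id": 2}]
    S = [{"id": 2, "value": 200, "cdate": "2025-02-01"},
         {"id": 1, "value": 50, "cdate": "2025-01-01"}]
    result = execute(R, S, lambda t: t["value"] > 100)
    ```
    
    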

    PLAN PROCESSING DIRECTION

    Approach #1: Top-to-Bottom (Pull) Most Common
    $\rightarrow$ Easy to control output via LIMIT.
    $\rightarrow$ Parent operator blocks until its child returns with a tuple.
    $\rightarrow$ Additional overhead because operators' Next() functions are implemented as virtual functions.
    $\rightarrow$ Branching costs on each Next() invocation.

    Approach #2: Bottom-to-Top (Push) Rare
    $\rightarrow$ Allows for tighter control of caches/registers in pipelines.
    $\rightarrow$ May not have exact control of intermediate result sizes.
    $\rightarrow$ Difficult to implement some operators (Sort-Merge Join).

    ACCESS METHODS

    An access method is the way that the DBMS accesses the data stored in a table. $\longrightarrow$ Not defined in relational algebra.
    Three basic approaches: $\longrightarrow$ Sequential Scan. $\longrightarrow$ Index Scan (many variants). $\longrightarrow$ Multi- Index Scan.
    SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100

    SEQUENTIAL SCAN

    For each page in the table:
    $\rightarrow$ Retrieve it from the buffer pool. $\rightarrow$ Iterate over each tuple and check whether to include it.
    The DBMS maintains an internal cursor that tracks the last page / slot it examined.
    for page in table.pages:
        for t in page.tuples:
            if evalPred(t):
                // Do Something!

    SEQUENTIAL SCAN: OPTIMIZATIONS

    Data Encoding / Compression (Lecture #05)
    Prefetching / Scan Sharing / Buffer Bypass (Lecture #06)
    Task Parallelization / Multi-threading (Lecture #14)
    Clustering / Sorting (Lecture #08)
    Late Materialization (Lecture #12)
    Materialized Views / Result Caching
    Data Skipping
    Data Parallelization / Vectorization (Lecture #14)
    Code Specialization / Compilation

    DATA SKIPPING

    Approach #1: Approximate Queries (Lossy)

    $\rightarrow$ Execute queries on a sampled subset of the entire table to produce approximate results. $\rightarrow$ Examples: BlinkDB, Redshift, ComputeDB, XDB, Oracle, Snowflake, Google BigQuery, DataBricks

    Approach #2: Zone Maps (Lossless)

    $\rightarrow$ Pre- compute columnar aggregations per page that allow the DBMS to check whether queries need to access it. $\rightarrow$ Trade- off between page size vs. filter efficacy. $\rightarrow$ Examples: Oracle, Vertica, SingleStore, Netezza, Snowflake, Google BigQuery

    ZONE MAPS

    Pre- computed aggregates for the attribute values in a page. DBMS checks the zone map first to decide whether it wants to access the page.

    Original Data

    val: 100, 200, 300, 400, 400
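    A minimal sketch of building and consulting zone maps, assuming only MIN/MAX aggregates per page (the second page is illustrative):

    ```python
    # Zone map sketch: pre-compute per-page aggregates, then consult them
    # before deciding whether to read a page at all.

    def build_zone_map(page):
        vals = [t["val"] for t in page]
        return {"min": min(vals), "max": max(vals)}

    def pages_to_scan(pages, zone_maps, lo):
        # For WHERE val > lo: skip any page whose MAX cannot qualify.
        return [p for p, zm in zip(pages, zone_maps) if zm["max"] > lo]

    page1 = [{"val": v} for v in (100, 200, 300, 400, 400)]  # data from the slide
    page2 = [{"val": v} for v in (500, 700, 900)]            # illustrative page
    zms = [build_zone_map(p) for p in (page1, page2)]
    survivors = pages_to_scan([page1, page2], zms, 600)      # WHERE val > 600
    ```
    
    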

    ZONE MAPS

    SELECT * FROM table WHERE val > 600

    The zone map for the page above records MAX = 400, so the DBMS can skip the entire page for this query without reading any of its tuples.


    Small Materialized Aggregates: A Light Weight Index Structure for Data Warehousing

    Guido Moerkotte (moer@pi3.informatik.uni-mannheim.de), Lehrstuhl für praktische Informatik III, Universität Mannheim, Germany

    Abstract

    Small Materialized Aggregates (SMAs for short) are considered a highly flexible and versatile alternative for materialized data cubes. The basic idea is to compute many aggregate values for small to medium-sized buckets of tuples. These aggregates are then used to speed up query processing. We present the general idea and present an application of SMAs to the TPC-D benchmark. We show that exploiting SMAs for TPC-D Query 1 results in a speed up of two orders of magnitude. Then, we investigate the problem of query processing in the presence of SMAs. Last, we briefly discuss some further tuning possibilities for SMAs.

    INDEX SCAN

    The DBMS picks an index to find the tuples that the query needs.

    Lecture #15

    Which index to use depends on: $\rightarrow$ What attributes the index contains $\rightarrow$ What attributes the query references $\rightarrow$ The attribute's value domains $\rightarrow$ Predicate composition $\rightarrow$ Whether the index has unique or non- unique keys

    INDEX SCAN

    Suppose that we have a single table with 100 tuples and two indexes:
    $\longrightarrow$ Index #1: age $\longrightarrow$ Index #2: dept
    SELECT * FROM students WHERE age < 30 AND dept = 'CS' AND country = 'US'

    Scenario #1

    There are 99 people under the age of 30 but only 2 people in the CS department.

    Scenario #2

    There are 99 people in the CS department but only 2 people under the age of 30.

    MULTI-INDEX SCAN

    If there are multiple indexes available for a query, the DBMS does not have to pick only one:
    $\longrightarrow$ Compute sets of Record IDs using each matching index. $\longrightarrow$ Combine these sets based on the query's predicates (union vs. intersect). $\longrightarrow$ Retrieve the records and apply any remaining predicates.
    Examples:
    $\longrightarrow$ DB2 Multi- Index Scan $\longrightarrow$ PostgreSQL Bitmap Scan $\longrightarrow$ MySQL Index Merge

    MULTI-INDEX SCAN

    Given the following query on a database with an index #1 on age and an index #2 on dept:
    $\longrightarrow$ We can retrieve the Record IDs satisfying age<30 using index #1. $\longrightarrow$ Then retrieve the Record IDs satisfying dept='cs' using index #2. $\longrightarrow$ Take their intersection. $\longrightarrow$ Retrieve records and check country='us'.
    SELECT * FROM students WHERE age < 30 AND dept = 'cs' AND country = 'us'
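    The steps above can be sketched with Python sets standing in for Record ID lists; the table and index contents are illustrative:

    ```python
    # Multi-index scan sketch: compute a Record ID set per matching index,
    # intersect them, then fetch the records and apply the leftover predicate.

    table = {
        1: {"age": 25, "dept": "cs", "country": "us"},
        2: {"age": 25, "dept": "ee", "country": "us"},
        3: {"age": 40, "dept": "cs", "country": "us"},
        4: {"age": 22, "dept": "cs", "country": "ca"},
    }
    # Hypothetical secondary indexes, modeled as sets of matching Record IDs.
    idx_age_under_30 = {rid for rid, t in table.items() if t["age"] < 30}
    idx_dept_cs = {rid for rid, t in table.items() if t["dept"] == "cs"}

    # AND-ed predicates -> intersection of the two Record ID sets.
    candidates = idx_age_under_30 & idx_dept_cs
    # Fetch the records and check the remaining, unindexed predicate.
    result = sorted(rid for rid in candidates
                    if table[rid]["country"] == "us")
    ```

    An OR between the indexed predicates would use a set union instead, which is the same distinction PostgreSQL's bitmap scans make with BitmapAnd/BitmapOr nodes.
    
    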

    MULTI-INDEX SCAN

    Set intersection can be done efficiently with bitmaps or hash tables.
    SELECT * FROM students WHERE age < 30 AND dept = 'CS' AND country = 'US'


    MODIFICATION QUERIES

    Operators that modify the database (INSERT, UPDATE, DELETE) are responsible for modifying the target table and its indexes.
    $\rightarrow$ Constraint checks can either happen immediately inside of the operator or be deferred until later in the query/transaction.
    The output of these operators can either be Record IDs or tuple data (i.e., RETURNING).

    MODIFICATION QUERIES

    UPDATE/DELETE:

    $\rightarrow$ Child operators pass Record IDs for target tuples. $\rightarrow$ Must keep track of previously seen tuples.

    INSERT:

    $\rightarrow$ Choice #1: Materialize tuples inside of the operator. $\rightarrow$ Choice #2: Operator inserts any tuple passed in from child operators.

    UPDATE QUERY PROBLEM

    Control Flow → Data Flow →

    CREATE INDEX idx_salary ON people (salary);
    UPDATE people SET salary = salary + 100 WHERE salary < 1100

    Update operator:
        for t in child.Next():
            removeFromIndex(idx_salary, t.salary, t)
            updateTuple(t.salary = t.salary + 100)
            insertIntoIndex(idx_salary, t.salary, t)

    Index scan:
        for t in Index(people.salary):
            if t.salary < 1100:
                emit(t)

    HALLOWEEN PROBLEM

    Anomaly where an update operation changes the physical location of a tuple, which causes a scan operator to visit the tuple multiple times. $\rightarrow$ Can occur on clustered tables or index scans.
    First discovered by IBM researchers while working on System R on Halloween day in 1976.
    Solution: Track modified Record IDs per query.
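    The per-query tracking can be sketched as follows; the repeated index-scan loop is a deliberately naive stand-in for a scan that re-encounters tuples the update has moved forward in the index:

    ```python
    # Halloween problem sketch: an index scan ordered by salary can see a
    # tuple again after the update moves it forward in the index. Tracking
    # modified Record IDs ensures each tuple is updated at most once.

    def run_update(people, raise_amt, limit):
        modified = set()                      # Record IDs already updated
        changed = True
        while changed:                        # re-scan until a fixpoint,
            changed = False                   # mimicking re-encountered tuples
            for rid, t in sorted(people.items(), key=lambda kv: kv[1]["salary"]):
                if t["salary"] < limit and rid not in modified:
                    t["salary"] += raise_amt  # would also maintain idx_salary
                    modified.add(rid)         # skip this tuple if seen again
                    changed = True
        return people

    people = {1: {"salary": 900}, 2: {"salary": 1000}}
    run_update(people, 100, 1100)             # UPDATE ... WHERE salary < 1100
    ```

    Without the `modified` set, tuple 1 would keep qualifying on every re-scan and be raised repeatedly until it reached the limit.
    
    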

    EXPRESSION EVALUATION

    The DBMS represents a WHERE clause as an expression tree.
    The nodes in the tree represent different expression types:
    $\longrightarrow$ Comparisons $(= , < , > , ! =)$ $\longrightarrow$ Conjunction (AND), Disjunction (OR) $\longrightarrow$ Arithmetic Operators (+, - , *, /, %) $\longrightarrow$ Constant Values $\longrightarrow$ Tuple Attribute References $\longrightarrow$ Functions
    SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100;

    EXPRESSION EVALUATION

    PREPARE XXX AS SELECT * FROM S WHERE S.val = $1 + 9;
    EXECUTE XXX(991);
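    A sketch of how that predicate becomes an expression tree and is evaluated per tuple; the node classes and the execution-context dict are illustrative:

    ```python
    # Expression tree sketch for: WHERE S.val = $1 + 9, run as EXECUTE XXX(991).
    # Each node type evaluates itself given a tuple and an execution context.

    class Const:
        def __init__(self, v): self.v = v
        def eval(self, tup, ctx): return self.v

    class Param:
        def __init__(self, i): self.i = i
        def eval(self, tup, ctx): return ctx["params"][self.i]   # $1, $2, ...

    class Attr:
        def __init__(self, name): self.name = name
        def eval(self, tup, ctx): return tup[self.name]

    class Add:
        def __init__(self, l, r): self.l, self.r = l, r
        def eval(self, tup, ctx): return self.l.eval(tup, ctx) + self.r.eval(tup, ctx)

    class Eq:
        def __init__(self, l, r): self.l, self.r = l, r
        def eval(self, tup, ctx): return self.l.eval(tup, ctx) == self.r.eval(tup, ctx)

    # WHERE S.val = $1 + 9
    pred = Eq(Attr("val"), Add(Param(1), Const(9)))
    ctx = {"params": {1: 991}}                # execution context for EXECUTE XXX(991)
    rows = [{"val": 1000}, {"val": 999}]
    result = [t for t in rows if pred.eval(t, ctx)]
    ```

    Note that every tuple pays the full cost of the recursive tree walk — the per-node dispatch this does is exactly what the next slides call "terrible for the CPU".
    
    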

    EXPRESSION EVALUATION

    Evaluating predicates by traversing a tree is terrible for the CPU.
    $\longrightarrow$ The DBMS traverses the tree and for each node that it visits, it must figure out what the operator needs to do.
    A better approach is to evaluate the expression directly.
    An even better approach is to vectorize it and evaluate a batch of tuples at the same time...
    SELECT * FROM s WHERE s.val = 1;

    EXPRESSION EVALUATION: OPTIMIZATIONS

    Constant Folding:
    $\rightarrow$ Identify redundant / unnecessary operations that are wasteful.
    $\rightarrow$ Compute a sub-expression on a constant value once and reuse the result per tuple.
    WHERE UPPER(col1) = UPPER('wutang');

    Common Sub-Expression Elimination:
    $\rightarrow$ Identify repeated sub-expressions that can be shared across the expression tree.
    $\rightarrow$ Compute once and then reuse the result.
    WHERE STRPOS('x', col1) < 2 OR STRPOS('x', col1) > 8
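    Both rewrites can be sketched as operations over tuple-encoded expression trees; the encoding and function names are illustrative:

    ```python
    # Sketch of the two expression rewrites. Expressions are nested tuples:
    # ("const", v) | ("attr", name) | ("upper", e) | ("strpos", needle, haystack)

    def fold(expr):
        # Constant folding: UPPER over a constant becomes a constant, computed once.
        if expr[0] == "upper":
            arg = fold(expr[1])
            if arg[0] == "const":
                return ("const", arg[1].upper())
            return ("upper", arg)
        return expr

    def evaluate(expr, tup, cache):
        # Common sub-expression elimination via per-tuple memoization:
        # an identical sub-tree is computed once and its result reused.
        if expr in cache:
            return cache[expr]
        kind = expr[0]
        if kind == "const":
            v = expr[1]
        elif kind == "attr":
            v = tup[expr[1]]
        elif kind == "upper":
            v = evaluate(expr[1], tup, cache).upper()
        elif kind == "strpos":   # 1-based position of needle, 0 if absent
            v = evaluate(expr[2], tup, cache).find(evaluate(expr[1], tup, cache)) + 1
        cache[expr] = v
        return v

    # WHERE UPPER(col1) = UPPER('wutang')
    rhs = fold(("upper", ("const", "wutang")))        # folded once, reused per tuple
    match = evaluate(("upper", ("attr", "col1")), {"col1": "WuTang"}, {}) == rhs[1]

    # WHERE STRPOS('x', col1) < 2 OR STRPOS('x', col1) > 8
    sub = ("strpos", ("const", "x"), ("attr", "col1"))
    cache = {}
    hit = evaluate(sub, {"col1": "xylophone"}, cache) < 2 or \
          evaluate(sub, {"col1": "xylophone"}, cache) > 8   # second eval hits cache
    ```
    
    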


    CONCLUSION

    The same query plan can be executed in multiple different ways.
    (Most) DBMSs will want to use index scans as much as possible.
    Expression trees are flexible but slow. JIT compilation can (sometimes) speed them up.

    NEXT CLASS

    Parallel Query Execution