Carnegie Mellon University
Database Systems Query Execution I
ADMINISTRIVIA
Project #3 is due Sunday March 30th @ 11
Mid-term exam grades are now posted. $\longrightarrow$ Exam viewing for the next 3 OH, including today. $\longrightarrow$ The last OH for exam viewing is on March 24.
UPCOMING DATABASE TALKS
Google (DB Seminar) $\rightarrow$ Monday Oct 21st @ 4
$\rightarrow$ https://cmu.zoom.us/j/93441451665
QUERY EXECUTION
A query plan is a DAG of operators.
A pipeline is a sequence of operators where tuples continuously flow between them without intermediate storage.
A pipeline breaker is an operator that cannot finish until all its children emit all their tuples. $\rightarrow$ Joins (Build Side), Subqueries, Order By
SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100
Pipeline #1
ACCESS TIMES
Latency Numbers Every Programmer Should Know
| Latency | Medium |
| --- | --- |
| 1 ns | L1 Cache Ref |
| 4 ns | L2 Cache Ref |
| 100 ns | DRAM |
| 16,000 ns | SSD |
| 2,000,000 ns | HDD |
| ~50,000,000 ns | Network Storage |
| 1,000,000,000 ns | Tape Archives |
TODAY'S AGENDA
Processing Models
Access Methods
Modification Queries
Expression Evaluation
PROCESSING MODEL
A DBMS's processing model defines how the system executes a query plan and moves data from one operator to the next.
$\rightarrow$ Different trade-offs for workloads (OLTP vs. OLAP).
Each processing model is comprised of two types of execution paths: $\rightarrow$ Control Flow: How the DBMS invokes an operator. $\rightarrow$ Data Flow: How an operator sends its results.
The output of an operator can be either whole tuples (NSM) or subsets of columns (DSM).
PROCESSING MODEL
Approach #1: Iterator Model Most Common
Approach #2: Materialization Model Rare
Approach #3: Vectorized / Batch Model Common
ITERATOR MODEL
Each query plan operator implements a Next() function.
$\rightarrow$ On each invocation, the operator returns either a single tuple or an EOF marker if there are no more tuples. $\rightarrow$ The operator implements a loop that calls Next() on its children to retrieve their tuples and then processes them.
Each operator implementation also has Open() and Close() functions.
$\rightarrow$ Analogous to constructors/destructors, but for operators.
Also called Volcano or Pipeline Model.
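The Next()/Open()/Close() protocol above can be sketched in runnable Python. Everything here (operator class names, the toy table) is illustrative, not from any real DBMS:

```python
# Minimal sketch of the iterator (Volcano) model: each operator
# exposes open()/next()/close(), and next() returns one tuple per
# call or None as the EOF marker.

class SeqScan:
    def __init__(self, table):
        self.table = table
    def open(self):
        self.it = iter(self.table)
    def next(self):
        return next(self.it, None)  # None acts as the EOF marker
    def close(self):
        self.it = None

class Filter:
    def __init__(self, child, pred):
        self.child, self.pred = child, pred
    def open(self):
        self.child.open()
    def next(self):
        # Pull tuples from the child until one satisfies the predicate.
        while (t := self.child.next()) is not None:
            if self.pred(t):
                return t
        return None
    def close(self):
        self.child.close()

# Usage: scan a toy table S and keep tuples with value > 100.
S = [{"id": 1, "value": 50}, {"id": 2, "value": 150}]
plan = Filter(SeqScan(S), lambda t: t["value"] > 100)
plan.open()
out = []
while (t := plan.next()) is not None:
    out.append(t)
plan.close()
```

Note how the root drives execution: every tuple is pulled one function call at a time, which is exactly the per-tuple overhead discussed later.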
ITERATOR MODEL
Control Flow → Data Flow →

for t in child.Next():
    emit(projection(t))

for t1 in left.Next():
    buildHashTable(t1)
for t2 in right.Next():
    if probe(t2):
        emit(t1 ⋈ t2)

for t in child.Next():
    if evalPred(t):
        emit(t)

for t in R:
    emit(t)

for t in S:
    emit(t)

SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100
ITERATOR MODEL
The Iterator model is used in almost every DBMS. $\rightarrow$ Easy to implement / debug. $\rightarrow$ Output control works easily with this approach.
Allows for pipelining where the DBMS tries to process each tuple through as many operators as possible before retrieving the next tuple.
MATERIALIZATION MODEL
Each operator processes its input all at once and then emits its output all at once.
$\rightarrow$ The operator "materializes" its output as a single result. $\rightarrow$ The DBMS can push down hints (e.g., LIMIT) to avoid scanning too many tuples. $\rightarrow$ Can send either a materialized row or a single column.
The output can be either whole tuples (NSM) or subsets of columns (DSM). $\rightarrow$ Originally developed by MonetDB in the 1990s to process entire columns at a time instead of single tuples.
MATERIALIZATION MODEL
Control Flow Data Flow
out = []
for t in child.Output():
    out.add(projection(t))
return out
SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100
out = []
for t1 in left.Output():
    buildHashTable(t1)
for t2 in right.Output():
    if probe(t2):
        out.add(t1 ⋈ t2)
return out

out = []
for t in child.Output():
    if evalPred(t):
        out.add(t)
return out

out = []
for t in R:
    out.add(t)
return out

out = []
for t in S:
    out.add(t)
return out
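The materialized pipeline above can be made concrete with a runnable Python sketch for the example query. The tables and function names are illustrative:

```python
# Sketch of the materialization model: each operator consumes its
# input fully and returns its entire output at once. Table contents
# are made-up examples.

R = [{"id": 1}, {"id": 2}]
S = [{"id": 2, "value": 150, "cdate": "2024-01-01"},
     {"id": 3, "value": 50,  "cdate": "2024-01-02"}]

def scan(table):
    return list(table)                   # materialize the whole table

def filter_op(rows, pred):
    return [t for t in rows if pred(t)]  # materialize the filtered rows

def hash_join(left, right, key):
    ht = {}
    for t1 in left:                      # build side: full left input
        ht.setdefault(t1[key], []).append(t1)
    out = []
    for t2 in right:                     # probe side: full right input
        for t1 in ht.get(t2[key], []):
            out.append({**t1, **t2})
    return out

def project(rows, cols):
    return [{c: t[c] for c in cols} for t in rows]

# SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100
result = project(
    hash_join(scan(R), filter_op(scan(S), lambda t: t["value"] > 100), "id"),
    ["id", "cdate"])
```

Each call site returns a fully built list, which is why large intermediate results are the weak point of this model.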
MATERIALIZATION MODEL
Better for OLTP workloads because queries only access a small number of tuples at a time. $\rightarrow$ Lower execution / coordination overhead. $\rightarrow$ Fewer function calls.
Not ideal for OLAP queries with large intermediate results because the DBMS must allocate buffers for them.
VECTORIZATION MODEL
Like the Iterator Model where each operator implements a Next() function, but...
Each operator emits a batch of tuples instead of a single tuple.
$\rightarrow$ The operator's internal loop processes multiple tuples at a time. $\rightarrow$ The size of the batch can vary based on hardware or query properties. $\rightarrow$ Each batch will contain one or more columns, each with its own null bitmap.
VECTORIZATION MODEL
Control Flow Data Flow
out = []
for t in child.Next():
    out.add(projection(t))
    if |out| > n:
        emit(out)
SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100
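A runnable sketch of the batched Next() protocol, with an illustrative batch size and toy data (nothing here comes from a real system):

```python
# Sketch of the vectorized model: Next() returns a batch (list) of
# tuples instead of a single tuple, amortizing the per-call overhead.

BATCH_SIZE = 2  # illustrative; real systems tune this to cache sizes

class SeqScanBatch:
    def __init__(self, table):
        self.table, self.pos = table, 0
    def next(self):
        if self.pos >= len(self.table):
            return None                  # EOF marker
        batch = self.table[self.pos:self.pos + BATCH_SIZE]
        self.pos += len(batch)
        return batch

class FilterBatch:
    def __init__(self, child, pred):
        self.child, self.pred = child, pred
    def next(self):
        # The inner loop runs over a whole batch: a tight loop with
        # no function call per tuple.
        while (batch := self.child.next()) is not None:
            out = [t for t in batch if self.pred(t)]
            if out:
                return out
        return None

S = [{"id": i, "value": v} for i, v in enumerate([50, 150, 200, 80])]
op = FilterBatch(SeqScanBatch(S), lambda t: t["value"] > 100)
rows = []
while (b := op.next()) is not None:
    rows.extend(b)
```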
VECTORIZATION MODEL
Ideal for OLAP queries because it greatly reduces the number of invocations per operator.
Allows an out-of-order CPU to efficiently execute operators over batches of tuples. $\rightarrow$ Operators perform work in tight for-loops over arrays, which compilers know how to optimize / vectorize. $\rightarrow$ No data or control dependencies. $\rightarrow$ Hot instruction cache.
OBSERVATION
In the previous examples, the DBMS starts executing a query by invoking Next() at the root of the query plan and pulling data up from leaf operators.
This is how most DBMSs implement their execution engine.
PLAN PROCESSING DIRECTION
Approach #1: Top-to-Bottom (Pull)
$\rightarrow$ Start with the root and "pull" data up from its children. $\rightarrow$ Tuples are always passed between operators using function calls (unless it's a pipeline breaker).
Approach #2: Bottom-to-Top (Push)
$\rightarrow$ Start with leaf nodes and "push" data to their parents. $\rightarrow$ Can "fuse" operators together within a for-loop to minimize intermediate result staging.
PUSH-BASED ITERATOR MODEL
Control Flow → Data Flow →
Scheduler

Pipeline #1:
for t1 in R:
    buildHashTable(t1)

Pipeline #2:
for t2 in S:
    if evalPred(t2):
        if probeHashTable(t2):
            emit(projection(t1 ⋈ t2))

SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100
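The two fused pipelines can be sketched as plain loops in Python. The data is made up; the point is that the filter, probe, and projection run inside one loop with no intermediate staging:

```python
# Sketch of push-based (bottom-to-top) execution of the example query:
# each pipeline is one fused loop that pushes results upward.

R = [{"id": 1}, {"id": 2}]
S = [{"id": 2, "value": 150, "cdate": "2024-01-01"},
     {"id": 1, "value": 50,  "cdate": "2024-01-02"}]

results = []
ht = {}

# Pipeline #1: scan R and push each tuple into the join's hash table
# (the build side is the pipeline breaker).
for t1 in R:
    ht.setdefault(t1["id"], []).append(t1)

# Pipeline #2: scan S; filter, probe, and projection are fused into a
# single loop, so tuples never hit intermediate storage.
for t2 in S:
    if t2["value"] > 100:
        for t1 in ht.get(t2["id"], []):
            results.append({"id": t1["id"], "cdate": t2["cdate"]})
```

Because each pipeline is one loop, the compiler can keep the current tuple in registers, which is the "tighter control of caches/registers" advantage listed below.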
PLAN PROCESSING DIRECTION
Approach #1: Top-to-Bottom (Pull) Most Common
$\rightarrow$ Easy to control output via LIMIT.
$\rightarrow$ Parent operator blocks until its child returns a tuple.
$\rightarrow$ Additional overhead because operators' Next() functions are implemented as virtual functions.
$\rightarrow$ Branching costs on each Next() invocation.
Approach #2: Bottom-to-Top (Push) Rare
$\rightarrow$ Allows for tighter control of caches/registers in pipelines.
$\rightarrow$ May not have exact control of intermediate result sizes.
$\rightarrow$ Difficult to implement some operators (Sort-Merge Join).
ACCESS METHODS
An access method is the way that the DBMS accesses the data stored in a table. $\longrightarrow$ Not defined in relational algebra.
Three basic approaches: $\longrightarrow$ Sequential Scan. $\longrightarrow$ Index Scan (many variants). $\longrightarrow$ Multi-Index Scan.
SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100
SEQUENTIAL SCAN
For each page in the table:
$\rightarrow$ Retrieve it from the buffer pool. $\rightarrow$ Iterate over each tuple and check whether to include it.
The DBMS maintains an internal cursor that tracks the last page / slot it examined.
for page in table.pages:
    for t in page.tuples:
        if evalPred(t):
            # Do Something!
SEQUENTIAL SCAN: OPTIMIZATIONS
$\rightarrow$ Data Encoding / Compression (Lecture #05)
$\rightarrow$ Prefetching / Scan Sharing / Buffer Bypass (Lecture #06)
$\rightarrow$ Task Parallelization / Multi-threading (Lecture #14)
$\rightarrow$ Clustering / Sorting (Lecture #08)
$\rightarrow$ Late Materialization (Lecture #12)
$\rightarrow$ Materialized Views / Result Caching
$\rightarrow$ Data Skipping
$\rightarrow$ Data Parallelization / Vectorization (Lecture #14)
$\rightarrow$ Code Specialization / Compilation
DATA SKIPPING
Approach #1: Approximate Queries (Lossy)
$\rightarrow$ Execute queries on a sampled subset of the entire table to produce approximate results. $\rightarrow$ Examples: BlinkDB, Redshift, ComputeDB, XDB, Oracle, Snowflake, Google BigQuery, DataBricks
Approach #2: Zone Maps (Lossless)
$\rightarrow$ Pre-compute columnar aggregations per page that allow the DBMS to check whether queries need to access it. $\rightarrow$ Trade-off between page size vs. filter efficacy. $\rightarrow$ Examples: Oracle, Vertica, SingleStore, Netezza, Snowflake, Google BigQuery
ZONE MAPS
Pre-computed aggregates for the attribute values in a page. The DBMS checks the zone map first to decide whether it wants to access the page.
SELECT * FROM table WHERE val > 600
(Figure: a page of original data alongside its zone map; Parquet is shown as an example format.)
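The skipping logic can be sketched in a few lines of Python. The page contents are made-up values, and only MIN/MAX aggregates are kept here (real zone maps often also store SUM/COUNT/AVG):

```python
# Sketch of zone maps: per-page min/max aggregates let a scan skip
# pages that cannot possibly satisfy the predicate.

pages = [[100, 200, 300], [400, 500, 600], [700, 800, 900]]

# Pre-compute one zone map (here just MIN/MAX) per page.
zone_maps = [{"min": min(p), "max": max(p)} for p in pages]

def scan_gt(pages, zone_maps, threshold):
    out, pages_read = [], 0
    for page, zm in zip(pages, zone_maps):
        if zm["max"] <= threshold:   # whole page fails "val > threshold"
            continue                 # -> skip it without reading tuples
        pages_read += 1
        out.extend(v for v in page if v > threshold)
    return out, pages_read

# SELECT * FROM table WHERE val > 600 reads only the last page.
vals, pages_read = scan_gt(pages, zone_maps, 600)
```

The trade-off in the slide shows up directly: larger pages mean fewer zone maps to check but a lower chance that an entire page can be skipped.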
Small Materialized Aggregates: A Light Weight Index Structure for Data Warehousing
Guido Moerkotte (moer@pi3.informatik.uni-mannheim.de), Lehrstuhl für Praktische Informatik III, Universität Mannheim, Germany
Abstract
Small Materialized Aggregates (SMAs for short) are considered a highly flexible and versatile alternative for materialized data cubes. The basic idea is to compute many aggregate values for small to medium-sized buckets of tuples. These aggregates are then used to speed up query processing. We present the general idea and present an application of SMAs to the TPC-D benchmark. We show that exploiting SMAs for TPC-D Query 1 results in a speed up of two orders of magnitude. Then, we investigate the problem of query processing in the presence of SMAs. Last, we briefly discuss some further tuning possibilities for SMAs.
Proceedings of the 24th VLDB Conference, New York, USA, 1998.
INDEX SCAN
The DBMS picks an index (Lecture #15) to find the tuples that the query needs.
Which index to use depends on:
$\rightarrow$ What attributes the index contains
$\rightarrow$ What attributes the query references
$\rightarrow$ The attribute's value domains
$\rightarrow$ Predicate composition
$\rightarrow$ Whether the index has unique or non-unique keys
INDEX SCAN
Suppose that we have a single table with 100 tuples and two indexes:
$\longrightarrow$ Index #1: age
$\longrightarrow$ Index #2: dept
SELECT * FROM students WHERE age < 30 AND dept = 'CS' AND country = 'US'
Scenario #1: There are 99 people under the age of 30 but only 2 people in the CS department.
Scenario #2: There are 99 people in the CS department but only 2 people under the age of 30.
MULTI-INDEX SCAN
If there are multiple indexes available for a query, the DBMS does not have to pick only one:
$\longrightarrow$ Compute sets of Record IDs using each matching index. $\longrightarrow$ Combine these sets based on the query's predicates (union vs. intersect). $\longrightarrow$ Retrieve the records and apply any remaining predicates.
Examples:
$\longrightarrow$ DB2 Multi- Index Scan $\longrightarrow$ PostgreSQL Bitmap Scan $\longrightarrow$ MySQL Index Merge
MULTI-INDEX SCAN
Given the following query on a database with an index #1 on age and an index #2 on dept:
$\longrightarrow$ We can retrieve the Record IDs satisfying age<30 using index #1. $\longrightarrow$ Then retrieve the Record IDs satisfying dept='cs' using index #2. $\longrightarrow$ Take their intersection. $\longrightarrow$ Retrieve records and check country='us'.
SELECT * FROM students WHERE age < 30 AND dept = 'cs' AND country = 'us'
MULTI-INDEX SCAN
Set intersection can be done efficiently with bitmaps or hash tables.
SELECT * FROM students WHERE age < 30 AND dept = 'CS' AND country = 'US'
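The record-ID intersection for the query above can be sketched directly in Python. The student data is made up, and the "indexes" are simulated with comprehensions over the table:

```python
# Sketch of a multi-index scan: each index yields a set of record IDs,
# the sets are intersected, and remaining predicates are applied on
# the fetched records.

students = {
    1: {"age": 25, "dept": "CS", "country": "US"},
    2: {"age": 40, "dept": "CS", "country": "US"},
    3: {"age": 22, "dept": "EE", "country": "US"},
    4: {"age": 28, "dept": "CS", "country": "CA"},
}

# Record IDs produced by each matching index (simulated here).
rids_age = {rid for rid, t in students.items() if t["age"] < 30}      # index #1
rids_dept = {rid for rid, t in students.items() if t["dept"] == "CS"}  # index #2

# Intersect the RID sets, fetch the records, then apply the leftover
# predicate (country) that no index covers.
matches = [students[rid] for rid in sorted(rids_age & rids_dept)
           if students[rid]["country"] == "US"]
```

Since the query predicates are ANDed, the sets are intersected; ORed predicates would union them instead.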
MODIFICATION QUERIES
Operators that modify the database (INSERT, UPDATE, DELETE) are responsible for modifying the target table and its indexes.
$\rightarrow$ Constraint checks can either happen immediately inside of the operator or be deferred until later in the query/transaction.
The output of these operators can either be Record IDs or tuple data (i.e., RETURNING).
MODIFICATION QUERIES
UPDATE/DELETE:
$\rightarrow$ Child operators pass Record IDs for target tuples. $\rightarrow$ Must keep track of previously seen tuples.
INSERT:
$\rightarrow$ Choice #1: Materialize tuples inside of the operator. $\rightarrow$ Choice #2: Operator inserts any tuple passed in from child operators.
UPDATE QUERY PROBLEM
Control Flow → Data Flow →
for t in child.Next():
    removeFromIndex(idx_salary, t.salary, t)
    updateTuple(t.salary = t.salary + 100)
    insertIntoIndex(idx_salary, t.salary, t)
CREATE INDEX idx_salary ON people (salary);
UPDATE people SET salary = salary + 100 WHERE salary < 1100
for t in Index(people.salary):
    if t.salary < 1100:
        emit(t)
HALLOWEEN PROBLEM
Anomaly where an update operation changes the physical location of a tuple, which causes a scan operator to visit the tuple multiple times. $\rightarrow$ Can occur on clustered tables or index scans.
First discovered by IBM researchers while working on System R on Halloween day in 1976.
Solution: Track modified Record IDs per query.
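The fix can be demonstrated with a small Python sketch. The salaries are made up, and the index scan is simulated by iterating in salary order; without the `seen` set, a tuple whose new salary still satisfies the predicate would be updated twice:

```python
# Sketch of the Halloween-problem fix: the update tracks the record
# IDs it has already modified so a tuple that "moves" in the index is
# never updated twice. Without `seen`, rid 1 (900) would be bumped to
# 1000 and then again to 1100 because 1000 still satisfies the
# predicate salary < 1100.

people = {1: 900, 2: 1050}   # rid -> salary

seen = set()
done = False
while not done:
    done = True
    # Scan in salary order, as an index scan on salary would.
    for rid, salary in sorted(people.items(), key=lambda kv: kv[1]):
        if salary < 1100 and rid not in seen:
            people[rid] = salary + 100  # the tuple "moves" in the index
            seen.add(rid)               # remember it: never update again
            done = False
            break                       # restart the scan from the index
```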
EXPRESSION EVALUATION
The DBMS represents a WHERE clause as an expression tree.
The nodes in the tree represent different expression types:
$\longrightarrow$ Comparisons $(= , < , > , ! =)$ $\longrightarrow$ Conjunction (AND), Disjunction (OR) $\longrightarrow$ Arithmetic Operators (+, - , *, /, %) $\longrightarrow$ Constant Values $\longrightarrow$ Tuple Attribute References $\longrightarrow$ Functions
SELECT R.id, S.cdate FROM R JOIN S ON R.id = S.id WHERE S.value > 100;
EXPRESSION EVALUATION
PREPARE XXX AS SELECT * FROM S WHERE S.val = $1 + 9
EXECUTE XXX(991)
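Tree-walk evaluation of this predicate can be sketched in Python. The node classes and the execution context layout are illustrative, not any system's actual API:

```python
# Sketch of evaluating an expression tree for "S.val = $1 + 9"
# against an execution context that holds the query parameters.

class Const:
    def __init__(self, v): self.v = v
    def eval(self, tup, ctx): return self.v

class Param:
    def __init__(self, i): self.i = i
    def eval(self, tup, ctx): return ctx["params"][self.i]

class Attr:
    def __init__(self, name): self.name = name
    def eval(self, tup, ctx): return tup[self.name]

class BinOp:
    OPS = {"+": lambda a, b: a + b, "=": lambda a, b: a == b}
    def __init__(self, op, l, r): self.op, self.l, self.r = op, l, r
    def eval(self, tup, ctx):
        # Recursive tree walk: dispatch on the operator at every node,
        # for every tuple -- this is the per-tuple cost criticized below.
        return self.OPS[self.op](self.l.eval(tup, ctx), self.r.eval(tup, ctx))

# WHERE S.val = $1 + 9, executed as EXECUTE XXX(991)
expr = BinOp("=", Attr("val"), BinOp("+", Param(0), Const(9)))
ctx = {"params": [991]}
match = expr.eval({"val": 1000}, ctx)
```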
EXPRESSION EVALUATION
(Figure: the expression tree for S.val = $1 + 9 evaluated against an Execution Context holding the parameter from EXECUTE XXX(991).)
EXPRESSION EVALUATION
Evaluating predicates by traversing a tree is terrible for the CPU.
$\rightarrow$ The DBMS traverses the tree and for each node that it visits, it must figure out what the operator needs to do.
A better approach is to evaluate the expression directly.
An even better approach is to vectorize it and evaluate a batch of tuples at the same time...
SELECT * WHERE s.val = 1;
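The "evaluate directly" idea can be sketched by compiling the tree into a closure once and reusing it per tuple. The tiny tuple-based tree encoding is illustrative:

```python
# Sketch of direct evaluation: compile the expression tree once into
# a Python closure, instead of re-walking the tree for every tuple.

def compile_expr(node):
    kind = node[0]
    if kind == "const":
        v = node[1]
        return lambda t: v
    if kind == "attr":
        name = node[1]
        return lambda t: t[name]
    if kind == "=":
        # Compile both children once; the per-tuple path is just two
        # closure calls and a comparison, with no node dispatch.
        l, r = compile_expr(node[1]), compile_expr(node[2])
        return lambda t: l(t) == r(t)
    raise ValueError(kind)

# WHERE s.val = 1, compiled once and reused for every tuple.
pred = compile_expr(("=", ("attr", "val"), ("const", 1)))
rows = [{"val": 0}, {"val": 1}, {"val": 1}]
hits = [t for t in rows if pred(t)]
```

A JIT-compiling engine takes this one step further and emits machine code for the predicate; the vectorized variant would apply `pred` to whole column arrays at a time.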
EXPRESSION EVALUATION: OPTIMIZATIONS
Constant Folding:
$\rightarrow$ Identify redundant / unnecessary operations that are wasteful.
$\rightarrow$ Compute a sub-expression on a constant value once and reuse the result per tuple.
WHERE UPPER(col1) = UPPER('wutang');
Common Sub-Expression Elimination:
$\rightarrow$ Identify repeated sub-expressions that can be shared across the expression tree.
$\rightarrow$ Compute once and then reuse the result.
WHERE STRPOS('x', col1) < 2 OR STRPOS('x', col1) > 8
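Both rewrites can be sketched over a tiny tuple-encoded expression tree. The node encoding and function names are illustrative:

```python
# Sketch of the two expression rewrites on a toy tree encoding:
# ("call", name, args...) / ("const", value) / ("attr", name).

def fold_constants(node):
    # Constant folding: UPPER('wutang') becomes the constant 'WUTANG',
    # computed once at plan time instead of once per tuple.
    if node[0] == "call" and node[1] == "UPPER":
        arg = fold_constants(node[2])
        if arg[0] == "const":
            return ("const", arg[1].upper())
        return ("call", "UPPER", arg)
    return node

def eliminate_common(exprs):
    # Common sub-expression elimination: identical expressions are
    # mapped to a single shared slot that is computed only once.
    table = {}
    for e in exprs:
        table.setdefault(e, len(table))
    return table

folded = fold_constants(("call", "UPPER", ("const", "wutang")))

# STRPOS('x', col1) appears twice in the WHERE clause; it gets one
# shared entry instead of being evaluated twice per tuple.
sub = ("call", "STRPOS", ("const", "x"), ("attr", "col1"))
shared = eliminate_common([sub, sub])
```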
CONCLUSION
The same query plan can be executed in multiple different ways.
(Most) DBMSs will want to use index scans as much as possible.
Expression trees are flexible but slow. JIT compilation can (sometimes) speed them up.
NEXT CLASS
Parallel Query Execution