Carnegie Mellon University Database Systems Hash Tables
ADMINISTRIVIA
Project #1 is due on February 9th @ 11
Homework #2 is due February 9th @ 11
COURSE OUTLINE
We are now going to talk about how to support the DBMS's execution engine to read/write data from pages.
Two types of data structures: $\rightarrow$ Hash Tables (Unordered) $\rightarrow$ Trees (Ordered)
Query Planning
Operator Execution
Access Methods
Buffer Pool Manager
Disk Manager
TODAY'S AGENDA
Background Hash Functions Static Hashing Schemes Dynamic Hashing Schemes DB Flash Talk: DataStax
DATA STRUCTURES
Internal Meta- dataCore Data StorageTemporary Data StructuresTable Indexes
DESIGN DECISIONS
Data Organization
$\rightarrow$ How we layout data structure in memory/pages and what information to store to support efficient access.
Concurrency
$\rightarrow$ How to enable multiple threads to access the data structure at the same time without causing problems.
HASH TABLES
A hash table implements an unordered associative array that maps keys to values.
It uses a hash function to compute an offset into this array for a given key, from which the desired value can be found.
Space Complexity: O(n)Time Complexity: $\rightarrow$ Average: O(1) $\leftarrow$ Databases care about constants! $\rightarrow$ Worst: O(n)
STATIC HASH TABLE
Allocate a giant array that has one slot for every element you need to store.
To find an entry, mod the key by the number of elements to find the offset in the array.
hash(key) % N
STATIC HASH TABLE
Allocate a giant array that has one slot for every element you need to store.
To find an entry, mod the key by the number of elements to find the offset in the array.
hash(key) % N
STATIC HASH TABLE
Allocate a giant array that has one slot for every element you need to store.
To find an entry, mod the key by the number of elements to find the offset in the array.
hash(key) % N
UNREALISTIC ASSUMPTIONS
Assumption #1: Number of elements is known ahead of time and fixed.
Assumption #2: Each key is unique.
Assumption #3: Perfect hash function guarantees no collisions. $\rightarrow$ If key1≠key2, then hash(key1)≠hash(key2)
hash(key)% N
HASH TABLE
Design Decision #1: Hash Function
$\rightarrow$ How to map a large key space into a smaller domain. $\rightarrow$ Trade- off between being fast vs. collision rate.
Design Decision #2: Hashing Scheme
$\rightarrow$ How to handle key collisions after hashing. $\rightarrow$ Trade- off between allocating a large hash table vs. additional instructions to get/put keys.
HASH FUNCTIONS
For any input key, return an integer representation of that key.
$\rightarrow$ Converts arbitrary byte array into a fixed- length code.
We want something that is fast and has a low collision rate.
We do not want to use a cryptographic hash function for DBMS hash tables (e.g., SHA- 2).
HASH FUNCTIONS
CRC-64 (1975)
$\rightarrow$ Used in networking for error detection.
MurmurHash (2008)
$\rightarrow$ Designed as a fast, general- purpose hash function.
Google CityHash (2011)
$\rightarrow$ Designed to be faster for short keys (<64 bytes).
Facebook XXHash (2012)
$\rightarrow$ From the creator of zstd compression.
Google FarmHash (2014)
$\rightarrow$ Newer version of CityHash with better collision rates.
HASH FUNCTIONS
CRC-64 (1975)
$\rightarrow$ Used in networking for error detection.
MurmurHash (2008)
$\rightarrow$ Designed as a fast, general- purpose hash function.
Google CityHash (2011)
$\rightarrow$ Designed to be faster for short keys (<64 bytes).
Facebook XXHash (2012)
$\rightarrow$ From the creator of zstd compression.
State- of- the- art
Google FarmHash (2014)
$\rightarrow$ Newer version of CityHash with better collision rates.
HASH FUNCTIONS
smhasher
SMhasher
Table (html):
| Hash function | MiB/sec | cycl./hash | cycl./map | size | Quality problems |
| donnothing32 | 11149460.06 | 4.00 | - | 13 | bad seed 0, test NOP |
| donnothing64 | 11787676.42 | 4.00 | - | 13 | bad seed 0, test NOP |
| donnothing128 | 11745060.76 | 4.06 | - | 13 | bad seed 0, test NOP |
| NOP_OAAT_read64 | 11372846.37 | 14.00 | - | 47 | test NOP |
| BadHash | 769.94 | 73.97 | - | 47 | bad seed 0, test FAIL |
| sumhash | 10699.57 | 29.53 | - | 363 | bad seed 0, test FAIL |
| sumhash32 | 42877.79 | 23.12 | - | 863 | UB, test FAIL |
| multiply_shift | 8026.77 | 26.05 | 226.80 (8) | 345 | bad seeds & 0xfliff0, fails most tests |
| pair_multiply_shift | 3716.95 | 40.22 | 186.34 (3) | 609 | fails most tests |
| crc32 | 383.12 | 134.21 | 257.50 (11) | 422 | insecure, 8590x collisions, distrib, PerlinNoise |
| md5_32 | 350.53 | 644.31 | 894.12 (10) | 4419 | |
[TableCaption: Linux Build status building building]
n.
tate- of- the- art
rates.
HASH FUNCTIONS
smhasher
SMhasher
Linux Build status building building failing
Table (html):
| Hash function | MiB/sec | cycl./hash | cycl./map | size |
| donnothing32 | 11149460.06 | 4.00 | - | 13 |
| donnothing64 | 11787676.42 | 4.00 | - | 13 |
| donnothing128 | 11745060.76 | 4.06 | - | 13 |
| NOP_OAAT_read64 | 11372846.37 | 14.00 | - | 4 |
| BadHash | 769.94 | 73.97 | - | 4 |
| sumhash | 10699.57 | 29.53 | - | 36 |
| sumhash32 | 42877.79 | 23.12 | - | 8 |
| multiply_shift | 8026.77 | 26.05 | 226.80 (8) | 3 |
| pair_multiply_shift | 3716.95 | 40.22 | 186.34 (3) | 6 |
| crc32 | 383.12 | 134.21 | 257.50 (11) | 4 |
| md5_32 | 350.53 | 644.31 | 894.12 (10) | 4 |
Summary
rapidhash (an improved wyhash) - xhx3low wyhash - umash (even universal!) - ahash64 - t1ha2_atonce komihash FarmHash (not portable, too machine specific: 64 vs 32bit, old gcc, ...) - halftime_hash128 - Spooky32 pengyhash - nmhash32 - mx3 - MUM/mir (different results on 32/64- bit archs, lots of bad seeds to filter out) - fasthash32
STATIC HASHING SCHEMES
Approach #1: Linear Probe Hashing Approach #2: Cuckoo Hashing
There are several other schemes covered in the Advanced DB course:
$\longrightarrow$ Robin Hood Hashing $\longrightarrow$ Hopscotch Hashing $\longrightarrow$ Swiss Tables
STATIC HASHING SCHEMES
Approach #1: Linear Probe Hashing Approach #2: Cuckoo Hashing
Open Addressing
There are several other schemes covered in the Advanced DB course:
$\longrightarrow$ Robin Hood Hashing $\longrightarrow$ Hopscotch Hashing $\longrightarrow$ Swiss Tables
LINEAR PROBE HASHING
Single giant table of fixed- length slots.
Resolve collisions by linearly searching for the next free slot in the table.
$\rightarrow$ To determine whether an element is present, hash to a location in the table and scan for it. $\rightarrow$ Store keys in table to know when to stop scanning. $\rightarrow$ Insertions and deletions are generalizations of lookups.
The table's load factor determines when it is becoming too full and should be resized. $\rightarrow$ Allocate a new table twice as large and rehash entries.
LINEAR PROBE HASHING
hash(key)% N
A B C D E F
LINEAR PROBE HASHING
hash(key)% N
A B C D E F
LINEAR PROBE HASHING
hash(key)% N
A B C D E F
LINEAR PROBE HASHING
LINEAR PROBE HASHING
hash(key)% N
LINEAR PROBE HASHING
hash(key)% N
LINEAR PROBE HASHING
hash(key)% N
LINEAR PROBE HASHING
hash(key)% N
LINEAR PROBE HASHING
hash(key)% N
LINEAR PROBE HASHING
hash(key)% N
HASH TABLE – KEY/VALUE ENTRIES
Fixed-length Key/Values:
$\rightarrow$ Store inline within the hash table pages. $\rightarrow$ Optional: Store the key's hash with the key for faster comparisons.
Variable-length Key/Values:
$\rightarrow$ Insert key/value data in separate a private temporary table. $\rightarrow$ Store the hash as the key and use the record id pointing to its corresponding entry in the temporary table as the value.
HASH TABLE - KEY/VALUE ENTRIES
Fixed-length Key/Values:
$\rightarrow$ Store inline within the hash table pages. $\rightarrow$ Optional: Store the key's hash with the key for faster comparisons.
Variable-length Key/Values:
$\rightarrow$ Insert key/value data in separate a private temporary table. $\rightarrow$ Store the hash as the key and use the record id pointing to its corresponding entry in the temporary table as the value.
Table (html):
| hash | key | value |
| hash | key | value |
| hash | key | value |
| | |
HASH TABLE - KEY/VALUE ENTRIES
Fixed-length Key/Values:
$\rightarrow$ Store inline within the hash table pages. $\rightarrow$ Optional: Store the key's hash with the key for faster comparisons.
Variable-length Key/Values:
$\rightarrow$ Insert key/value data in separate a private temporary table. $\rightarrow$ Store the hash as the key and use the record id pointing to its corresponding entry in the temporary table as the value.
Table (html):
| hash | key | value |
| hash | key | value |
| hash | key | value |
| | |
Table (html):
| hash | RecordId |
| hash | RecordId |
| hash | RecordId |
| |
| Temp Table Page |
Table (html):
| key | value | |
| key | value |
| key | value | |
LINEAR PROBE HASHING - DELETES
LINEAR PROBE HASHING - DELETES
LINEAR PROBE HASHING - DELETES
LINEAR PROBE HASHING - DELETES
hash(key)% N
A B C D E F
LINEAR PROBE HASHING - DELETES
LINEAR PROBE HASHING - DELETES
LINEAR PROBE HASHING - DELETES
Approach #1: Movement $\rightarrow$ Rehash keys until you find the first empty slot.
LINEAR PROBE HASHING - DELETES
Approach #1: Movement $\rightarrow$ Rehash keys until you find the first empty slot.
LINEAR PROBE HASHING - DELETES
Approach #1: Movement $\rightarrow$ Rehash keys until you find the first empty slot.
LINEAR PROBE HASHING - DELETES
Approach #1: Movement $\rightarrow$ Rehash keys until you find the first empty slot.
LINEAR PROBE HASHING - DELETES
Approach #1: Movement $\rightarrow$ Rehash keys until you find the first empty slot.
LINEAR PROBE HASHING - DELETES
Approach #1: Movement $\rightarrow$ Rehash keys until you find the first empty slot.
LINEAR PROBE HASHING - DELETES
Approach #1: Movement $\rightarrow$ Rehash keys until you find the first empty slot.
LINEAR PROBE HASHING - DELETES
Approach #1: Movement $\rightarrow$ Rehash keys until you find the first empty slot.
LINEAR PROBE HASHING - DELETES
Approach #1: Movement $\rightarrow$ Rehash keys until you find the first empty slot. $\rightarrow$ Expensive! May need to reorganize the entire table. $\rightarrow$ No DBMS does this.
LINEAR PROBE HASHING - DELETES
Approach #2: Tombstone $\rightarrow$ Maintain separate bit map to indicate that the entry in the slot is logically deleted. $\rightarrow$ Reuse the slot for new keys. $\rightarrow$ May need periodic garbage collection.
LINEAR PROBE HASHING - DELETES
Approach #2: Tombstone $\rightarrow$ Maintain separate bit map to indicate that the entry in the slot is logically deleted. $\rightarrow$ Reuse the slot for new keys. $\rightarrow$ May need periodic garbage collection.
LINEAR PROBE HASHING - DELETES
Approach #2: Tombstone $\rightarrow$ Maintain separate bit map to indicate that the entry in the slot is logically deleted. $\rightarrow$ Reuse the slot for new keys. $\rightarrow$ May need periodic garbage collection.
LINEAR PROBE HASHING - DELETES
hash(key)% N
A B C D E F
Approach #2: Tombstone $\rightarrow$ Maintain separate bit map to indicate that the entry in the slot is logically deleted. $\rightarrow$ Reuse the slot for new keys. $\rightarrow$ May need periodic garbage collection.
LINEAR PROBE HASHING - DELETES
Approach #2: Tombstone $\rightarrow$ Maintain separate bit map to indicate that the entry in the slot is logically deleted. $\rightarrow$ Reuse the slot for new keys. $\rightarrow$ May need periodic garbage collection.
LINEAR PROBE HASHING - DELETES
Approach #2: Tombstone $\rightarrow$ Maintain separate bit map to indicate that the entry in the slot is logically deleted. $\rightarrow$ Reuse the slot for new keys. $\rightarrow$ May need periodic garbage collection.
LINEAR PROBE HASHING - DELETES
hash(key)% N
A B C D E F
Approach #2: Tombstone $\rightarrow$ Maintain separate bit map to indicate that the entry in the slot is logically deleted. $\rightarrow$ Reuse the slot for new keys. $\rightarrow$ May need periodic garbage collection.
LINEAR PROBE HASHING - DELETES
Approach #2: Tombstone $\rightarrow$ Maintain separate bit map to indicate that the entry in the slot is logically deleted. $\rightarrow$ Reuse the slot for new keys. $\rightarrow$ May need periodic garbage collection.
LINEAR PROBE HASHING - DELETES
Approach #2: Tombstone $\rightarrow$ Maintain separate bit map to indicate that the entry in the slot is logically deleted. $\rightarrow$ Reuse the slot for new keys. $\rightarrow$ May need periodic garbage collection.
HASH TABLE – NON-UNIQUE KEYS
Choice #1: Separate Linked List
$\longrightarrow$ Store values in separate storage area for each key. $\longrightarrow$ Value lists can overflow to multiple pages if the number of duplicates is large.
HASH TABLE – NON-UNIQUE KEYS
Choice #1: Separate Linked List
$\rightarrow$ Store values in separate storage area for each key. $\rightarrow$ Value lists can overflow to multiple pages if the number of duplicates is large.
Choice #2: Redundant Keys
$\rightarrow$ Store duplicate keys entries together in the hash table. $\rightarrow$ This is what most systems do.
OPTIMIZATIONS
Specialized hash table implementations based on key type(s) and sizes. $\rightarrow$ Example: Maintain multiple hash tables for different string sizes for a set of keys.
OPTIMIZATIONS
Specialized hash table implementations based on key type(s) and sizes. $\longrightarrow$ Example: Maintain multiple hash tables for different string sizes for a set of keys.
Store metadata separate in a separate array. $\longrightarrow$ Packed bitmap tracks whether a slot is empty/tombstone.
OPTIMIZATIONS
Specialized hash table implementations based on key type(s) and sizes. $\longrightarrow$ Example: Maintain multiple hash tables for different string sizes for a set of keys.
Store metadata separate in a separate array. $\longrightarrow$ Packed bitmap tracks whether a slot is empty/tombstone.
Use table + slot versioning metadata to quickly invalidate all entries in the hash table. $\longrightarrow$ Example: If table version does not match slot version, then treat the slot as empty.
OPTIMIZAT
Specialized hash table implementype(s) and sizes. $\longrightarrow$ Example: Maintain multiple hash sizes for a set of keys.
Store metadata separate in a s $\longrightarrow$ Packed bitmap tracks whether a
Use table + slot versioning m invalidate all entries in the ha $\longrightarrow$ Example: If table version does n treat the slot as empty.
Hash tables in ClickHouse and C++ Zero-cost Abstractions
Introduction
Hash tables are the diva of data structures. No other data structure offers the same opportunity for bugs to be introduced during optimization. In this blog post, we explore how hash tables are used in ClickHouse. We'll show how low cost abstractions work in modern C+ and how to get a variety of data structures from a common codebase with a few little tricks.
CUCKOO HASHING
Use multiple hash functions to find multiple locations in the hash table to insert records. $\rightarrow$ On insert, check multiple locations and pick the one that is empty. $\rightarrow$ If no location is available, evict the element from one of them and then re- hash it find a new location.
Look- ups and deletions are always O(1) because only one location per hash table is checked.
Best open- source implementation is from CMU.
CUCKOO HASHING
CUCKOO HASHING
Put A: hash $_1$ (A) hash $_2$ (A)
CUCKOO HASHING
CUCKOO HASHING
CUCKOO HASHING
Put A: hash(A) hash(A) Put B: hash(B) hash(B)
CUCKOO HASHING
CUCKOO HASHING
CUCKOO HASHING
CUCKOO HASHING
CUCKOO HASHING
CUCKOO HASHING
CUCKOO HASHING
CUCKOO HASHING
CUCKOO HASHING
OBSERVATION
The previous hash tables require the DBMS to know the number of elements it wants to store. $\rightarrow$ Otherwise, it must rebuild the table if it needs to grow/shrink in size.
Dynamic hash tables incrementally resize themselves as needed. $\rightarrow$ Chained Hashing $\rightarrow$ Extendible Hashing $\rightarrow$ Linear Hashing
CHAINED HASHING
Maintain a linked list of buckets for each slot in the hash table.
Resolve collisions by placing all elements with the same hash key into the same bucket. $\longrightarrow$ To determine whether an element is present, hash to its bucket and scan for it. $\longrightarrow$ Insertions and deletions are generalizations of lookups.
CHAINED HASHING
hash(key)% N
CHAINED HASHING
hash(key)% N
Bucket Pointers
Buckets
CHAINED HASHING
CHAINED HASHING
CHAINED HASHING
CHAINED HASHING
CHAINED HASHING
CHAINED HASHING
hash(key) % N Put A Put B Put C Put D Put E
Bucket Pointers
B|value
A|value C|value
CHAINED HASHING
CHAINED HASHING
CHAINED HASHING
CHAINED HASHING
CHAINED HASHING
EXTENDIBLE HASHING
Chained- hashing approach that splits buckets incrementally instead of letting the linked list grow forever.
Multiple slot locations can point to the same bucket chain.
Reshuffle bucket entries on split and increase the number of bits to examine. $\rightarrow$ Data movement is localized to just the split chain.
EXTENDIBLE HASHING
EXTENDIBLE HASHING
EXTENDIBLE HASHING
Max number of bits to examine in hashes
EXTENDIBLE HASHING
Max number of bits to examine in hashes
EXTENDIBLE HASHING
EXTENDIBLE HASHING
EXTENDIBLE HASHING
EXTENDIBLE HASHING
Get A hash(A) = 01110. ..
EXTENDIBLE HASHING
Get A hash(A) = 01110. ..
EXTENDIBLE HASHING
Get A hash(A) = 01110. ..
EXTENDIBLE HASHING
Get A hash(A) = 01110. .. Put B hash(B) = 10111. ..
EXTENDIBLE HASHING
Get A hash(A) = 01110. .. Put B hash(B) = 10111. ..
EXTENDIBLE HASHING
EXTENDIBLE HASHING
EXTENDIBLE HASHING
EXTENDIBLE HASHING
EXTENDIBLE HASHING
Get A hash(A) = 01110. ..
Put B hash(B) = 10111. ..
Put C hash(C) = 10100. ..
EXTENDIBLE HASHING
EXTENDIBLE HASHING
Get A hash(A) = 01110. ..
Put B hash(B) = 10111. ..
Put C hash(C) = 10100. ..
EXTENDIBLE HASHING
Get A hash(A) = 01110. ..
Put B hash(B) = 10111. ..
Put C hash(C) = 10100. ..
EXTENDIBLE HASHING
Get A hash(A) = 01110. ..
Put B hash(B) = 10111. ..
Put C hash(C) = 10100. ..
EXTENDIBLE HASHING
Get A hash(A) = 01110. ..
Put B hash(B) = 10111. ..
Put C hash(C) = 10100. ..
EXTENDIBLE HASHING
Get A hash(A) = 01110. ..
Put B hash(B) = 10111. ..
Put C hash(C) = 10100. ..
LINEAR HASHING
The hash table maintains a pointer that tracks the next bucket to split. $\rightarrow$ When any bucket overflows, split the bucket at the pointer location. Use multiple hashes to find the right bucket for a given key.
Can use different overflow criterion: $\rightarrow$ Space Utilization $\rightarrow$ Average Length of Overflow Chains
LINEAR HASHING
LINEAR HASHING
LINEAR HASHING
LINEAR HASHING
Get 6 $hash_{1}(6) = 6% 4 = 2$
LINEAR HASHING
Get 6 $hash_{1}(6) = 6% 4 = 2$
Put 17 $hash_{1}(17) = 17% 4 = 1$
LINEAR HASHING
Get 6 hash(6)= 6 % 4 = 2
Put 17 hash(17)= 17 % 4 = 1
LINEAR HASHING
Get 6 hash(6)= 6 % 4 = 2
Put 17 hash(17)= 17 % 4 = 1
LINEAR HASHING
Get 6 hash_6= 6%4=2
Put 17 hash_17= 17%4=1
LINEAR HASHING
Get 6 hash_6= 6%4=2
Put 17 hash_17= 17%4=1
LINEAR HASHING
Get 6 hash_6= 6%4=2
Put 17 hash_17= 17%4=1 hash_28=8%8=0
LINEAR HASHING
Get 6 hash_6= 6%4=2
Put 17 hash_17= 17%4=1 hash_8=8%8=0 hash_20=20%8=4
LINEAR HASHING
Get 6 hash_6= 6%4=2
Put 17 hash_17= 17%4=1 hash_8=8%8=0 hash_20=20%8=4
LINEAR HASHING
Get 6 hash_6= 6%4=2
Put 17 hash_17= 17%4=1 hash_8=8%8=0 hash_20=20%8=4
LINEAR HASHING
Get 6 hash(6)= 6 % 4 = 2
Put 17 hash(17)= 17 % 4 = 1 hash(8)= 8 % 8 = 0 hash(20)= 20 % 8 = 4
Get 20 hash(20)= 20 % 4 = 0
LINEAR HASHING
Get 6 hash(6)= 6 % 4 = 2
Put 17 hash(17)= 17 % 4 = 1 hash(8)= 8 % 8 = 0 hash(20)= 20 % 8 = 4
Get 20 hash(20)= 20 % 4 = 0
LINEAR HASHING
Get 6 hash(6)= 6 % 4 = 2
Put 17 hash(17)= 17 % 4 = 1 hash(8)= 8 % 8 = 0 hash(20)= 20 % 8 = 4
Get 20 hash(20)= 20 % 4 = 0 hash(20)= 20 % 8 = 4
LINEAR HASHING
Get 6 hash(6)= 6 % 4 = 2
Put 17 hash(17)= 17 % 4 = 1 hash(8)= 8 % 8 = 0 hash(20)= 20 % 8 = 4
Get 20 hash(20)= 20 % 4 = 0 hash(20)= 20 % 8 = 4
Get 9 hash(9)= 9 % 4 = 1
LINEAR HASHING
Get 6 hash1(6)= 6 % 4 = 2
Put 17 hash1(17)= 17 % 4 = 1 hash2(8)= 8 % 8 = 0 hash2(20)= 20 % 8 = 4
Get 20 hash1(20)= 20 % 4 = 0 hash2(20)= 20 % 8 = 4
Get 9 hash1(9)= 9 % 4 = 1
LINEAR HASHING - RESIZING
Splitting buckets based on the split pointer will eventually get to all overflowed buckets.
$\rightarrow$ When the pointer reaches the last slot, remove the first hash function and move pointer back to beginning.
If the "highest" bucket below the split pointer is empty, the hash table could remove it and move the splinter pointer in reverse direction.
LINEAR HASHING - DELETES
LINEAR HASHING - DELETES
Delete 20 hash(20) = 20 % 4 = 0
LINEAR HASHING - DELETES
Delete 20 hash,(20)= 20 % 4 = 0
LINEAR HASHING - DELETES
Delete 20 $\begin{array}{r}hash_{1}(20) = 20% 4 = 0\ hash_{2}(20) = 20% 8 = 4 \end{array}$
LINEAR HASHING - DELETES
Delete 20 $\begin{array}{r}hash_{1}(20) = 20% 4 = 0\ hash_{2}(20) = 20% 8 = 4 \end{array}$
LINEAR HASHING - DELETES
Delete 20 hash,(20)= 20 % 4 = 0 hash,(20)= 20 % 8 = 4
LINEAR HASHING - DELETES
Delete 20 $\begin{array}{r}hash_{1}(20) = 20% 4 = 0\ hash_{2}(20) = 20% 8 = 4 \end{array}$
LINEAR HASHING - DELETES
Delete 20 $\begin{array}{r}hash_{1}(20) = 20% 4 = 0\ hash_{2}(20) = 20% 8 = 4 \end{array}$
LINEAR HASHING - DELETES
Delete 20 $\begin{array}{r}hash_{1}(20) = 20% 4 = 0\ hash_{2}(20) = 20% 8 = 4 \end{array}$
LINEAR HASHING - DELETES
Delete 20 hash,(20)= 20 % 4 = 0 hash,(20)= 20 % 8 = 4
Put 21 hash,(21)= 21 % 4 = 1
CONCLUSION
Fast data structures that support O(1) look- ups that are used all throughout DBMS internals. $\longrightarrow$ Trade- off between speed and flexibility.
Hash tables are usually not what you want to use for a table index...
CONCLUSION
Fast data structures that support O(1) look- ups that are used all throughout DBMS internals. $\rightarrow$ Trade- off between speed and flexibility.
Hash tables are usually not what you want to use for a table index...
CREATE INDEX ON xxx (val);
CONCLUSION
Fast data structures that support O(1) look- ups that are used all throughout DBMS internals. $\rightarrow$ Trade- off between speed and flexibility.
Hash tables are usually not what you want to use for a table index...
CONCLUSION
Fast data structures that support O(1) look- ups that are used all throughout DBMS internals. $\longrightarrow$ Trade- off between speed and flexibility.
Hash tables are usually not what you want to use for a table index...
Order-Preserving Indexes ft. B+Trees $\rightarrow$ aka "The Greatest Data Structure of All Time"