Full table scan
from Wikipedia

A full table scan (also known as a sequential scan) is a scan made on a database where each row of the table is read in a sequential (serial) order and the columns encountered are checked for the validity of a condition.[1] Full table scans[2] are usually the slowest method of scanning a table because of the heavy amount of disk I/O required, which involves multiple seeks as well as costly disk-to-memory transfers.

Overview


In a database, a query that is not indexed results in a full table scan, where the database processes each record of the table to find all records meeting the given requirements. Even if the query selects just a few rows from the table, all rows in the entire table will be examined. This usually results in suboptimal performance but may be acceptable with very small tables or when the overhead of keeping indexes up to date is high.

When the optimizer considers a full table scan


The most important factor in the optimizer's choice is speed: a full table scan is used when it is the fastest access path available, or when no other access path can be used. Typical situations are as follows.[3]

  • No index: The optimizer must use a full table scan, since no index exists.
  • Small number of rows: For a small table, the cost of a full table scan is lower than that of an index range scan.
  • SELECT COUNT(*) on a nullable column: A typical B-tree index does not store NULL entries, so SELECT COUNT(*) cannot be answered from an index on a nullable column and the table must be scanned.
  • The query is unselective: The query returns a very large fraction (approaching 100%) of the table's rows, so reading the table sequentially is cheaper than repeated index lookups.
  • Stale table statistics: The table has grown, but its statistics have not been updated, so the optimizer cannot correctly estimate that using the index would be faster (see the sketch after this list).
  • High degree of parallelism: A high degree of parallelism defined on the table biases the optimizer toward a full table scan, since parallel execution favors sequential scans.
  • A full table scan hint: A hint instructs the optimizer to use a full table scan.
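
For the stale-statistics case, refreshing the statistics lets the optimizer re-estimate its costs. A minimal sketch (PostgreSQL syntax; MySQL uses ANALYZE TABLE instead, and fruits is the example table used in the next section):

ANALYZE  fruits;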

Examples


The first example shows a SQL statement that returns the name of every fruit in the fruits table whose color is red. If the fruits table does not have an index for the color column, then the database engine must load and examine every row within fruits in order to compare each row's color to 'red':

SELECT   name
FROM     fruits
WHERE    color = 'red';

The second example shows a SQL statement which returns the name of all fruits in the fruits table. Because this statement has no condition (no WHERE clause), the database engine will use a table scan to load and return the data for this query, even if the fruits table has an index on the name column: accessing (i.e. scanning) the table directly is faster than accessing it through the extra abstraction layer of an index:

SELECT   name
FROM     fruits;

The third example is a counter-example that will almost certainly cause the SQL engine to use an index instead of a table scan. This example uses almost the same query as the previous one, but adds an ORDER BY clause so that the returned names will be in alphabetical order. Assuming that the fruits table has an index on the name column, the database engine will now use that index to return the names in order because accessing the table through the extra abstraction layer of the index provides the benefit of returning the rows in the requested order. Had the engine loaded the rows using a table scan, it would then have to perform the additional work of sorting the returned rows. In some extreme cases - e.g. the statistics maintained by the database engine indicate that the table contains a very small number of rows - the optimizer may still decide to use a table scan for this type of query:

SELECT   name
FROM     fruits
ORDER BY name;

Pros and cons


Pros:

  • The cost is predictable, since the database system scans the full table row by row every time.
  • When the table occupies less than roughly 2 percent of the database block buffer, a full table scan can be quicker than index access.

Cons:

  • A full table scan occurs when there is no index, or when the index is not used by the SQL statement, and its result is usually slower than an index scan: the larger the table, the slower the data returns.
  • An unnecessary full table scan causes a huge amount of unnecessary I/O and places a processing burden on the entire database.

from Grokipedia
A full table scan is a fundamental access method in relational database management systems (RDBMS) where the database engine sequentially reads every row in a table, starting from the first block up to the high water mark, to retrieve or evaluate data against query predicates. This operation processes the entire dataset without leveraging indexes, applying any filtering conditions after reading all rows to identify qualifying ones. In systems like Oracle, it utilizes multiblock reads for efficiency, scanning formatted blocks in large sequential I/O operations.

Database optimizers select a full table scan when no suitable index is available, the query is unselective and requires scanning most or all rows, the table is small enough that index overhead outweighs benefits, or explicit hints force its use. For instance, in MySQL, it occurs if the table lacks indexes or if range access limits are exceeded, prompting fallback to scanning the whole table. Similarly, SQL Server performs table scans on heaps without clustered indexes or when all rows are needed, bypassing indexes for simplicity. In PostgreSQL, this manifests as a sequential scan, dividing blocks among parallel workers for large tables to improve throughput.

While full table scans can be performant for small tables or bulk operations, reducing I/O overhead through fewer, larger reads, they become inefficient on large datasets with selective queries, leading to excessive resource consumption and slower execution times compared to index-based access paths. To mitigate this, database administrators often create indexes on frequently queried columns or use query hints to guide the optimizer toward alternatives like index scans or range scans. Monitoring tools in RDBMS such as MySQL's Performance Schema or Oracle's execution plans help identify and tune queries prone to full scans.

Fundamentals

Definition

A full table scan (FTS), also known as a sequential scan, is a database retrieval operation in which the query engine reads every row in a specified table or partition sequentially from beginning to end, without leveraging any indexes to skip portions of the data. During this process, the engine applies any selection predicates (such as WHERE clause conditions) to each individual row to determine if it matches the query criteria, returning only those that qualify. This method is a basic access path in relational database management systems (RDBMS), used particularly when no suitable index exists or when the query requires accessing a significant portion of the table's data.

The concept of full table scans originated in the early days of relational database systems during the 1970s and 1980s, as part of the foundational query execution mechanisms in pioneering RDBMS like IBM's System R and commercial implementations such as Oracle (released in 1979) and later SQL Server (1989). These systems introduced structured query processing, where full table scans served as the default or fallback method for data retrieval in the absence of optimization techniques like indexing.

Key characteristics of a full table scan include the absence of row skipping, meaning the entire table storage (typically organized as a heap structure in systems like Oracle and SQL Server) is traversed in physical order without jumping to specific locations. Predicates are evaluated row-by-row, which can involve computing functions or joins for each entry, making it suitable primarily for heap-organized tables that lack a clustered index to define row ordering. This approach ensures complete coverage of the dataset but contrasts with index-based scans by not benefiting from selective access paths.

Mechanism

In a full table scan, the database engine sequentially accesses all data blocks associated with the table to retrieve and evaluate rows against the query's conditions. The process begins with locating the table's data segment, typically stored in a heap-organized structure where rows are not ordered by any key. This involves identifying the high water mark (HWM) in systems like Oracle, which delineates the extent of allocated and potentially populated blocks, ensuring only relevant portions are scanned.

The execution proceeds in distinct steps. First, the engine reads blocks or pages sequentially from disk or memory, leveraging multi-block reads to optimize I/O efficiency; for instance, Oracle uses the DB_FILE_MULTIBLOCK_READ_COUNT parameter to fetch multiple blocks at once during sequential access. These blocks are loaded into the buffer cache, a memory area that temporarily holds data pages to reduce physical disk reads. Once loaded, for each row within the block, the engine parses the row data, applies selection predicates from the query's WHERE clause to filter matching rows, and projects only the required columns as specified in the SELECT list. Matching rows are then returned to the query executor or upper layers of the plan. In PostgreSQL, this is handled through the SeqScan node, which initializes a scan descriptor and iteratively fetches the next tuple using access method routines like table_scan_getnextslot.

Key data structures underpin this process. Tables are typically organized as heaps, contiguous allocations of blocks containing rows in insertion order without indexing. Rows support both fixed-length and variable-length formats: fixed-length rows allocate a constant size per column (e.g., CHAR fields padded to full width), while variable-length rows (e.g., VARCHAR columns or LOBs) use pointers or offsets within the block to accommodate differing sizes, enabling denser packing but requiring additional parsing overhead during scans. Buffer cache management plays a critical role, pinning blocks in memory during the scan to facilitate sequential processing and minimize cache thrashing, with blocks often aged out in a least-recently-used manner post-scan.

Variations exist across database management systems (DBMS). In row-oriented relational DBMS like PostgreSQL, the scan operates row-by-row within each 8KB page, checking visibility rules (e.g., MVCC snapshots) for each row before applying predicates, which supports concurrent transaction isolation. Block-level reads are optimized to read entire pages sequentially, reducing I/O compared to random access, though the engine still processes individual rows for filtering. Oracle similarly employs block-level sequential reads but emphasizes multi-block I/O for large scans, processing rows within blocks via the buffer cache without inherent visibility checks, as its locking model differs.
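
This block-by-block behavior can be observed directly. A minimal sketch in PostgreSQL, assuming a hypothetical events table with no index on the status column; the Buffers line in the output counts the shared-buffer hits and disk reads made by the sequential scan:

EXPLAIN (ANALYZE, BUFFERS)
SELECT   id, payload
FROM     events
WHERE    status = 'failed';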

Optimization Context

Query Optimizer Role

The query optimizer in a relational database management system (RDBMS) is responsible for transforming a SQL query into an efficient execution plan by generating multiple alternative plans and selecting the one with the lowest estimated cost. This process typically employs a cost-based approach, utilizing heuristics and dynamic programming to enumerate join orders and access methods while minimizing resource consumption such as I/O operations and CPU cycles. In seminal work from the 1970s, the System R prototype at IBM introduced this framework, evaluating plans bottom-up to ensure scalability for complex queries involving multiple relations.

Within this optimization framework, a full table scan (FTS) serves as a fundamental baseline access method, particularly when indexed alternatives are unavailable or deemed too costly. The optimizer integrates FTS by considering it alongside other paths, such as index scans, during the enumeration of access specifications for single-relation queries and as the initial step in multi-relation joins. In System R's cost model, FTS is selected if its estimated cost (primarily the sequential read of all table pages) proves lower than alternatives, establishing it as a default option in the absence of viable indexes.

To estimate the feasibility of an FTS, the optimizer relies on maintained statistics about table structures, including the total number of rows (cardinality), number of data pages, and data distribution via histograms or key value ranges. These statistics enable selectivity estimates for predicates, allowing the optimizer to predict the fraction of the table likely to be scanned and compute overall costs accurately. For instance, in cost-based systems like System R, page fetch costs for FTS are derived directly from the page count statistic, weighted against CPU overhead for tuple processing. Outdated or absent statistics can lead to suboptimal plan choices, underscoring the need for regular updates to reflect current data characteristics.

Selection Criteria

Query optimizers select a full table scan (FTS) as the access path when no suitable index is available for the query predicates, forcing the system to read the entire table sequentially. This choice also occurs when predicates involve functions applied to indexed columns without corresponding function-based indexes, or in cases like SELECT COUNT(*) where null values in indexed columns render indexes ineffective. Additionally, FTS is preferred for queries lacking a leading edge match on B-tree indexes, or when the query requires sequential access patterns such as aggregations or joins that benefit from reading all rows in order.

FTS is triggered for low-selectivity predicates, meaning those that match a large portion of rows, as the overhead of index navigation outweighs sequential reading efficiency. For small tables (those below a configurable block threshold, such as a table spanning fewer blocks than Oracle's DB_FILE_MULTIBLOCK_READ_COUNT can fetch in one read), FTS becomes cost-effective due to minimal I/O demands. Cost estimation for FTS generally approximates the total as (number of blocks) × (I/O cost per block) + CPU cost for predicate evaluation, where I/O costs emphasize sequential multiblock reads and CPU accounts for row filtering. Thresholds vary across database management systems (DBMS); for instance, Oracle employs a rule of thumb favoring FTS for selectivity above approximately 5% in unselective queries accessing most table blocks.

Several factors influence the optimizer's decision to opt for FTS. Stale or inaccurate table statistics can lead to underestimation of index selectivity, prompting an FTS even when indexes might be viable. Parallelism options, such as a high DEGREE value in ALL_TABLES, skew costs toward FTS by leveraging multiple processes for faster sequential throughput.
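
The statistics and parallelism settings that feed these decisions are visible in the data dictionary. A minimal sketch for Oracle, using the ALL_TABLES view mentioned above (EMPLOYEES is a hypothetical table name):

SELECT   table_name, num_rows, blocks, degree
FROM     all_tables
WHERE    table_name = 'EMPLOYEES';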

Performance Implications

Advantages

Full table scans offer significant efficiency gains in database operations, particularly for small tables where the overhead of index access is disproportionate to the data volume. Unlike index-based retrievals, full table scans eliminate the need for index maintenance during query execution, reducing CPU and I/O costs associated with traversing index structures. For small tables, this approach is often preferred, as the entire dataset can be read quickly without the added complexity and resource demands of maintaining indexes.

The core advantage stems from sequential I/O, which leverages multi-block reads and prefetching mechanisms to access data in large, contiguous chunks from disk. This contrasts with the random, single-block I/O typical of index scans, making full table scans the fastest I/O pattern for retrieving substantial portions of a table. In scenarios where data is physically clustered or sorted by query criteria, sequential access further optimizes performance by minimizing seek times and cache inefficiencies.

Full table scans are particularly well-suited for use cases involving aggregations, such as SUM or COUNT operations, where the entire table must be examined regardless of selectivity. They are commonly used in data warehousing environments for OLAP queries involving aggregations over large fact tables. For low-selectivity queries that return a high percentage of rows (e.g., over 60%), full table scans avoid the repeated block accesses inherent in index traversal, reducing logical I/O significantly. Benchmarks demonstrate these benefits quantitatively; for instance, in tests on datasets exceeding memory capacity, full table scans completed in approximately 4 seconds compared to 30 seconds for full index scans, yielding up to 7.5x faster performance due to sequential I/O efficiencies.

Disadvantages

Full table scans impose significant resource demands, primarily through elevated I/O operations and CPU consumption, as the database must sequentially read every row and block in the table regardless of query selectivity. For large tables, such as those exceeding 1 TB in size, this results in processing the entire dataset, leading to prolonged execution times and substantial disk throughput usage. In row-oriented relational databases, the process exacerbates memory pressure by loading complete rows into the buffer cache, even when only a subset of columns is required, potentially causing cache evictions and reduced hit rates for other queries. Unpartitioned tables suffering from bloat (due to fragmentation or deleted rows) further inflate the scanned volume, amplifying I/O costs without proportional benefit.

Scalability challenges arise prominently with full table scans, as their cost grows linearly with table size, making them unsuitable for high-selectivity queries that target few rows amid vast sets, such as identifying one record in a million-row table. The cost of scanning all rows persists irrespective of the output size, rendering the operation inefficient for selective access patterns. In multi-user environments, these scans intensify resource contention, consuming CPU and I/O bandwidth that could otherwise support concurrent transactions, and may prolong shared locks on the table, hindering parallelism.

In contemporary cloud-based systems, full table scans compound expenses beyond local resources, as they trigger network data transfers and incur billing based on scanned bytes, significantly elevating operational costs for distributed queries. For instance, platforms like Google BigQuery charge directly for the volume of data processed during scans, turning a 1 TB full table scan into a major financial burden. Furthermore, in columnar databases optimized for analytics, full table scans prove less efficient when retrieving all columns, since data is stored contiguously by column rather than row, necessitating multiple disjoint reads to assemble complete records and increasing overall latency.
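
In such byte-billed columnar systems, the scanned volume (and the bill) can often be reduced by projecting only the needed columns rather than all of them. A minimal sketch, with big_table and user_id as hypothetical names:

SELECT   *       FROM big_table;   -- reads every column of every row
SELECT   user_id FROM big_table;   -- reads a single column's storage

Both statements still visit every row, but in a columnar store the second touches only one column's data, cutting the bytes scanned accordingly.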

Practical Applications

Basic Examples

A full table scan occurs in a simple SELECT query when the WHERE clause filters on an unindexed column, requiring the database management system (DBMS) to examine every row in the table. For example, consider the query SELECT * FROM employees WHERE salary > 50000; executed on a table lacking an index on the salary column. In MySQL, the EXPLAIN output for this query would indicate a full table scan with type: ALL, estimating the scan of all rows (e.g., 100,000 rows) before applying the filter, potentially returning a subset like 30,000 rows. Similarly, in PostgreSQL, the EXPLAIN output shows a Seq Scan on the employees table, with a cost estimate such as cost=0.00..500.00 rows=30000 width=100 and a filter (salary > 50000), scanning all rows (e.g., 100,000) to evaluate the condition.

Aggregation queries like SELECT COUNT(*) FROM orders; inherently perform a full table scan to count all rows, as no index can optimize the total without additional constraints. In MySQL, the EXPLAIN plan displays type: ALL for the orders table, scanning every row (e.g., 50,000 rows) to compute the aggregate, with an estimated cost reflecting the full traversal. In PostgreSQL, the plan reveals an Aggregate node over a Seq Scan on orders, such as Seq Scan on orders (cost=0.00..18334.00 rows=50000 width=0), confirming the need to visibility-check all rows due to multiversion concurrency control (MVCC). These examples illustrate basic scenarios where full table scans are the default access method for unoptimized queries on single tables.
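
A minimal sketch of the first example (EXPLAIN precedes the statement in both MySQL and PostgreSQL; employees is the hypothetical table from the text):

EXPLAIN
SELECT   *
FROM     employees
WHERE    salary > 50000;

In MySQL the resulting plan row shows type: ALL; in PostgreSQL the top plan node is Seq Scan on employees with Filter: (salary > 50000).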

Advanced Scenarios

In data warehouse environments, full table scans (FTS) often play a critical role in join operations involving large fact tables, particularly when using hash joins to combine data from dimension tables. For instance, in star schemas, the optimizer may select an FTS on the fact table to retrieve a broad set of rows before building a hash table for subsequent joins, as this approach leverages sequential I/O efficiency when selectivity is low or bitmap indexes cannot sufficiently filter data. This is common in analytical queries where full outer joins are required to include all records from the fact table, even those without matches in dimension tables, ensuring comprehensive aggregation without missing data in reporting scenarios.

For partitioned tables, FTS can be optimized through parallelism and pruning techniques to handle massive datasets efficiently. Parallel FTS divides the scan across multiple processes using granules (such as block ranges or entire partitions) as work units, allowing the degree of parallelism to scale based on available resources rather than partition count alone, which enhances query throughput. Dynamic partition pruning further refines this by eliminating irrelevant partitions at runtime, based on predicates involving bind variables or subqueries, thereby reducing I/O and processing time compared to a full unpruned scan. In Oracle implementations, this results in execution plans where only pertinent partitions (denoted by PSTART and PSTOP) are accessed, making FTS viable for terabyte-scale tables in OLAP workloads.

Tuning FTS is essential for bulk operations like ETL processes, where forcing a scan can bypass index overhead for better throughput on large volumes. The Oracle FULL hint, specified as /*+ FULL(table_alias) */, explicitly directs the optimizer to perform an FTS on the targeted table, overriding index-based plans when a sequential scan is more efficient for extracting or transforming entire datasets. This technique is particularly effective in ETL pipelines involving inserts, updates, or deletes across full tables, as it minimizes random I/O and leverages multiblock reads to accelerate data movement in loading routines.
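
A minimal sketch of the hint in use (Oracle syntax from the text; staging_orders and its alias s are hypothetical):

SELECT   /*+ FULL(s) */ *
FROM     staging_orders s
WHERE    load_date >= DATE '2024-01-01';

Because the table is given an alias, the hint must reference the alias, so FULL(s) rather than FULL(staging_orders) is required here.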

Alternatives and Comparisons

Index-Based Access

Index-based access serves as the primary alternative to full table scans in relational databases, leveraging indexes to efficiently locate and retrieve specific rows without examining the entire table. This method relies on data structures such as B-trees or hash tables to map key values to row identifiers (ROWIDs), enabling targeted data access. By traversing the index structure, the database engine identifies qualifying rows and then fetches the corresponding data from the table, reducing I/O operations compared to sequential reads.

Common types of index scans include range scans, unique lookups, and full index scans. Range scans, typically performed on B-tree indexes, traverse the ordered structure to access a contiguous set of key values within specified bounds, such as in queries using greater-than, less-than, or BETWEEN operators; this involves navigating from the root to the leaf nodes and scanning sequential index entries to collect ROWIDs. Unique lookups, supported by both B-tree and hash indexes, target a single key value for exact matches, with B-trees allowing efficient point queries via logarithmic traversal and hash indexes using a hash function to directly compute the storage location for equality predicates. Full index scans read the entire index in order, useful when the index covers all needed columns and an ORDER BY clause is present, avoiding additional sorting steps. In all cases, after obtaining ROWIDs from the index, the database performs row fetches from the table, which may involve random I/O if rows are not clustered.

Index-based access is preferred for high-selectivity queries that retrieve a small fraction of rows, where the overhead of index traversal is offset by avoiding unnecessary reads. The optimizer estimates cost using approximations like the total I/O for index blocks accessed plus the blocks containing matching table rows, multiplied by the I/O cost per block; for example, in a range scan, this includes index block scans and subsequent table accesses. This contrasts with full table scans, which become the fallback when no suitable index exists or selectivity is low.

Despite these benefits, index-based access introduces limitations, including storage overhead for maintaining index structures and increased update costs during inserts, deletes, or modifications, as indexes must be synchronized, potentially doubling write operations. Large indexes can also lead to deeper tree traversals, with height growing logarithmically (e.g., from 3 levels for millions of rows to 4-5 for billions), increasing seek times and CPU usage. Hash indexes, while efficient for lookups, lack support for range queries and may suffer from collisions in high-load scenarios, further limiting their applicability.
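
A minimal sketch, tying back to the fruits example from earlier in the article (the index name idx_fruits_color is illustrative):

CREATE INDEX idx_fruits_color ON fruits (color);

SELECT   name
FROM     fruits
WHERE    color = 'red';

With the index in place, the optimizer can answer the query with an index range scan on idx_fruits_color followed by row fetches, instead of a full table scan.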

Other Retrieval Methods

Bitmap index scans provide an efficient alternative to full table scans by leveraging bitmapped representations of row sets for columns with low cardinality, particularly in data warehousing environments. In this approach, each distinct value in a low-cardinality column (such as gender or marital status, where the number of unique values is small relative to the total rows) is associated with a bitmap, a compact bit vector where each bit corresponds to a row in the table, indicating presence (1) or absence (0) of that value. For queries involving multiple predicates, the database performs bitwise operations on these bitmaps: AND operations intersect bitmaps for conjunctive conditions (e.g., filtering rows matching both predicates), while OR operations union them for disjunctive conditions, generating a composite bitmap that identifies qualifying rows before accessing the actual table data. This method excels in low-cardinality scenarios because bitmaps are highly compressible and allow rapid filtering with minimal I/O, often outperforming B-tree indexes for ad-hoc queries on fact tables in data warehouses, as demonstrated in evaluations showing up to 50% reduction in bitmap operations through optimized algorithms.

Materialized view scans offer another retrieval path by directly querying precomputed results stored as physical tables, thereby bypassing full scans on underlying base tables for complex aggregations or joins. A materialized view captures the output of a query (such as sums or grouped aggregates from large fact and dimension tables) and persists it, enabling subsequent scans solely on this optimized structure rather than recomputing from raw data each time. This approach is particularly beneficial for read-heavy workloads, as the precomputed result reduces query latency; for instance, in dedicated SQL pools, materialized views are maintained like regular tables, avoiding repeated expensive computations and full table accesses on source relations. While scanning a materialized view may involve a full scan of the view itself (treated as a temporary or auxiliary table), the overall I/O is minimized since the view is typically smaller and tailored to common query patterns, with automatic refresh mechanisms ensuring consistency in systems like Oracle or PostgreSQL.

In columnar databases, emerging methods like zone maps and min-max indexes enable selective data skipping, bypassing irrelevant chunks during scans and enhancing efficiency over traditional full table scans on wide tables. Zone maps store metadata, such as minimum and maximum values for columns within fixed-size granules (e.g., 1MB blocks or 8192 rows), allowing the query engine to prune entire zones that fall outside predicate ranges without reading their contents. For example, in ClickHouse's MergeTree family, the minmax skip index computes min-max bounds per granule, enabling rapid elimination of non-qualifying granules for range filters (e.g., age > 40), which significantly reduces scanned volume in analytical queries on trillions of rows. These techniques leverage the columnar format's inherent compression and vectorized processing, providing faster filtering than row-oriented indexes while integrating with sorting for further locality improvements.
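
A minimal sketch of the materialized-view path (PostgreSQL syntax; orders, daily_sales, and the columns are hypothetical):

CREATE MATERIALIZED VIEW daily_sales AS
SELECT   order_date, SUM(amount) AS total
FROM     orders
GROUP BY order_date;

SELECT   total
FROM     daily_sales
WHERE    order_date = DATE '2024-01-01';

The second query scans only the small precomputed view rather than the full orders table; running REFRESH MATERIALIZED VIEW daily_sales; brings the view up to date after base-table changes.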

