Sequential Scans and the Cache Manager
Description
This feature modifies the retention of database pages read into the page buffer cache as the result of a large sequential scan. A sequential scan is an access method in which every page of some logical collection is read in successive order, one after the other. Examples of sequential scans, or of activities which induce them, include:
- Database backup
- Database sweep
- Database shadow creation
- Index creation
- Table scans due to query evaluation
- Fetching large segmented BLObs
The goal of this work is to minimize, from the cache manager's perspective, the impact of these sequential operations on transactions which have user-defined response time requirements (e.g., OLTP).
User interface/Usability
This is not a user-visible feature, but it nevertheless has an impact on user-visible performance. An important observation in the justification for this feature is that the database page buffer cache is shared among all the users of a database in the Superserver architecture. As such, at any instant, its contents represent the cumulative collection of individual database page working sets for each of the database's clients. Each client has a conceptual working set of database pages that is accessed during the evaluation of its current query.
There is usually a set of database pages that are commonly referenced across many of these individual working sets. For InterBase, this includes high-level relation pointer pages, index pages and transaction inventory pages, which are frequently referenced to efficiently locate rows, enforce constraints and manage the life cycle of a transaction, respectively.
To ensure fair and efficient use of the page buffer cache, it is important that no single user or process monopolize a disproportionate share of the cache. The "large" sequential scan, if managed unwisely, is an operation that can dominate the contents of the page buffer cache and disrupt the performance profile of every other concurrent user transaction.
Requirements and Constraints
A "large" sequential scan is defined as a scan where the count of pages to be retrieved is greater than the number of page buffers in the cache. This definition is somewhat arbitrary, but its conservative underpinnings minimize this feature's effect on existing applications.
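As a minimal sketch of this classification, the test reduces to a single comparison. The function and parameter names below are illustrative, not the engine's actual identifiers:

    #include <stdbool.h>

    /* Hypothetical predicate: a scan is "large" when it would touch
     * more pages than the buffer cache can hold at once. */
    static bool scan_is_large(unsigned long scan_page_count,
                              unsigned long cache_buffer_count)
    {
        return scan_page_count > cache_buffer_count;
    }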
The current behavior of a large sequential scan reads each page in as the most recently used (MRU) page. This has the effect of replacing every resident page with pages from the sequential scan, even though, once the last item on a sequentially read page has been accessed, that page will never be accessed again by the operation.
It is proposed that this behavior be amended so that each page of a large sequential scan is queued as the least recently used (LRU) page, allowing the page buffer it occupies to be reused by the scan, or by any other user for that matter.
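The two policies can be sketched against a doubly linked LRU chain. This is a minimal illustration assuming hypothetical structure and field names (BufferDesc, lru_head, lru_tail); the engine's actual cache structures differ:

    typedef struct BufferDesc {
        struct BufferDesc *lru_prev;   /* neighbor toward the MRU end */
        struct BufferDesc *lru_next;   /* neighbor toward the LRU end */
        /* ... page number, dirty flag, etc. ... */
    } BufferDesc;

    static BufferDesc *lru_head;   /* most recently used page */
    static BufferDesc *lru_tail;   /* least recently used page: next victim */

    /* Current policy: a freshly read page is queued as the MRU page.
     * Both helpers assume bdb is not currently linked into the chain. */
    static void enqueue_mru(BufferDesc *bdb)
    {
        bdb->lru_prev = NULL;
        bdb->lru_next = lru_head;
        if (lru_head) lru_head->lru_prev = bdb;
        else          lru_tail = bdb;
        lru_head = bdb;
    }

    /* Proposed policy for large sequential scans: the page is queued
     * at the LRU end, so its buffer is the first candidate for reuse. */
    static void enqueue_lru(BufferDesc *bdb)
    {
        bdb->lru_next = NULL;
        bdb->lru_prev = lru_tail;
        if (lru_tail) lru_tail->lru_next = bdb;
        else          lru_head = bdb;
        lru_tail = bdb;
    }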
"Small" sequential scans where the number of pages to be read in less than the size of the page buffer cache will continue to replace existing pages in the page buffer cache. Since InterBase has no formal mechanism to allow users to specify that a table be permanently cached, this policy will allow small frequently accessed lookup tables to stay cached as long as the dynamics of page accesses keeps those pages resident. Thus, users can continue to hope that pages will be kept resident by assigning a page buffer cache whose size is larger than some frequently accessed table or BLOb.
This policy also prevents the introduction of a performance anomaly related to nested loop joins. If a "small" table representing the inner join term were not cached, then every row from the outer "large" table would cause the "small" table to be re-read from disk (discounting the favorable effects of secondary file caches). This isn't a common scenario, but when it occurs it causes deleterious amounts of disk I/O.
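A back-of-the-envelope sketch makes the magnitude concrete; all numbers here are illustrative, not measurements:

    #include <stdio.h>

    int main(void)
    {
        const unsigned long outer_rows  = 1000000;  /* rows in the "large" outer table */
        const unsigned long inner_pages = 50;       /* pages in the "small" inner table */

        /* Inner table stays cached: each inner page is read once. */
        unsigned long cached_reads   = inner_pages;

        /* Inner table evicted by the scan: it is re-read per outer row. */
        unsigned long uncached_reads = outer_rows * inner_pages;

        printf("inner cached:  %lu page reads\n", cached_reads);
        printf("inner evicted: %lu page reads\n", uncached_reads);
        return 0;
    }

Even a modest inner table turns into tens of millions of page reads once it must be fetched again for every outer row.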
Returning to the large sequential scan, there are scenarios where the proposed LRU queuing of the current page must be deferred (see the sketch after this list):
- One of the rows on a data page is a candidate for garbage collection. InterBase V6 user transactions no longer garbage collect; instead, a dedicated garbage collector thread performs this task in the background. In this case, the sequential scan will leave the page so that the garbage collector isn't forced to re-read it. When the garbage collector finishes with the page, it will LRU queue the data page, which will probably trigger the cache writer to write the dirty page.
- The page was already in the page buffer cache as a member of some user's database page working set. If the sequential scan were to bump resident pages from cache, it would defeat the very goal this proposal is trying to achieve.
- Synchronized sequential scans may be executing concurrently. These could come from multiple user queries, or from a future index creation enhancement that creates multiple indexes in a single table scan, as available temporary disk space allows. Thus, a relation scan count will be maintained as sequential streams are opened and closed, and used to reference count each buffer. As each sequential scan visits a page, it decrements the buffer's scan count and releases the page buffer to the LRU queue when the count reaches zero.
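Taken together, the deferral rules amount to a small gate on releasing the buffer. The following sketch uses hypothetical names (BDB_garbage_collect, bdb_scan_count, was_resident) modeled on the rules above, not the engine's actual flags or API:

    #include <stdbool.h>

    typedef struct BufferDesc {
        unsigned bdb_flags;        /* per-buffer state flags */
        int      bdb_scan_count;   /* concurrent sequential scans using this buffer */
        bool     was_resident;     /* page was already cached for another working set */
    } BufferDesc;

    #define BDB_garbage_collect 0x01   /* page holds a garbage collection candidate */

    /* Stub standing in for the LRU queuing routine sketched earlier. */
    static void enqueue_lru(BufferDesc *bdb)
    {
        (void)bdb;
    }

    /* Called as a sequential scan finishes with a page. */
    static void scan_release_page(BufferDesc *bdb)
    {
        /* Other synchronized scans still need this page. */
        if (--bdb->bdb_scan_count > 0)
            return;

        /* Leave the page for the garbage collector thread; it will
         * LRU queue the page itself when it finishes. */
        if (bdb->bdb_flags & BDB_garbage_collect)
            return;

        /* The page belongs to another user's working set; the scan
         * must not demote what it did not bring into the cache. */
        if (bdb->was_resident)
            return;

        /* Otherwise, release the buffer to the LRU end for reuse. */
        enqueue_lru(bdb);
    }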
Migration issues
The introduction of SQL scrollable cursors may require some of these algorithms to be modified. Once a user can scroll in both directions, it will be desirable to keep some "window" of previously visited pages from a sequential scan in the page buffer cache.
This isn't a migration issue for the user but a performance issue for InterBase engineering when SQL scrollable cursors are surfaced someday to the user.