Specifying Query Access Plans in the InterBase Optimizer
Various developers have requested the ability to view the access plan being used by InterBase in query processing, as well as to modify that plan when they believe they can improve on it. Here are the requirements for setting and getting the access plan in the optimizer:
- Provide a readable display of the access plan generated by the optimizer for any given query (ISQL only).
- Provide extended SQL syntax to specify the plan to be used in a query. (DSQL/ISQL and embedded SQL).
- Make the output of requirement 1 acceptable as input to requirement 2, so that the user may simply modify the plan rather than generating it from scratch.
Limitations
If the user specifies the plan to be used in a query, we will make no attempt to analyze the submitted plan for errors of omission. That is, the optimizer is either on or off; it will not attempt to complete a partial plan or make recommendations for doing so. Of course, if the query cannot be accomplished via the submitted plan an error will be reported.
Elements of Plan Selection
To determine the options needed in setting a plan, it is important to survey the methods used by the optimizer to generate a plan. In this discussion, the term stream refers generically to a set of records, such as a relation or a relation with restrictions placed on it.
Assigning Indices
InterBase uses indices for four basic purposes:
- Equalities and Inequalities: When a field is compared to a constant, andthe field is indexed, the index can be used to retrieve values greater than, less than, or equal to the constant. Utilizing the index rather than a sequential scan avoids fetching records that don't meet the selection criteria.
- Matching joined streams: When two streams are joined by matching two fields, an index can be used to perform the join. If an index is available for one of the fields, the other stream is retrieved first and key values matched against the index.
- Sorting and grouping: When a query is sorted by a field or fields which are the keys of an index, the index is used to retrieve the records in order rather than performing a sort. This is called a "navigational" walk of the index. It is not navigational in the sense that the index is walked backwards, but in that no bitmap is generated for the stream.
- Navigation: Indices are used to walk forwards and backwards through a stream in key order. This is used to support IDAPI-style navigation. (This functionality does not currently surface through SQL.)
Combining Indices
For each selection criterion which matches an index, InterBase generates a bitmap of the selected records. It can then combine the bitmaps by ANDing and ORing them together according to the logic of the query. This generates a single bitmap which describes all the selected records from a single stream. This is advantageous when iterating through a stream multiple times, since it is much faster to scan the bitmap than a set of indices.
Determining Join Order
In joining two or more streams together, it is often crucial to determine which stream should be retrieved first. In general, the stream which is the most expensive to retrieve should be retrieved first. If you think of a join as a set of nested loops, it makes sense that the innermost loop will be executed the most times. Similarly, the last stream in the join list will be fetched the most times, so it had better be the cheapest to retrieve.
Generating Rivers
When two or more streams are combined together, they form a river. The optimizer first attempts to find the longest river which can be retrieved via indices alone. Then it attempts to form the remaining streams into the longest river, and so on. Once it has a set of rivers which are retrievable via indexed access, it joins them together in order from left to right based on the estimated cost of retrieving each river.
Cost Estimation
The cost of retrieval of a stream depends on several variables: whether there is an index, the selectivity of that index, whether there are selection criteria, the cardinality of the underlying relation, and whether the stream needs to be sorted. The optimizer uses all these variables to determine the cost of a retrieval of a stream. For example, when retrieving an indexed equality, the cost is estimated as: cardinality of base relation * selectivity of index + overhead of index retrieval Since rough estimation is used to determine the values of all these variables, and the relative importance of each of these variables cannot be precisely determined for all queries, the optimizer can often select an inappropriate plan. The user may simply wish to determine a plan based on empirical data for a particular query. But we should advise the customer that the best plan may change over time as the data changes, so specifying a plan as part of the query implies an added maintenance burden for the SQL programmer.
Sort Merges
When a join term is specified between two streams, and no index is available, the optimizer specifies a sort merge between the two streams. This means that both streams will be sorted and the results merged together. Once the streams are sorted, the engine can scan through both streams just once to form the join, rather than iterating through the second stream for every record in the first stream.
Set Plan Syntax
Syntax has been added to the SELECT expression in GPRE and DSQL/ISQL to allow the user to set the plan for a query. The syntax should work for SELECT statements used in creating a view, a stored procedure, or a trigger. It should also work for subselects in an UPDATE or DELETE statement. Syntax to specify a plan for UPDATE or DELETE is also an option, though it is not planned for now.
The Select Statement
A PLAN clause has been added to the SELECT expression. The syntax supports all current optimizer options, and is extensible enough to allow for new options. An abridged syntax for the SELECT statement is:
select_expression [ UNION select_expression ... ] [ ORDER BY field_list ] Select_expression := SELECT [ DISTINCT | ALL ] [ field_list | '*' ] FROM relation_list [ WHERE search_condition ] [ GROUP BY field_list ] [ HAVING search condition ] [ PLAN plan_expression ]
Plans can be specified independently for the query and any subqueries. Specifying a plan on the main select or any subselect does not require a plan to be specified on any of the other select expressions in the query.
The Plan Expression
The syntax for the plan expression is as follows:
plan_expression := [join_type] '(' plan_item_list ')' plan_item_list := plan_item | plan_item ',' plan_item_list plan_item := table_specifier access_type | plan_expression
This syntax allows specification of a single relation or a join of two or more relations in one pass. Nested parentheses can be used to specify any combination of joins. Reverse polish notation is used to specify the join operator, mostly because an infix operator would confuse the parser.
join_type :=JOIN | [SORT] MERGE
The default join type is a JOIN. The keyword MERGE can be used to specify a sort merge of the substreams. The optional SORT keyword can be used for clarification purposes.
table_specifier :={ table_name | alias_name } [ table_specifier ]
The table name or the alias for the table can be specified here. If a table is used more than once in the query, aliases must be used to distinguish them in the plan. More than one table specifier can be used to specify a base table of a view (see "Handling Views", below).
access_type :=[ NATURAL | INDEX '(' index_list ')' | ORDER index_name ] index_list :=index_name | index_name ',' index_list
The default access is NATURAL order, which means sequentially accessing the records in no defined order.
The INDEX keyword allows specification of one or more indices for satisfying booleans and join terms in the query. The user must specify all the indices to be used; if any booleans or join terms remain after all indices are exhausted, they will be evaluated without benefit of an index. If any indices are specified which cannot be utilized, an error will be returned.
The ORDER keyword specifies that a sort will be effected through the use of the index. In order to cause the result of the query to be sorted, the stream must be placed leftmost in the join order. If the resultant stream is not sorted as specified by the ORDER clause, a physical sort will be performed.
Display Plan Syntax
The following command is now available in ISQL (QLI for V3.2L):
ISQL> SET PLAN;
Once this option has been set, all select-type statements will display the plan chosen by the optimizer for the provided query, as shown in the examples below. The SET PLAN command toggles this feature on and off, just as with other ISQL options.
Examples of Plan Syntax
These examples demonstrate the plans that ISQL would print for some sample queries. The plan clause itself can then be placed at the end of the query, guaranteeing that the plan will always be used by the query in retrieving the result set. Optionally, the user may use the plan clause as a starting point to make modifications to the plan.
- Simple retrieval.
SELECT * FROM CITIES; PLAN (CITIES NATURAL)
- Ordered retrieval, utilizing an index for ordering.
SELECT * FROM CITIES ORDER BY CITY;
PLAN (CITIES ORDER CITIES_1)
- Simple join, resulting in a cross product.
SELECT * FROM CITIES C, STATES S;
PLAN JOIN (S NATURAL, C NATURAL)
Note
When aliases are specified in the query, they are also used in the plan printed by the optimizer, to avoid confusion when the same table is referenced twice. The user may use either aliases or table names as long as the table is used only once in the select expression.
- Join with indexed equality.
SELECT * FROM CITIES C, STATES S WHERE C.STATE=S.STATE;
PLAN JOIN (S NATURAL, C INDEX (DUPE_CITY))
- Equijoin with index used for ordering.
SELECT * FROM CITIES C, STATES S WHERE
C.STATE=S.STATE ORDER BY S.STATE;
PLAN JOIN (S ORDER STATE1, C INDEX (DUPE_CITY))
- Equijoin with no available indices, resulting in a sort merge.
DROP INDEX STATE1;
DROP INDEX DUPE_CITY;
SELECT *
FROM CITIES C, STATES S WHERE C.STATE=S.STATE;
PLAN SORT MERGE (C NATURAL, S NATURAL)
- Three-way join with two indexed equalities.
SELECT * FROM CITIES C, STATES S, MAYORS M
WHERE C.CITY=M.CITY AND C.STATE=S.STATE;
PLAN JOIN (S NATURAL, C INDEX (DUPE_CITY), M INDEX (MAYORS_1));
- Three-way join with only one indexed equality, resulting in a sort merge of the result.
SELECT * FROM CITIES C, STATES S, MAYORS M
WHERE C.STATE=M.STATE AND C.STATE=S.STATE;
PLAN SORT MERGE (M NATURAL, JOIN (S NATURAL, C INDEX (DUPE_CITY)));
Handling Views
Views may present some difficulty for users of the PLAN feature. Ordinarily, users may treat a view the same as a table. But in optimizing a query, the user must be cognizant of the base table(s) participating in the view.
Expanding Views
The PLAN clause must treat a view reference as if the base tables used in creating the view were inserted into the FROM list of the query. For example, assume a view is created as:
CREATE VIEW CITIES_VIEW AS
SELECT * FROM CITIES C, STATES S WHERE C.STATE=S.STATE;
If that view were referenced in a simple query, the resultant plan would be printed as follows:
SELECT * FROM CITIES_VIEW;
PLAN JOIN (CITIES_VIEW S NATURAL, CITIES_VIEW C INDEX (DUPE_CITY))
To retrieve a view, a join is formed which retrieves the base tables in the view. The join is executed according to the logic of the query which defines the view. Thus the PLAN writer must be familiar with the query which defines the view, and tell the optimizer how to retrieve the base tables in the view.
Specifying a Base Table
When printing a plan for a query which involves a view, ISQL will use multiple aliases to completely specify the base tables of the view. This is to avoid confusion when a table is referenced more than once in the query. The user may abbreviate the syntax to specify only the table name, or only the alias for that table name. Thus any of the following alternative syntaxes are also legal for the view defined above:
PLAN JOIN (STATES NATURAL, CITIES INDEX (DUPE_CITY)) PLAN JOIN (S NATURAL, C INDEX (DUPE_CITY)) PLAN JOIN (CITIES_VIEW STATES NATURAL, CITIES_VIEW CITIES INDEX (DUPE_CITY))
Passing the Plan to the Engine
DSQL and GPRE generate new BLR to pass an access plan to the engine. The BLR takes the form of a new plan clause on the record selection expression:
blr_rse count stream_clause... [first_clause] [boolean] [sort_clause] [project_clause] [plan_clause] stream_clause := table_name | aggregate_expression | union_expression table_name := blr_relation relation_name context_variable | blr_rid relation_id context_variable
Where the plan clause is defined as:
plan_clause := blr_plan plan_expression plan_expression := {blr_join | blr_merge} count plan_expression... | blr_retrieve context_variable access_type
The context variable in the blr_retrieve must match a context variable from the stream_clause. There must be a blr_retrieve for each stream in the stream_clause.
access_type := blr_sequential | blr_navigational index_name | blr_indices count index_name...