Feature Description
You should recognize that optimization uses heuristics, and it is possible that any particular join may be slower in this version. In such a case, the PLAN syntax can be used to specify the access plan that was chosen in versions prior to 5.0. The following changes have been made to the optimizer functionality in 5.0.
The DISTINCT operator has been optimized to use an index where possible.
A bug in the estimation of cardinality of large tables has been fixed.
Ordering of multi-table joins has been improved with more accurate cost estimation techniques.
Redundant sorting removed by optimizer.
View with DISTINCT will now use an index.
When a view is created with a DISTINCT clause and that view is used in a query
with a WHERE clause, items in the WHERE clause were not mapped to an index.
This query will now be mapped to an index on the underlying table, if it
exists.
For Example:
CREATE VIEW EMP_VIEW
AS SELECT DISTINCT LAST_NAME FROM EMPLOYEE;
SELECT * FROM EMP_VIEW WHERE LAST_NAME='Clinton';
The query will make use of the index NAMEX on LAST_NAME.
DISTINCT is optimized to use an index. Prior to this release, the use of DISTINCT always invoked a sort. The DISTINCT operation will now be mapped to an index where possible.
For Example:
SELECT DISTINCT EMP_NO FROM EMPLOYEE
The plan returned will be:
PLAN (EMPLOYEE ORDER RDB$PRIMARY7)
Indicating that the DISTINCT will be performed by fetching the records in order
via the index RDB$PRIMARY7, and then filtering out the duplicates.
Redundant sorting is now removed by the optimizer. When DISTINCT, GROUP BY and ORDER BY operators are used against the same fields the optimizer removes redundant sorting. In particular, when these three clauses are used together against an indexed column(s), all three clauses will be mapped to the index and no sort will be necessary.
For Example:
SELECT DISTINCT LAST_NAME FROM EMPLOYEE GROUP BY LAST_NAME ORDER BY LAST_NAME;
The plan is returned as follows:
PLAN (EMPLOYEE ORDER NAMEX)
Indicating that all three clauses will be handled in one pass via the index
NAMEX. No sorts will be performed.
DISTINCT mapped to base table. When a DISTINCT is specified on a JOIN, the DISTINCT is applied to one of the base tables before the JOIN is performed. The net result is a much more efficient use of resources.
For Example:
SELECT DISTINCT E.DEPT_NO FROM EMPLOYEE E, DEPARTMENT D
WHERE E.EMP_NO=D.EMP_NO;
The plan is returned as follows:
PLAN JOIN (D ORDER RDB$PRIMARY5,E INDEX (RDB$FOREIGN8))
The Bug regarding Large PLAN not being displayed by ISQL is now fixed. Previously, a PLAN longer than 256 bytes would not be printed from ISQL. The buffer has been increased to 1024 bytes so that most bytes can be printed. If the user generates a query with a resultant PLAN longer than 1024 bytes, they can make use of the isc_dsql_sql_info() call to retrieve it.
Optimizing MIN and MAX to use indices. MIN() and MAX() operations
make use of indices when available. The MIN() operation has been optimized to
make use of an ascending index when available. The MAX() operation has been
optimized to make use of a descending index when available. For the query:
SELECT MIN (LAST_NAME) FROM EMPLOYEE;
The plan will return :
PLAN (EMPLOYEE ORDER NAMEX)
Indicating that the first user-visible record will be fetched from the index
NAMEX.
Adding field for optimizing aggregate operations.For now, MIN will only use an ascending index and MAX will only use a descending index. However, this change will automatically allow a MAX to be mapped to an ascending index and a MIN to be mapped to a descending index when scrollable cursors are turned on.
Adding sort information to plan output. Added SORT information to
PLAN report. The SET PLAN feature will report when a sort is being performed.
For example, with the query:
SET PLAN;
SELECT LAST_NAME FROM EMPLOYEE ORDER BY FIRST_NAME;
The following plan will be reported:
PLAN SORT ((EMPLOYEE NATURAL));
Indicating that the table will be fetched in natural order, then the results
will be sorted. When specified as part of the PLAN clause to a query, the SORT
modifier will be ignored. So, for example:
SELECT LAST_NAME FROM EMPLOYEE PLAN
SORT ((EMPLOYEE NATURAL));
will be accepted, but will not force a sort of the stream when it is not called
for by the query.
This info will help the customer (and us) to determine where a sort is being
performed in an access plan. This is a very important aspect of the performance
of a query, which was previously not being displayed.
On the input side, we will ignore the SORT keyword, and will not require it to
be there in order to induce a sort. This will allow us to be
backward-compatible with plans in V4.0. It will also prevent us from creating a
query with wrong results, since a sort is required semantically.
The analysis of join orders was revised. So that all possible orders
will be analyzed up to 5 tables, and analysis of orderings for more than 5
tables is much faster and more accurate. Customers in the past have complained
that access plans may vary depending on the order in which tables are
specified. This change prevents that effect, and should result in the best plan
being chosen more often.
Optimization of 6+ table joins is much faster than previously, and should
usually result in a better plan being chosen.
Customers should recognize that optimization uses heuristics, and it is
possible that a particular join will be slower in this version than previously.
In such a case, the PLAN syntax can be used to specify the access plan that was
chosen in versions prior to 4.1.
IN clause affects plan generated. This was a customer-reported problem in which adding an item to an "IN" clause caused a different plan to be selected, resulting in 400x slower performance. This was fixed as part of join order improvements.
Fixed PLAN when used with OR in WHERE clause. The PLAN clause had no
effect when an OR was specified as part of the WHERE clause. For example:
CREATE INDEX EMP_INDEX ON EMPLOYEE (EMP_NO);
SELECT * FROM EMPLOYEE WHERE EMP_NO=1 OR EMP_NO=2 PLAN (EMPLOYEE INDEX
(EMP_NO));
This plan would result in bogus errors prior to 4.1.
DISTINCT operator now returns correct results. Fixed a bug with certain cases where the DISTINCT operator would return more than the correct number of records. A few values would be returned more than once.
SQL92 JOIN syntax has been optimized for INNER JOINs. The optimizer
can now detect when more than one inner JOIN has been specified and create a
PLAN based on the best order available. The inner JOIN syntax for SQL92 is now
functionally equivalent to the old style comma list syntax. For example:
: SELECT * FROM PROJECT P JOIN EMPLOYEE E
ON P.TEAM_LEADER=E.EMP_NO
JOIN DEPARTMENT D ON E.DEPT_NO=D.DEPT_NO;
Will now have the same performance and generate the same access plan as:
SELECT * FROM PROJECT P, EMPLOYEE E, DEPARTMENT D
WHERE P.TEAM_LEADER=E.EMP_NO
AND E.DEPT_NO=D.DEPT_NO;
DISTINCT with ORDER BY was in wrong order. The DISTINCT clause might cause records to be returned in the wrong order prior to 4.1, when no index is available to do the sort. This case has now been fixed.
Choose minimal index set. Prior to 5.0, if there were multiple indices that could improve a query, ALL of them were used - even if they were duplicate indices. Under 5.0 only one index is used.
Optimizing sorting for GROUP BY with DISTINCT or ORDER BY. Optimized
out the extra sort when a DISTINCT or an ORDER BY is applied to a stream for
which a GROUP BY has already been done on a superset of the same fields. This
allows, for example, the following query to be performed without sorting if an
index is available:
SELECT DISTINCT f1 FROM t1 GROUP BY f1 ORDER BY f1
Note also that BDE makes extensive use of DISTINCT, GROUP BY, and ORDER BY, and
generates extra ORDER BYs when DISTINCT and GROUP BY are used.
In V4.0, up to three separate sorts would be performed for this query. Now at
most one will be performed (if an index is not available). This was made tricky
by the fact that the GROUP BY causes an aggregate rse to be generated as a
sub-rse. A new routine, map_equal(), checks whether two fields are equivalent
across the mapping from the rse to the aggregate rse. This allows us to check
that the fields in the DISTINCT list or the ORDER BY list are the same as those
in the GROUP BY list.