Exact Numerics - Functional Specification

Description

Bring our implementation of exact numeric datatypes, especially NUMERIC and DECIMAL, into conformance with the SQL standard, by implementing 10 to 18 decimal digits of precision using 64-bit integers (public type ISC_INT64), rather than 10 to 15 digits using double precision as in InterBase V5.x and earlier.

Note that our handling of "approximate numeric" types is also incorrect according to the SQL standard. This specification does not address that problem, which should be dealt with elsewhere. The error is that the type which we call FLOAT, a 32-bit single-precision floating point datum, is supposed to be called REAL: according to the standard, a FLOAT declaration is followed by a precision declaration; a FLOAT may wind up being implemented the same as either REAL or DOUBLE, depending on the declared precision.

User Interface/Usability

Declarations

Columns declared as NUMERIC(n,m) or DECIMAL(n,m), where 10<=n<=18, are stored as 64-bit integers (ISC_INT64), so that the pennies come out exactly right in accounting applications. This matches the way those types should behave according to SQL92, unlike the approximate (floating-point) implementation which InterBase used for 10 to 15 digits up to V5.0 NUMERIC or DECIMAL columns with precision between 1 and 9 continue to be represented in 16- or 32-bit integers depending on the declared precision, as in V5.x

Up to release V5.0, a declaration such as NUMERIC(25,2) was accepted, and was treated the same is NUMERIC(15,2). In 6.0, a NUMERIC or DECIMAL declaration requesting precision greater than 18 digits is rejected with error isc_precision_err, which formerly read "precision should be greater than 0", but is now altered to "precision must be from 1 to 18". Declarations requesting 1 to 9 digits of precision have the same effect as in prior versions.

The 64-bit-integer types described above are available in all contexts where datatypes are defined or used, including the following :

  • CREATE TABLE
  • ALTER TABLE ADD <column>
  • CREATE DOMAIN
  • Use a domain so defined to define a column with CREATE TABLE or ALTER TABLE ADD <column>
  • Arguments to, and returned values from, stored procedures
  • Variables in stored procedures and triggers
  • Arguments to, and returned values from, UDFs
  • Arrays
  • Unique or non-unique indices, primary keys, and foreign keys

FUNCTIONS and ARITHMETIC

MIN, MAX, SUM, AVG, and CAST work with all the exact numeric types. SUM and AVG are exact if the subject row has an exact numeric type and the scaled sum fits in 64 bits: otherwise we raise an overflow exception. SUM and AVG are never computed internally using floating point arithmetic unless the data type of the column is approximate numeric. This is a change from V5.x, which used DOUBLE for AVG on any kind of column.

MIN and MAX on an exact numeric column return an exact numeric result having the same precision and scale as the column. SUM and AVG on an exact numeric column return type NUMERIC(18,S) or DECIMAL(18,S), where S is the scale of the column. (The SQL standard specifies the scale of the result in such cases, while the precision of SUM or AVG on an exact-numeric column is implementation-defined: we define it as 18.)

If two operands OP1 and OP2 are exact numeric with scale S1 and S2 respectively, then OP1+OP2 and OP1-OP2 are exact numeric with precision 18 and scale the larger of S1 and S2, while OP1*OP2 and OP1/OP2 are exact numeric with precision 18 and scale S1+S2. (The scales of these operation except division are specified by the SQL standard. The standard makes the precision of all these operations, and the scale of divison, implementation-defined: we define the precision as 18, and the scale of division as S1+S2, the same as is required by the standard in the case of multiplication.) This means that many multiplication operations involving [NUMERIC|DECIMAL](9,m) datatypes, which caused arithmetic overflow error messages in V5.x, will return correct and meaningful 64-bit results in V6.0. This should be tested.

Whenever an arithmetic operation on exact numeric types overflows, V6 reports an overflow error, rather than returning an incorrect value. As a example which may not be obvious to the reader, if a DECIMAL(18,4) column contains the most-negative value of that type, -922337203685477.5808, an attempt to divide that column by -1 will report an overflow error, since the true result exceeds the largest positive value which can be represented in the type, namely 922337203685477.5807.

If one operand is exact numeric and the other is approximate numeric, then the result of any of the 4 dyadic operators is DOUBLE PRECISION. (The standard says that the result is approximate numeric with precision at least as great as the approximate numeric operand: we satisfy that requirement by always using DOUBLE, which is the largest-precision approximate numeric type we provide.)

NUMERIC INPUT

Any numeric string in DSQL or ISQL whose internal value can be stored exactly in DECIMAL(18,m) for some m <=18 is evaluated exactly, without any loss of precision such as might result from internal intermediate storage in a DOUBLE. A numeric string is recognized by the DSQL parser as a floating-point value only if it contains an "e" or "E" followed by an exponent, which may be zero. That is, DSQL recognize 4.21 as a scaled integer, and passes it to the engine in that form. The engine will then convert it if necessary into the destination type, but the value is treated exactly up to that time. On the other hand, DSQL recognizes 4.21E0 as a floating-point value. This may affect the result type of an expression involving such numeric literals.

Any value which can be stored in a DECIMAL(18,n) can also be specified as the default value for a column or domain.

Any value which can be stored in a DECIMAL(18,0) column can be supplied as the value on a SET GENERATOR statement.

Note that some but not all 19-digit numbers can be stored and retrieved in INT64s: we should test the most positive and most negative values, +9223372036854775807 and -9223372036854775808, respectively, as values inserted into a database, as the default value for a domain or a column, as a compared value in a WHERE clause, and as the value in a SET GENERATOR statement, and similar numbers with internal decimal points all the above mentioned contexts except SET GENERATOR.

DISPLAYED OUTPUT

Any value stored in a NUMERIC or DECIMAL column is printed exactly by (w)isql.

As with numeric input, we should test for correct printing of the largest possible positive and negative values of exact types with various scales specified

Floating-point values are printed in ISQL and converted to text types in the engine in a consistent way. If the input is a DOUBLE PRECISION value with negative scale, that is, a InterBase Version 5.x NUMERIC(15,n) for some value of n, the column is printed or converted using the %f format specification of the C language's printf() family of subroutines, with the number of digits to the right of the decimal point equal to n. (In V5, ISQL printed DOUBLE PRECISION values with scale in this way, but the engine did not pay attention to the scale when carrying out a "CAST(value AS CHAR(n))" operation.) In all other cases, both ISQL and the engine use a %g format: they never emit more than 8 significant digits when formatting a FLOAT value, or 16 when formatting a DOUBLE PRECISION value. When the engine converts an approximate numeric value to a CHAR type which is longer than needed to store the maximum number of digits, the output is padded on the right with blanks. (This differs from V5.x, which would cast a DOUBLE PRECISION value to a long text type by printing dozens or hundreds of digits, even though a DOUBLE PRECISION column holds between 15 and 16 decimal digits of precision.) The printing or conversion of approximate types does not suppress trailing zeroes, nor does it suppress a decimal point which is followed only by zeroes: the reason is that a floating-point (and hence necessarily approximate) value should never look like an integer.

In isql's SHOW commands, such as SHOW DOMAINS and SHOW TABLE <tablename>, an exact numeric type in a V6 database will be shown as it was declared. When isql connects to a V5.x or older server, it will continue to report results as in the past.

GENERATORS

Generators return a 64-bit value, and only wrap around after 2**64 invocations (assuming an increment of 1), rather than after 2**32 as in V5.x. Documentation should advise the user to use an ISC_INT64 variable to hold the value returned by a generator. (A client using dialect 1 will only receive the least significant 32 bits of the updated generator value, but the entire 64-bit value is incremented by the engine even when returning a 32-bit value to a client which uses dialect 1.)

PROGRAMMING SUPPORT

The ISC_INT64 type in the supplied include file ibase.h. is a 64-bit integer type on platforms which support one. GPRE will generate appropriate code when an Embedded SQL (ESQL) program refers to a column of an exact numeric type which is implemented as a 64-bit integer. The existing types ISC_QUAD, GDS_QUAD, and GDS__QUAD continue to be a structure having explicitly-named high and low parts, as in previous versions. The presence of an ISC_INT64 is signalled to the user by the value of the new manifest constant SQL_INT64 in the sqltype field of the XSQLVAR structure.

INT64 values can be inserted, deleted, updated, selected, and used in "where" criteria via ISQL, DSQL, ESQL, and the API.

UTILITIES

Databases whose tables contain INT64 columns can be backed up and restored, and the precision and scale information survives the backup and restore.

CHANGES TO SYSTEM TABLES

Databases whose tables contain INT64 columns can be backed up and restored, and the precision and scale information survives the backup and restore.

In tables RDB$FIELDS and RDB$FUNCTION_ARGUMENTS, if RDB$FIELD_TYPE is 7, 8, or 16 (SMALLINT, INTEGER, or INT64), then RDB$FIELD_SUB_TYPE has the following possible values:

0 or NULL: RDB$FIELD_TYPE is the defined type
1: The field is defined as NUMERIC
2: The field is defined as DECIMAL

This is in addition to the previously-documented uses of RDB$FIELD_SUB_TYPE with Blob and CHAR columns.

RDB$FIELD_PRECISION, a SMALLINT, is a new column in RDB$FIELDS and RDB$FUNCTION_ARGUMENTS. If the column was defined as NUMERIC or DECIMAL, the RDB$FIELD_PRECISION stores the defined precision. For example, if a column is defined as DECIMAL(13,4), the row in RDB$FIELDS describing that column has the following values: RDB$FIELD_TYPE is 16 (int64), RDB$FIELD_SUB_TYPE is 2 (decimal), RDB$FIELD_PRECISION is 13, and RDB$FIELD_SCALE is -4.

Requirements and Constraints

Initially, this feature will only be workable for platforms which support 64-bit integers as a native type in the C compiler which we use. Fortunately, this includes all three tier-1 platforms and any platform for which we use the Gnu C compiler (including SCO and Linux). It also includes Windows CE, in case we decide to port to that platform in the future. It seems likely that by the time we are ready to port 6.0 to other platforms, those other target platforms will have 64-bit integer support in the vendor's C compiler, or else we will be able to perform the port using GCC. Note that this is only a requirement on the C compiler to be used, and has nothing to do with whether or not 64-bit hardware is present.

If there is a requirement to port V6 to a platform which lacks C compiler support of 64-bit integers, the porting effort will be major, involving implementation of a library of 64-bit integer arithmetic functions and/or macros, and recoding much arithmetic inside the engine in terms of calls to those library. Note that it would be quite as painful for the customers to write ESQL programs on such a platform as it would be for us to port the product to it: however, a 64-bit-integer-capable platform such as Win32 could connect to a server on such a platform and take advantage of the correct implementation of exact numeric types, so such a requirement, although unlikely, is conceivable.

To be more precise, defining columns and domains of the int64 types can be made to work easily, but getting the engine to translate a string of more than 9 digits into an int64, or implement infix arithmetic and the numeric built-in functions correctly as specified herein would be a major undertaking.

Migration Issues

V6 requires a major ODS change. The following table indicates the functionality obtained with various combinations of client release and SQL dialect, server release, and ODS major version.

Client
version
Dialect Server
version
ODS Functionality
5.0 1 5.0 8,9 V5 functionality
5.0 1 5.0 10 Server cannot open the database.
5.0 1 6.0 8,9 Server cannot open the database.
5.0 1 6.0 10 Fields which were defined using dialect 1 or a previous release behave in the old way. Definition of new fields also behaves the old way. Columns defined using the new 64-bit data types cannot be accessed. The least significant 32 bits of a generator are returned as an integer. Computations yield the results and result-types of V5.
6.0 1 all all The same as a V5 client with the same server and ODS
6.0 2 5.0 8,9 Error, server does not understand the dialect. Client library must check for this.
6.0 2 5.0 10 Server cannot open the database.
6.0 2 6.0 8,9 Server cannot open the database.
6.0 2 6.0 10 Data access for columns created with V5 functionality is the same as in V5; for columns created with V6 it is the same as with V6. All defines involving numeric or decimal data types produce V6 functionality, but also generate a warning message, since behavior differs between V5 and V6.
6.0 3 5.0 8,9 Error, server cannot understand the dialect. Client library must check for this.
6.0 3 5.0 10 Server cannot open the database.
6.0 3 6.0 8,9 Server cannot open the database.
6.0 3 6.0 10 Data access for columns created with V5 functionality is the same as in V5; for columns created with V6 it is full new functionality. All defines involving numeric or decimal data use integral types.

UDFs which receive or return NUMERIC or DECIMAL types with precision >=10 must be converted to work with the INT64 representation, rather than with DOUBLE as in V5 and earlier. Alternatively, the function definitions may be changed to specify DOUBLE arguments and/or returned values rather than the present NUMERIC or DECIMAL types.

QLI, an unsupported old tool which is still shipped with the product on platforms which had InterBase V3, cannot handle an INT64 value.

Results of / and AVG

All arithmetic operations in dialect 3 produce the results, and the types of results, called for by the SQL-92 standard, which in some cases are not what customers have learned to expect in previous releases of InterBase. The most serious potential problems for a customer migrating from V5 or earlier to V6 involve the division operator and the AVG function (which also implies division) with operands of exact numeric types.

If at least one operand of a division operator has an approximate numeric type (FLOAT, REAL, or DOUBLE PRECISION) then the result of the division will be DOUBLE PRECISION, and will be exactly the same as that produced in V5 by division on any operand types. However, if both operands of a '/' operator have any exact numeric types, the result will be exact numeric with scale equal to the sum of the scales of the operands.

Here is a brief example:

define table t1
(i1 integer, i2 integer, n1 numeric(16,2), n2 numeric (16,2));
insert into t1 values (1, 3, 1.00, 3.00);
commit;

"select i1/i2 from t1" returns the integer value 0, because each operand had a scale of 0, so the scale of the result is 0, meaning that the result must be an integer.

"select i1/n2 from t1" or "select n1/i2 from t1" returns the numeric(18,2) value 0.33. Since one operand has scale 0 and the other has scale 2, the result has scale 0 + 2=2.

"select n1/n2 from t1" returns the numeric(18,4) value 0.3333. Since each operand has scale 2, the result has scale 2 + 2=4.

In V5, any of the / operations above would have returned the double precision value 0.3333333333333333. If a user wishes to obtain in V6 dialect 3 the results which he would have obtained in V5, he will need to alter his query to cast at least one of the operands into an approximate type

For example:

select i1 / cast(i2 as double precision) from t1;

Similarly, the result of taking an AVG on an exact numeric column has the same scale as the column itself: for example, the average of an INTEGER column would have type NUMERIC(18,0). If the values stored the column in different rows are 1, 1, 3, -3, and 0, then SUM(column_name) will return the integer 2, and AVG(column_name) will return the integer 0 in dialect 3, while a V5 system would have returned the double precision floating point value 0.2000000000000. To obtain from a V6 dialect 3 database the same result which would have been returned by a V5 system, the customer might change his query from

SELECT AVG(column_name) FROM table_name;

to

SELECT AVG( CAST (column_name AS DOUBLE_PRECISION)) FROM table_name ;

DSQL applications which use DESCRIBE

If a DSQL application performs abitrary operations on the database, and depends on the describe mechanism to tell it the types of the columns, it must be prepared for the sqltype to be SQL_INT64, and act accordingly.

Formatting and operating on exact numeric columns

Old C or C++ applications, whether using DSQL or ESQL, may assume that a column intended to hold a numeric value either is already of C type "double" or can be cast into that type and then printed witha %f format item: for example, if the SQL declaration of the column is NUMERIC(13,2), the application might pass the double host variable to printf or sprintf, with an format string containing "% 14.2f". In V6 with dialect 3, the application must be prepared to handle, for example a NUMERIC(13,2) which is stored as a 64-bit integer, or even more so, a field of type NUMERIC(18,2), which is stored as a 64-bit integer, and which cannot in general be cast into a double without loss of information. The application program then needs to be prepared to convert the stored binary value into a printable string without the aid of the %f format mechanism of printf.

The examples shipped with V6 on platforms supporting GPRE shall include a subroutine which will perform the necessary formatting in a manner which preserves all the information contained in the int64 data, as well as inserting the decimal point in the right place when the scale is not 0. The function print_item_numeric in isql/isql.e can provide a starting point for the required example code.

GBAK Conversion

When a V5 backup file is restored by V6 gbak (or via the Win32 GUI tools), the major ODS version is changed, adding the new column to RDB$FIELDS and RDB$FUNCTION_ARGUMENTS, and all generator values are sign-extended from 32 to 64 bits. (This means that we implicitly treat generators as signed data. This is consistant with the definitions inside the engine, which use type SLONG in V5 and SINT64 in V6. It is also consistent with the SQL standard, which considers all numeric data to be signed.)