Exact Numerics - Design Specification

See also: Functional Specification

High-Level Design

This feature crosses component boundaries, affecting JRD, DSQL, ISQL, GPRE, GBAK, and GFIX. It also changes metadata.

High-Level Algorithm

Instead of inferring the declared type and precision from the stored type, we record the declared precision in a new row, rdb$field_precision, of rdb$fields, and store an indication of numeric or decimal declaration in rdb$field_sub_type, which was previously unused when rdb$field_type is an arithmetic type.

Arithmetic items converted from strings are carried internally for computation purposes in type SLONG if they will fit, type SINT64 if they are too large to fit in an LONG, and in type double only if they are too large for a quad or if the string contains an explicit exponent. Evaluation of arithmetic expressions and functions are similarly carried out in SINT64, unless at least one of the operands is of an approximate numeric (that is, floating point) type, in which case the evaluation is carried out in double precision floating point.

Exact numeric data are never converted to a floating point format for printing or as a step in conversion to a char type.

An index key segment representing an SINT64 item consists of a double followed by a short integer, with the same adjustments to the stored value of the double as in the case of a double which represents an item of an arithmetic type other than SINT64, and the short integer stored in big-endian format, to ensure that the concatenation of key segments representing multiple fields can be compared as an array of unsigned bytes, as in versions prior to InterBase 6.0.

Data Structures/Class Hierarchy

SINT64 and UINT64 are arithmetic types. The C-language expression of the underlying type varies from platform to platform, and is defined within the appropriate platform's portion of jrd/common.h. For most platforms, the typedef's for the *INT64 types refer to [unsigned] long long int, but for Win32 they refer to [unsigned] __int64 (note that there are two underscore characters).

The following types, with the obvious meanings, are also added to jrd/common.h, with the intention that the code will migrate towards using these rather than the *SHORT and *LONG types in the future: SINT16, UINT16, SINT32, UINT32. ** Proposed change postponed **

BLR and NOD changes

Because the result types from many operations have changed, we define new blr verbs: that way existing user code won't be confused by getting an unexpected result type from an old verb. The new verbs blr_add2, blr_subtract2, blr_multiply2, blr_divide2, blr_agg_total2, blr_agg_average2, blr_agg_total_distinct2, and blr_agg_average_distinct2 return an int64 in cases where the old verbs returned a double, except that if one or more input arguments are of a floating-point type the result is still a double.

As usual, for each new verb blr_xxxx there is a matching new node type nod_xxxx.

The old verbs blr_add, blr_subtract, blr_multiply, blr_divide, blr_agg_total, blr_agg_total_distinct, blr_agg_average, and blr_agg_average_distinct are deprecated: they remain in InterBase 6.0 to support dialect-1 applications, but dialect-3 applications will use the new verbs.

Interfaces

This feature uses the same interfaces as previous releases did, particularly our descriptor based parameter passing. The main difference is that the particular type passed and returned through the descriptor is often SINT64 where early versions of IB returned DOUBLE or SLONG.

Detailed Design

Detailed Algorithm

Detail changes are spread throughout the affected modules:
see the "Affected Module" section.

The following are not yet reflected in the "modules" section (TBD):

* Give the DSQL parser access to the SQL dialect, and make the parser do the right thing (dialect leveling) when the client wants to define or modify with dialect < 3. Make the DSQL parser emit the new blr verbs for dialect 2 or 3, and the old ones for dialect 1.

* GPRE: for each language we still support (at least C, C++, and Ada), GPRE must emit ISC_INT64 host variable declarations when the type of column being retrieved is exact numeric with precision 10 or greater. It must also do the right thing for SQL expressions, depending on the dialect, emitting the new verbs such as blr_divide2 rather than the old ones such as blr_divide, which are only used for dialect 1. Notably, the result of a expression involving two exact numerics, or the AVG of an exact numeric column, is an ISC_INT64 with the scale stated in the functional spec.

Precision information is NOT carried inside the engine as an additional field of struct dsc, as was originally proposed. Since the declared precision does not affect the results of computations, it is only retrieved from RDB$FIELDS when needed for presentation to the client, as for SHOW or EXTRACT in ISQL.

Creating a key segment for a datum consisting of an SINT64 and a scale:

  1. If the datum is not already within a factor of 10 of the largest value which can be represented in an SINT64, multiply the SINT64 by the largest power of 10 which will NOT cause an overflow, and subtract the exponent of 10 from the scale, so that the combination of the SINT64 and the scale continue to represent the same value. This "pre normalization" step ensures that, for example, the value 10.000 entered while the column had a declaration of DECIMAL(18,3) and the value 10.00 entered while the column had a declaration of DECIMAL(18,2) will have the same representation on the way to becoming a key: that is, values which are abstractly equal will have equal keys, even if the stored forms of the two values differ. This is necessary to ensure that an index can correctly enforce a UNIQUE constraint following a change in the declared scale. This also has the effect that if an SINT64 value gets rewritten into a different scale because the row is being updated and the declared scale of the column has changed, the key segment does not need to be updated. (This preserves a property which was one of the reasons for the decision to use munged doubles as key segments representing arithmetic data in the first place: rescaling the value does not change the key segment.) Call the result of this step the scaled value.
  2. 2. Set the SMALLINT part of the key to ((scaled value) MOD 10000). Set the DOUBLE part of the key to ((scaled value) / 10000) / (10.0 raised to the power (-scale)). Note that the left-hand division step is done in integer arithmetic, effectively discarding the 4 least significant decimal digits of the scaled value, and that the quotient from that division is converted to double precision floating point before division by the power of 10.0.
  3. Mung the double portion of the key segment in the same way that the double key segment representing a SMALLINT, INTEGER, FLOAT, or DOUBLE datum is already munged in versions prior to 6.0. That is, store byte-by-byte in big-endian fashion, with the sign bit inverted, so that comparison of key segments treated as arrays of unsigned bytes gives the correct result.
  4. Append the big-endian representation of the SMALLINT part to the munged double part.

The result of the above steps is a 10-byte key segment which correctly represents the equality or direction of the inequality of scaled SINT64 values in the same way that the present munged double key segment represents the equality or direction of the inequality of scaled SLONG, SSHORT, FLOAT, or DOUBLE values.

Index Key derivation: Examples
Steps 
(as described above)
example 1 example 2 example 3
** Pre step 1 
Input Value-------->>>> 
Input Scale--------->>>>
=123456789012345678 
=-2
=123456789012345677 
=-2
=250 
=-10
** step 1 
Applied factor----->>>> 
Adjusted scale by ->>>> 
Scaled value------->>>> 
Updated scale ------>>>>
=10 
=1 
=1234567890123456780 
=-3
=10 
=1 
=1234567890123456770 
=-3
=10000000000000000 
=16 
=2500000000000000000 
=-26
** step 2 
key.s_part (smallint)---->>> 
key.d_part (double)----->>>
=6780 
=1.234567890123450e+011
=6770 
=1.234567890123450e+011
=0 
=2.500000000000000e-012
** step 3/4 (after big-endian conversion to 10 byte-by-byte) 
Munged key stored as -------double------smallint---
1.234567890123450e+011---6780 
52 58 14 1a 99 be 3c 42 7c 1a 
 
1.234567890123450e+011---6770 
52 58 14 1a 99 be 3c 42 72 1a
2.500000000000000e-012----0 
95 64 79 e1 7f fd 85 3d 00 00

New/Affected modules

Type Indicators

jrd/ibase.h

  • provide definitions for ISC_INT64 for customer use
  • SQLVAR datatype macro constant SQL_INT64 for DSQL API
  • isc_dyn_fld_precision to be used by DYN

jrd/common.h

  • provide definitions of *INT64 types for the primary platforms, and on open comment to prevent compilation on other platforms until the int64 types have been defined for those platforms.

Database Metadata and ODS

jrd/names.h

  • add an entry for RDB$FIELD_PRECISION

jrdr/relations.h

  • add a FIELD macro for RDB$FIELD_PRECISION

jrd/ini.e

  • support additional metadata in the system relations and new generator size.

jrd/met.e

  • MET_relation(), add support ODS 10

jrd/pag.c

  • PAG_header, add support for ODS 10

Compile and Execute

jrd/blr.h

  • Add verbs blr_add2, blr_subtract2, blr_multiply2, blr_divide2, blr_agg_total2, blr_agg_total_distinct2, blr_agg_average2, blr_agg_average_distinct2, numbers 163 through 170 in that order.

jrd/nod.h

  • Add nod_add2, nod_subtract2, nod_multiply2, nod_divide2, nod_agg_total2, nod_agg_total_distinct2, nod_agg_average2, nod_agg_average_distinct2.

jrd/dsc.h

  • Add dtype_int64=19 for INT64 type.
  • Add dsc_num_type_none=0, dsc_num_type_numeric=1, and dsc_num_type_decimal=2, which apply to dsc_sub_type when dsc_dtype is dtype_short, dtype_long, or dtype_int64.

jrd/dsc.c

  • correct the number of characters needed to format an int64 for printing (19 digits+sign+decimal-point)

jrd/cmp.c

  • in CMP_get_desc(), change the descriptor computed for various computation results, adding support for the new node types such as nod_divide2.

jrd/cvt.c

  • in CVT_get_double(), improve accuracy and correct the conversion from quad to double.
  • CVT_get_numeric() returns an int64 instead of a double for input too large to fit in a long, unless there is an explicit exponent or the precision is too great to fit in an int64.
  • in CVT_move() add code to correctly convert between int64 and other types.
  • in CVT_get_long() add support for NATIVE_QUAD
    - add CVT_get_int64()
  • in integer_to_text(), add support for int64

jrd/cvt2.c

  • make the comparisons work correctly between int64s and other types.

jrd/dyn_def.e

  • add support for precision in DYN_define_global_field(), DYN_define_local_field(), DYN_define_parameter(), and DYN_define_sql_field().
  • gen_unique_id() changes to for int64 instead of long result

jrd/dyn_mod.e

  • DYN_modify_global_field, add support for precision.

jrd/evl.c

  • EVL_expr(), add support for the new node types add2, subtract2, multiply2, divide2, average2, total2,
    - binary_value(), add support for the new node types add2, subtract2, multiply2, divide2.
  • eval_statistical() add support for the new node types avereage2, total2return the correct (exact) type for TOTAL and AVERAGE on exact columns, instead of double
  • new function multiply2(), called from binary_value(), handle int64 arguments correctly, emit a int64 product of long factors, and set the correct scale for the result.
  • new function divide2(), called from binary_value(), handle int64 arguments correctly, emit an int64, rather than a double, quotient of exact arguments.
  • negate(), add an int64 case

jrd/par.c

  • PAR_desc(), add cases for numeric and decimal, with alignment depending on the precision
  • PAR_literal(), in the switch on desc.dsc_dtype add a case for dtype_int64, also for numeric and decimal with handling of precision.

UDF's and Arrays

jrd/fun.e

  • FUN_evaluate(), add support for a UDF returning quad, decimal, or numeric result

jrd/sdl.c

  • compile() and get_range() may need an additional case for int64. (Decision: we do NOT need to support arrays bigger than 2 billion elements at this time, nor do we need to support int64 as a subscript type.)

Indexing Keys

jrd/btr.h

  • Introduce idx_numeric2=8 definition. This will be used to indicate that the index key type is for an INT64 value. This is what will be stored as a 10-byte key.

jrd/btr.c

  • New function make_int64_key() to calculate a key for an INT64 value. Please read description of how this key is arrived at.
  • Changes to BTR_key_length() and compress() routines to make use of make_int64_key(), and subsequent handling of the derived key.

jrd/dfw.e

  • DFW_assign_index_type() now returns idx_numeric2 for dtype_int64

jrd/opt.c

  • sort_dtypes[dtype_int64]=SKD_int64.
    sort_dtypes[] is used in gen_aggregate() and gen_sort() functions to return an RSB for a sort/project operation.

jrd/sort.h

  • Make a new definition SKD_int64=17 for the Sort Key Definition (SKD) block.

jrd/sort.c

  • diddle_key() routines get additional cases for INT64 (SKD_int64)

DSQL

Parsing

dsql/parse.y

  • change handling of numeric and decimal declarations to set the subtype and precision.
  • emit the new nod types (nod_add2 etc) for dialect 2 or 3, the existing ones for dialect 1.

dialect leveling

dsql/gen.c

  • emit the new blr verbs (blr_add2 etc) for the new nod types (nod_add2 etc) from parse.y.

SQL_INT64 datatype handling for SQLVAR

dsql/sqlda.h

  • SQLVAR datatype macro constant SQL_INT64 for DSQL API (same as in ibase.h)

dsql/dsql.c

  • Conversion of InterBase Exact Numeric types to SQL SQLVAR datatype SQL_INT64.
  • Provide information on a DSQL statement, var_info()

dsql/utld.c

  • - handle SQL_INT64 case in utld_parse_sqlda(), sqlvar_to_xsqlvar(), xsqlvar_to_sqlvar().

dsql/array.e

  • set array_desc_dtype for INT64

isql/isql.e

  • add_row(), print_item(), process_statement(), sqltype_to_string().  Handle SQL_INT64
  • examples: examples{4,5}, Incorporate SQL_INT64 handling.

XDR handling for INT64 type (Remote)

remote/protocol.c

  • Rename current occurences of xdr_hyper() with xdr_quad(). xdr_hyper() (according to SUN) is meant to translate between "long long int" and their external representation. In our code it is improperly used to translate a QUAD structure.
  • Create new function xdr_hyper() to handle 64-bit integers. This could be done as an array 2 "long". This would originally belong in REMOTE/xdr.c, but has now been defined here for reasons mentioned below. Enable this for all platforms except Solaris (since it is available in the XDR library on this platform).
  • xdr_hyper() should handle "swapping" of the 2 long's to be "Endian" sensitive.
  • Handle dtype_int64 in xdr_datum() by calling xdr_hyper().

remote/xdr.c

  • There would have been a definition of xdr_hyper() as defined by the XDR protocol (SUN Microsystems) here. Since the function is also not available on other system-provided XDR libraries (not available on HP-UX at least), I have decided to put the new xdr_hyper() in protocol.c. Also, this module is not included in our shared libraries on platforms where XDR library is available.

ISQL

isql/isql.e

  • create new function format_quad()
  • print_item(), use format_quad() to handle non-zero scale on exact numeric types without potentially information losing conversions to double. (Version 5 cast an exact-numeric argument into a double, multiplied or divided by 10.0 an appropriate number of times to accommodate the scale, and then used sprintf() to format the output with the right number of digits, producing incorrect least significant digits on some platforms.)
  • See SQL_INT64 datatype macro definition for changes w.r.t SQLVAR in ISQL client.

isql/show.e

  • use the sub_type and precision information in the descriptor returned by DSQL to correctly distinguish among NUMERIC, DECIMAL, and the integral types in show_domain(), show_proc(), and show_table().

isql/extract.e

  • - use the sub_type and precision information in the descriptor returned by DSQL to correctly distinguish among NUMERIC, DECIMAL, and the integral types while extracting metadata for DOMAINs and fields
     

GPRE

gpre/* : TBD

Testing Considerations

Tests should ensure that we correctly return computed columns up to the most-positive and most-negative values which can be stored in an INT64, and also that we correctly return arithmetic overflow error if a computed result involving only exact-numeric operands is too large to fit into an INT64.