A Comparison of SQL Server and InterBase Datatypes and Some Other Issues

by Claudio Valderrama C.

In SQL Server, Transact-SQL has the following base data types. I will place on the right side of the equals sign the InterBase equivalent.

+binary(n)=char(n) character set octets

It has been reported that the InterBase client library has a bug regarding octects, but I have yet to see a cdefinitive answer.

+bit=

This is an heritage from Sybase. The internal implementation uses a byte or a word. This is a boolean field. People have used char and smallint in InterBase for this. When I worked with the Sybase engine, even the Sybase client-side tools on Sun platforms (DB-Workbench) had problems dealing with this data type, so I didn't use it.

+char(n)=char(n)

+cursor=

Only available through trickery in procedures. Below is an example, the basic steps are:

FOR select field[s]
  from tbl
  FOR UPDATE
  INTO :variable[s]
  AS CURSOR mycursor
DO BEGIN
  {update tbl | delete from tbl}
  where CURRENT OF mycursor;
END

+datetime=date

Available in InterBase V5 and also InterBase V6 dialect 1;

Also timestamp in InterBase V6, including dialect 1 (yes, both date and timestamp are synonyms in IB6-dialect 1).

BTW, I don't know why "timestamp" was used, it's misleading, I would have preferred datetime, maybe it's the SQL standard again... "timestamp" carries the meaning of an automatic mark of 'NOW' or current_timestamp and this is not true, this field only keeps both date and time but it doesn't get filled automatically. The "datetime" definition in SqlServer does seem better to me. In SqlServer, the type uses 4 bytes and records date and time data from January 1, 1753, to December 31, 9999, with an accuracy of three-hundredths of a second, or 3.33 milliseconds. The base year is 1900. According to the manuals, InterBase can record from 1 Jan 100 a.d. to 29 Feb 32768 a.d. (the base year is 1858) with a precision of 1/10 millisecond. The timestamp (old "date") type in InterBase is 8 bytes in length, so theres no surprise that it blows SQL Server away for both range and precision.

There does however seem to be a bug in the InterBase documentation... some time ago, when going backwards in time, I reached year 1 whilst the InterBase LangRef in pages 19 and 20 says that the lower limit is year 100. To prove the error, I insert into a table the value ('1-Jan-0001') where I had defined a timestamp field. So, the real range for InterBase is actually 1-Jan-0001 A.D. to 29 Feb 32768 A.D. Note also however, that InterBase V6 will only accept up to year 9999 when passed as a string, so the limit would actually be a value of ('31-Dec-9999'). But if you use the InterBase API, 1,469,902 and its negative counterpart are actually the full range.

+decimal(p,s)=decimal(p,s)

+double precision, same as float(53)=double precision. See notes

+float(n)=double precision or float, depending on "n"

Float has no configurable precision in InterBase and is not directly equivalent to this type. See notes.

+image=BLOB sub_type N

There are some proposed defined sub_types for specific blob fields in the InterBase header file, but you don't have to worry about them, sub_type is defined only for our usage, the database engine doesn't care (unless of course you are using Blob Filters), you can use the default sub_type of zero or a negative sub_type that's you defined; it doesn't effect the storage in anyway.

+int=int, integer

+money=numeric(18,4)

InterBase doesn't allow a precision of 19, whilst Sql Server actually specifies (19,4) for this field. But internally InterBase does allow for numbers using 19 digits of precision. The reason why you can't specify 19 "officially" is that 1E18 < 2^63-1 < 9.9999999999999E18 is really 2^63-1=~ 9,223372036854775807e+18. It seems that InterBase R&D wanted to be honest, because you can't use the small margin from 9.223 to 9.999 if you were allowed to define (19,X) types... incidentally, you may assess for yourself the honesty of the SQL Server team. See notes at the bottom.

+nchar(n)=nchar(n)

In the InterBase Language Reference, page 19, where datatypes are explained, NCHAR doesn't appear. In fact it's not explained anywhere in the Language Reference, although it's listed three or four times. It would be nice to know exactly what it is. In my experience, it always maps to ISO8859_1 whose rdb$bytes_per_character is 1. It supports a COLLATE specification according to the collations available in InterBase for ISO8859_1. NCHAR in SqlServer is always UNICODE. The low level equivalence is actually in InterBase: in that you can set the char character set to UNICODE. But in InterBase, UNICODE has no collation sequence other than the default provided.

+ntext=blob sub_type text character set UNICODE

(Due to the previous point, it might be charset ISO8859_1 instead, depending on what you really want. Remember that InterBase doesn't allow COLLATE on BLOBs.)

+numeric(p,s)=numeric(p,s)

+nvarchar(n)=nchar varying(n)

But the low level equivalence (see the two points above) is really: varchar character set UNICODE.

+real, same as float(24)=float. See notes

+smalldatetime=date in InterBase V5, timestamp in InterBase V6

See discussion for datetime above, InterBase doesn't provide an equivalent with less precision, it uses full precision date and time fields. In Sql Server, smalldatetime uses 2 bytes and tracks date and time data from January 1, 1900, through June 6, 2079, with accuracy to the minute. This may have been created to reduce storage requirements in SQL Server.

+smallint=smallint

+smallmoney=numeric(10,4)

It will have a slightly bigger range. Money and smallmoney are types that are not predefined in InterBase. See notes below.

+text=memo, namely, blob sub_type text

"Text" is a reserved word only in the context of a blob sub_type declaration. The allowed sub_types are TEXT, BLR, ACL, RANGES, SUMMARY, FORMAT, TRANSACTION_DESCRIPTION, EXTERNAL_FILE_DESCRIPTION and they don't interfere with declarations of fields with the same name. For example, I tried with success the following sequence:

create table sql(sql int);
commit;
insert into sql(sql) values(1);
commit;
/* Notice "text" appearing two times. */
alter table sql add text blob sub_type text;
commit;

+timestamp=No equivalence in InterBase

Don't be fooled, read the MicroSoft SQL Server help file first:

"The SQL Server timestamp data type has nothing to do with times or dates. SQL Server timestamps are binary numbers that indicate the relative sequence in which data modifications took place in a database. The timestamp data type was originally implemented to support the SQL Server recovery algorithms. [snip] In SQL Server version 7.0, @@DBTS is only incremented for use in timestamp columns. If a table contains a timestamp column, every time a row is modified by an INSERT, UPDATE, or DELETE statement, the timestamp value in the row is set to the current @@DBTS value, and then @@DBTS is incremented by one. [snip] Never use timestamp columns in keys, especially primary keys, because the timestamp value changes every time the row is modified."

In conclusion, it's only for internal use and SQL Server 7 doesn't need it anymore for its recovery algorithms. You can't compare it with an InterBase generator, unless you want to write triggers. Perhaps it could be compared with an InterBase record version plus a transaction identifier. Also, it's legacy datatype, so let's just forget about it. A SQL Server timestamp is defined briefly as database-wide unique number. In this sense, a logical equivalent may be the InterBase's rdb$db_key, because it's unique in a database, too. However, their objective and origin are radically distinct.

+tinyint=No equivalence in InterBase

The char type can't be used in InterBase to operate numerically as in C/C++, so use smallint.

+varbinary(n)=varchar(n) character set octets

+varchar(n)=varchar(n)

+uniqueidentifier=char(32) [to keep a visible representation] or char(16) character set octets [to keep a binary representation]

This is the field to keep GUIDs. Transact-SQL provides the NEWID function that creates a new GUID value... a sort of random version of IB's gen_id. generator function. it's clear the objective is not the same in both databases.

Notes:

  1. This explanation from SQL Server help file seems really bizarre:

    " Currency or monetary data do not need to be enclosed in single quotation marks. However, the monetary data value must be preceded by the appropriate currency symbol. For example, to specify 100 English pounds, use £100."

    So, SQL Server ties not only a data type to a currency, but it also ties the monetary sign to that value, is this proposed in the SQL standard?

  2. In SQL Server precision is enforced. I mean that with numeric(10,1) you can insert up to 9 digits in the integer part and one digit in the fractional part. If you attempt to insert more than 9 digits in the integer part, you'll get an exception. Compare that with InterBase where you can happily insert up to 18 digits in the integer part, without complaint. What's the SQL standard, please?

  3. Maximum length for character fields in SQL Server is near to 8000 bytes in SQL Server 7 and only 255 bytes in SQL Server 6. In InterBase, the limit is 32767 bytes. However, indexes in InterBase are up to 252 bytes only; in SQL Server, they allow up to 900 bytes. Also, long varchar fields can be a problem with InterBase, better avoid them, and use BLOBs instead. I don't have experience measuring performance problems with 8KB varchars in SQL Server. SQL Server 7 is the first version that supports UNICODE. InterBase has a long history of dealing with charsets. UNICODE in SQL Server uses two bytes per character, the same as C/C++. For unknown reasons to me, UNICODE in InterBase requires three bytes per character. The only collation for UNICODE_FSS in InterBase currently is the default collation. Remember that in both engines, the size that matters for indexes is the physical one, that's the logical char/varchar's length multiplied by the size of the character set used, in bytes per character. However in InterBase, ISO8859_1 can be up to 3 bytes if a COLLATE clause other than the default is included.

  4. High level operations on date and time. Below the SQL Server call is presented and then the InterBase alternative is shown. Unfortunately, either you create a procedure or rely on a UDF to mimic SQL Servers behavior. For the SQL Server functions that follow, "datepart" can be the reserved words millisecond, second, minute, hour, day, month, year, dayofweek, dayofyear and quarter. InterBase doesn't have "millisecond", SQL Servers dayofyear is yearday, SQL Servers dayofweek is weekday and SQL Servers quarter is missing from InterBase.

+ getDate() =cast('now' as date) in InterBase V5 and is current_timestamp in InterBase V6 in any dialect.

+ DATEADD(datepart, number, date) adds the "number" of "dateparts" to date (that's really a datetime).=In InterBase, no counterpart exists.

For timestamp, adding an integer to a date changes the date by the equivalent amount of days. To deal with the time part, one must know that the value passed is fractional, where adding N seconds means adding N/86400.0, adding N minutes means adding N/3600.0 and adding N hours mean adding N/60.0. Note, that due to the change in arithmetic rules in dialect 3, one of the operands must be floating point or the result would be zero, because in dialect 3, the division of two integers follow C rules, that produces an integer result and as such you would be adding zero time, that's the reason I used the divisors (see in previous lines) such as 3600.0, for example. There's no built-in way to add months, years or quarters in a single operation in InterBase. See the FreeUDFLib for some UDFs that deal with timestamp types in InterBase. Please note that this library doesn't know how to deal with the new DATE-only and TIME-only in dialect 3. In these cases, unlike timestamp where you need to use the integer and fractional parts depending on your needs, DATE-only is incremented or decremented in days by adding an integer and TIME-only is incremented or decremented in seconds by adding an integer.

+ DATEDIFF(datepart, startdate, enddate) returns the number of date and time boundaries crossed between two specified dates.

In other words, it counts how many "dateparts" are between two datetimes.=In InterBase V5, the difference of two datetimes produces a double precision value (the integer part is the number of days and the fractional part is the fraction of a day). In InterBase V6, the difference between datetimes produces a numeric(18,9) value (again, the fractional part is the fraction of a day), the difference between two dates produces an integer number of days and the difference between two times produces a numeric(9,4) value (where the integer part is the number of seconds and the fractional part is the fraction of seconds up to a 1/10 milliseconds of precision). There's no way to return those differences in other units than the ones specified. For example, if you want the difference between two dates in months or years and not days, you'll need a UDF. Again, see the FreeUDFLib for some functions.

+ DATENAME(datepart, date) returns a character string representing the specified datepart of the specified date.=No direct counterpart in InterBase.

Bbut you can use something like cast(extract(datepart from datevalue) as varchar(21)) to handle all cases. Remember that there're some dateparts not available in InterBase and also, due the DATE-only and TIME-only data types in dialect 3, some operations will be invalid in InterBase, for example, requesting the year of a TIME-only expression.

+ DATEPART(datepart, date) returns an integer representing the specified datepart of the specified date.=This is simply extract(datepart from datevalue) with the limitation that some "dateparts" have another name in InterBase or don't exist.

The predefined system variables for date and time in InterBase are described in my site, at the URL http://www.cvalde.com/document/DateTimeFunctions.htm for people that want to know all the reserved words that give the current date and time.

  1. We already have seen that char/varchar Support in SQL Server 7 is almost up to 8Kb wheras previously they were only 255 bytes. InterBase supports up to 23K, but there are performance issue. As such there should be a way to deal with long varchars. In SQL Server, ntext contains unicode data, so the maximum is 2^30-1 characters, because each char is two bytes. In contrast, text can have up to 2^31-1 characters, because each character is one byte, but depending in the character string, the number of allowed characters may be less. I understand this has to do with charsets, too. Image is raw storage up to 2^31-1. I will refer to these three field types as "TNI". In SQL Server, CAST can operate on text and ntext but it can't operate with user-defined data types. One nicety is the CONVERT function (non ANSI SQL) to format date and time as a string. The client receives the field as a string and not as date. I've used this feature in ASP applications, where I needed to show dates in the Universal Format (year first), independent of the client's regional settings. Also, substring() can be applied directly to TNI fields. There's a statement called WRITETEXT that can replace TNI fields' contents in a non-logged way by default: (in that it doesn't go to the transaction log). To change only a portion of a TNI column, you can use UPDATETEXT, where an offset is optional to insert data in the middle of the stream or to delete a portion of the stream. A text pointer value is needed for these functions, that's obtained by calling the TEXTPTR function applied to the desired field. It returns a binary(16). UPDATETEXT has an option to place the operation on the transaction log. ADO applications can use the AppendChunk method to specify long amounts of TNI data. ODBC applications can use the data-at-execution form of SQLPutData to write new TNI values. OLE DB applications can use the ISequentialStream interface to write new TNI values. DB-Library applications can use the dbwritetext function (This is also used by the BDE but only when it does the right assignment). Also, these fields support DATALENGTH and READTEXT functions.

    Let's now look at InterBase side: each BLOB can handle up to 2GB in data, the same as SQL Server. A BLOB can have a sub_type and a charset, so this is the way to distinguish between text/ntext and raw binary BLOBs. The charset is enforced; the sub_type is only a signal for the human reader and for applications that rely on it as a convention (Blob Filters for example). There's no CONVERT function in InterBase. CAST cannot deal with BLOBs, you need to use Greg Deatz's FreeUDFLib. BLR has substr() function, but it doesn't work with UNICODE (according to Dave Schnepper) and probably it doesn't work with BLOBs, too. There is the horrible substr() function implemented in the default ib_udf library for char/varchar and there are some functions to deal with BLOBs in FreeUDFLib (f_BlobLeft, f_BlobRight, f_BlobSize, etc.) to cope with lack of native support. f_BlobAsPChar is the de-facto cast from BLOBs to char/varchar fields and f_StrBlob is the opposite, but has a 254 characters limit. For some reason, f_BlobAsPchar causes problems with my InterBase V6, so if you experience the same problem, use f_BlobLeft(blob,length) instead. Please note that UDFs don't know anything about charsets or collations. There's no "writetext" nor "updatetext" equivalent in stored procedures. Maybe if someone explains once for all how to leverage the power of segmented v/s stream BLOBs, we can advance. Apart from declaring an array, I don't know how to declare directly a stream BLOB in a procedure. There's no instruction to read/write segments or do insertions in the middle of BLOBs from stored procedures. Furthermore, officially, BLOB parameters in stored procedures are not supported even though I've been using them since InterBase V4.2. It's possible even to define a BLOB variable in the stored procedure, but it probably stores the BLOB_ID only. Note, because IB uses a Multi Generational Architectue, a BLOB operation is ALWAYS "logged", by this I mean the old version is stored and the new version, too, so updating a 100 MB BLOB can be a real challenge for the engine.... Is only the changed segment stored in the new version or is all of the BLOB is repeated? In the last case, updating large BLOBs becomes unpractical as the database could grow very quickly. Each BLOB has a BLOB_ID that's the information you see on a normal select SQL, but you can't manipulate a BLOB through a handle with a function like SQL Servers TEXTPTR (see above) from a stored procedure, you must use a client-side program or embedded SQL. I still think BLR may have some hidden surprises, but we can't surface them from the current PSQL (stored procedure and trigger language). One of the nice things about SQL Server is that you're currently allowed to insert literal data in TNI fields without creating a parameterized SQL statement, fill in the parameters and then submit. It's a current limitation with InterBase. If you want to use UDFs, then there's a workaround in InterBase without parameters and it's:

    insert into tbl(blobfield)
    values(f_StrBlob('this is my literal string'))
    

    so you can verify that it works if you really are using an script and can't use parameters. The only pre-requisite is that you must ensure that this UDF is available and defined in your target database. In InterBase, BLOBs have charsets but not collations, so even if UPPER were allowed to work on them, it wouldn't produce the desired effects with characters other than the ones in the ['a','z'] range in the ASCII table. (For example, accented characters won't be uppercased.) My only workaround is to use one of the UDF calls highlighted above to get a string from a BLOB and apply the correct charset and collation to that string to have it uppercased properly. Since SQL Server doesn't allow charset/collations field-wise, there's not much to deal with, the setting is global, and it produces acceptable results in my experience. More below in point 9). In SQL Server, LOWER is built-in, and it works as expected in all cases. Since LOWER() is a UDF in InterBase, it doesn't operate correctly on accented or special characters, for example, those with umlaut.

  2. The types decimal and numeric are exact synonyms in SQL Server. In InterBase, numeric(p,s) and decimal(p,s) map to the same storage implementation (depending on the precision and dialect, it can be smallint, int, double precision in dialect 1 or ISC_INT64 in dialect 3), but the system field rdb$field_sub_type is 1 for numeric and 2 for decimal. BTW, INT64 is not defined as value 16 in rdb$types, for people that want to extract metadata directly from system tables. The maximum limit for numeric/decimal storage in InterBase V6 is 8 bytes for a precision=18, whereas the default limit in SQL Server 7 is 13 bytes (so you can have up to precision=28) and if the "-p" startup parameter is specified, SQL Server 7 can go up to 38 bytes of storage, so you can have precision=38 as a maximum. As such, becareful, because values kept in SQL Server could overflow InterBases current capabilities.

  3. SQL Server still lacks true DATE and TIME types. In InterBase V6 dialect 3, they are named exactly DATE and TIME and earlier DATE datatypes become TIMESTAMP (datetime for me). So, SQL Servers DATE is TIMESTAMP in InterBase V6 and is DATE in InterBase V5 and earlier. Delphi understands this field type as "datetime". Notice that InterBase V6's true DATE and TIME in dialect 3 are available only to dialect 3 capable clients, that's clients using the raw IB-API, IBX, IBO. ODBC drivers by Merant and Visigenic along with the BDE DON'T understand dialect 3, they can connect to a dialect 3 database as a dialect 1 client but they cannot use DATE, TIME and fields defined as large exact numerics or ISC_INT64 (fields defined in dialect 3 as numeric(x,y) or decimal(x,y) where x>9). But they can access any numeric/decimal field of a range allowed in previous versions (up to precision 15) provided that such field was defined by a dialect 1 client, in whose case the internal storage is double precision as in InterBase V5 and earlier versions.

  4. There's a large mess of float / double precision data types available in both engines, so be careful. In SQL Server, float accepts an optional notation as float(n) to specify the number of bits used to store the mantissa of the float number in scientific notation, thus "n" becomes the precision. If n is 53, that's SQL Servers "double precision" (same as the "double precision" in InterBasee) and if n is 24, that's SQL Servers "real" ("float" in InterBase).

  5. More on charsets and collations as promised in 4.

    SQL Server:

    "Code Pages and Sort Orders

    The physical storage of character strings in Microsoft® SQL ServerT is controlled by the code page and sort order selected during installation."

    InterBase:

    There's no default code page at the time of an InterBase installation that can be setup by the user. The default is always NONE, meaning the engine doesn't attempt any transliteration. The charset can be specified at database creation; otherwise, all fields should specify the desired charset. A field's charset can override the database's charset. Fortunately in InterBase, collation can be set on a column by column basis.

    SQL Server:

    "While the code page and sort order control the format of how data is stored in each database, the specification of the code page and sort order are global to the server. Every database attached to a specific SQL Server installation uses the same code page and sort order. The code page and sort order are specified during SQL Server Setup."

    InterBase:

    So, you can argue that InterBase gives you more flexibility but I personally, would like a default that can be applied to new database automatically or at least, a global COLLATE kept in rdb$database.

    SQL Server:

    "There are several code pages. The default code page for a server running SQL Server is the 1252 - ISO character set, which contains the common characters from most languages that originated in Western Europe. [snip] When you pick a code page during the setup of SQL Server, the data in all character columns in the database, and in all character variables and parameters in Transact-SQL statements, is stored and interpreted using the bit patterns from the indicated code page. [snip] If the server is running one code page and clients are using another code page, then the clients may need to turn on options in the SQL Server ODBC driver and OLE DB provider to support the proper conversion of extended characters. In SQL Server version 7.0, the SQL Server ODBC driver, and the OLE DB Provider for SQL Server handle this conversion automatically. [snip] A new feature in SQL Server 7.0 is support for Unicode data types, which eliminates the problem in converting characters. Unicode stores character data using two bytes for each character rather than one byte."

    InterBase:

    What is called here ISO1252 should be no other than WIN1252 for InterBase. I've tried it and it seems to work well, provided that you connect with the same charset as the database. Since InterBase is charset/collate per field, you can see that several fields in system tables ALWAYS have their own charset and collation regardless of the db-wide charset. For example, several fields use UNICODE_FSS whereas rdb$message -the text of an user-defined exception- is charset NONE. One thing that I don't have clear is the level of translation the client side of InterBase can achieve. It seems to be non-existent. If you read the phrase above, both the ODBC and and OLE/DB (ADO) drivers are able to perform translations to the client charset. In InterBase, connecting to a database with a charset other than the one used in the user tables' fields is a direct call to receive tons of "transliteration error" messages.

    SQL Server:

    "It is easier to manage international databases if you: Install SQL Server with the default code page. Implement all char, varchar, and text columns using their Unicode equivalents; nchar, nvarchar, and ntext. The fact that Unicode data needs twice as much storage space is offset by eliminating the need to convert extended characters between code pages. [snip] SQL Server 7.0 stores all textual system catalog data in columns having Unicode data types. The names of database objects such as tables, views, and stored procedures are stored in Unicode columns. This allows applications to be developed using only Unicode, which avoids all issues with code page conversions. "

    InterBase:

    SQL Server has always had problems with these issues. Why didn't you notice any? Probably because WIN1252 is fine for almost all of us, but it doesn't cover 100% the needs of languages in Central Europe, for example as I understand it, in such a case, the translation must be done in the ODBC/ADO driver. This slows down the retrieval process and also, forces you to use a multitude of settings. As I wrote, you are tied to those settings when you install SQL Server and if you want another charset and collation, fine, "just reinstall SQL Server", a problem that InterBase developers never have to face. Regarding InterBase, the names of database objects are kept in UNICODE fields, too. SQL Server found the solution by asking developers to use UNICODE at the cost of wasting always two bytes per character. On the IB side, UNICODE has only one collation and it doesn't give the correct work for my needs, so the one to use is ISO8859_1. That's the reason Why I suggested that in the mapping of fields SQL Servers NCHAR should be mapped to InterBase's NCHAR, because in InterBase's NCHAR, ISO8859_1 it does the right thing if you add the correct collate.

    SQL Server:

    "Sort Order The sort order is another option specified during the Setup program. The sort order specifies the rules used by SQL Server to collate, compare, and present character data. It also specifies whether SQL Server is case-sensitive."

    InterBase:

    SQL Server is clearer here. Collate doesn't tell too much to a newcomer, but SORT ORDER does give a hint at what's being offered. The same function is done by collations in InterBase: they control, inside a given charset, the sort order, the comparison rules and the transliteration issues. Therefore, SORT BY, WHERE, GROUP BY, HAVING and UPPER are affected. In SQL Server, case-sensitive/insensitive is an option. In InterBase, only case sensitive is supported, so people migrating from Access and SQL Server will be shocked. However - wait, this is the past. The future is brighter: Dave Schnepper, the former ISC expert in charsets and collation, is committed to maintaining these drivers and we now have a beta set of INSENSITIVE charsets and collations for InterBase. Contrary to SQL Server, in InterBase the collation is always on a PER-FIELD setting.

    SQL Server:

    "SQL Server 7.0 uses two sort orders, one for the character code page and one for Unicode data. Both sort orders are specified during setup."

    InterBase:

    No surprise here. InterBase uses a set of collations for each charset it supports. And of course, even though it's more work to define the fields, there's no need to reinstall the server to change those settings.

    SQL Server:

    The sort orders are different for each SQL Server code page. For example, the sort orders available with code page 1252 are:

    Dictionary order, case-insensitive (default) Binary order Dictionary order, case-sensitive Dictionary order, case-insensitive, uppercase preference Dictionary order, case-insensitive, accent-insensitive Danish/Norwegian dictionary order, case-insensitive, uppercase preference Icelandic dictionary order, case-insensitive, uppercase preference Swedish/Finnish (standard) dictionary order, case-insensitive, uppercase preference Swedish/Finnish (phonetic) dictionary order, case-insensitive, uppercase preference.

    InterBase:

    Compared to InterBase, WIN1252 is charset 53, uses 1 byte minimum and 1 byte maximum, the default collation is called WIN1252 and the other collations are: PXW_INTL, PXW_INTL850, PXW_NORDAN4, PXW_SPAN, PXW_SWEDFIN, The PXW_ prefix means that they follow the rules of the Paradox for Windows drivers. Of course, the self-explanatory denominations in SQL Server make the selection easier.

    SQL Server:

    "In SQL Server Version 7.0, individual text, ntext, and image pages are not limited to holding data for only one occurrence of a text, ntext, or image column. A text, ntext, or image page can hold data from multiple rows; the page can even have a mix of text, ntext, and image data. [...] Because the blocks of text, ntext, or image data and the root structures can all share space on the same text, ntext, or image pages, SQL Server 7.0 uses less space with small amounts of text, ntext, or image data than earlier versions of SQL Server. For example, if you insert 20 rows that each have 200 bytes of data in a text column, the data and all the root structures can all fit on the same 8 KB page."

    InterBase:

    Very neat, but this appeared only on SQL Server 7 whereas InterBase has had this from day one. This means that this optimization only appears in SQL Server 7 somet 14 years after being implemented in InterBase. Previous versions of SQL Server used one separated page for text, ntext or image data and allocated a complete page even for a TEXT field of 1 byte.

  6. More on charsets and collations. Remember that InterBase needs the right charset to store strings or it may throw a transliteration error. The charset can be specified for the whole database at creation time. A field without a charset specification is created with the default one for the whole database. To discover the default for a database, you can inspect the system table rdb$database that ALWAYS has one record only: if the field rdb$character_set_name is NULL, the charset is NONE, the default and all field's will get this charset by default unless you overwrite it on a field by field basis. You must connect with the same charset that the database uses to be able to work without transliteration errors. Even if you don't see an error message, you may still get the wrong characters, as this example shows in InterBase V6:

    select cast((_WIN1252 'paó') as
      char(10) character set unicode_fss) from rdb$database;
    

    As you can test for yourself, the accented "o" is converted to another character when displayed and no error is returned. If you want to see a typical transliteration error, you can try:

    select cast((_WIN1252 'paó') as
      char(10) character set win1251) from rdb$database;
    

    That's can be painful in some cases, unless your program connects, retrieves the default charset and then reconnect's using it. However, since each field at creation time can overwrite the database default, the developer has this freedom at the cost of having one field that specifies a charset that conflicts with the one for the whole database, so a client using the db's charset as a reference could find problems. In comparison, in SQL Server, the client driver can do the translation to the right charset when returning results to the calling application. However, while InterBase can use database-wide and field-specific charsets, SQL Server uses a server-wide charset and changing it requires reinstalling the server and rebuilding the databases. By default, InterBase uses charset NONE, meaning characters are stored as they come and returned without any transformation. Although I know that I should use WIN1252 or ISO8859_1 (Latin_1) to deal with my native language (Spanish), I usually simply use NONE . One problem is that some characters go in the wrong position when sorting, but in practice, only a few times such characters (mainly ~+n=ñ) go into the first place, so the disorder is not so severe. In InterBase (and probably in every dtabase engine), every charset carries a default collation. There are some charsets in InterBase that have only one collation. All of the native collations are case sensitive, meaning the uppercase alphabet may appear before all the lowercase alphabet, for example. To know what's going with the cryptic collation name, I asked Dave Schnepper and this was the answer:

    "Oh, one more thing Claudio mentioned was the obscurity of names. InterBase follows an old (and seemingly unpopular) method of naming locales. XX_YY where YY is a country code (defined by an ISO spec I can't remember) and XX is a language code (different ISO spec). (This is all documented in the Collation Driver Construction Kit). Thus, FR_FR is French used in France, and FR_CA is French used in Canada. GB_GB must be something like Chinese in the People's republic."

  7. Other Subtleties and mysteries of InterBase

    • There's no rdb$field_sub_type for numeric/decimal defined in rdb$types. They are only documented in LangRef as 1 and 2, respectively.
    • Also, type 16, ISC_INT64, the underlying implementation of numeric(p,s) and decimal(p,s) with p>9 doesn't appear in the rdb$types, is this a bug in the system tables??? All other field types are there!

    There are some other internal types:

    • QUAD (for legacy timestamp?).
    • BOOLEAN (yes, BOOLEAN, but reserved only for UDF's return parameter type, what's about making it visible in DSQL? It seems totally unused and maybe not implemented at all).
    • CSTRING (for UDFs).
    • BLOB_ID it's the type of the BLOB identifier.
    • The sub_types of BLOBs that only are reserved words only in the context of a blob field definition but don't clash with any field name: TXT, BLR, ACL, RANGES, SUMMARY, FORMAT, TRANSACTION_DESCRIPTION and EXTERNAL_FILE_DESCRIPTION. QLI can interpret them.
    • EXPRESSION_INDEX, for some unfinished implementation of expressions on things like field1||field2 and other more complicated examples.
  8. Other issues with SQL Server:

    • A field of type integral can be declared as "Identity". There no such type in InterBase. The only way to emulate it is by writing a trigger that increments a generator by means of the atomic call "gen_id()" and this value is passed to an integer field in dialect 1 and to a numeric(18,0) field in dialect 3. An identity field, if used as a PK field, can cause a client application to lose track of the record because it doesn't know what's the next value is that has assigned. In InterBase, the trick is:

      create triger tbl_bef_ins
      for table tbl
      before insert as
      begin
        if (new.ident_field is null) then new.ident_field=gen_id(gen_name, 1);
      end
      

      and then the application can use:

      select gen_id(gen_name, 1) from rdb$database
      

      if the application doesn't provide the value, the trigger will automiatically fill it in. Also, if such identity column is not used in the client and is not the PK field, then it can be excluded from the SELECT statement and the server will provide automatically the value from the generator. The call to gen_id() needs a generator's name previously defined and an integral increment (integer in dialect 1 or up to numeric(18,0) in dialect 3) that can be either negative or positive. I agree that writing 45 triggers an defining 45 generators to mimic 45 identity fields is tiresome, but there's no a shortcut yet in InterBase.

    • In SQL Server 7, it's possible to call triggers directly as I understand. While I don't know what happens in the case that you call a trigger thats not tied to a a table, the arguments in favor of or against this idea are interesting.

    • Due to a change in SQL Server 7, I want someone to confirm or deny the fact that it's no longer possible to hide the source code of the procedures (and triggers, probably, too), because they are kept always in source form and compiled the first time they are invoked. In InterBase, you can wipe out the rdb$procedure_source and rdb$trigger_source fields in tables rdb$procedures and rdb$triggers respectively and the engine will continue working, same as in Sybase. You only have to keep yourself the source, because if you attempt to edit the trigger, you won't be able to retrieve the source from the db, as expected. You can inspect the BLR (Binary Language Representation) generated for procedures and triggers.

    • SQL Server offers two levels of security: internal and OS-based. In the first case, the server needs to define its own users. In the second case, the server maps users from an NT domain. Since MicroSoft almost has ready the SQL Server 2000, I assume it will recognize the new ADS (Active Directory Service) schema that replaces the NT4's domains. In InterBase, the security is on the server. There's no integration with the operating system. Only ADMINISTRATOR and ROOT can login without a user/pw and they are mapped to SYSDBA. However, in isc4.gdb, the centralized security database used in InterBase has a "host_info" table. And I understand it can map UNIX hosts and also, the USERS table has UID and GID columns to map UNIX users and groups, but I never have tried those facilities.

    • SQL Server has some predefined system roles that can be granted to normal users. In InterBase, the only super user is SYSDBA and no other user can perform all the tasks SYSDBA does. Also, SQL Server keeps a centralized database of users (when in native mode) but users should be mapped by the administrator in each database. In InterBase, a user that's recognized by isc4.gdb on a system running InterBase, can attempt connections to all databases that exist on the same machine where the InterBase Server is running. There are no predefined roles nor auxiliary super users in InterBase.

  9. Disclaimers:

    1. If you find that at some point I wrote about BLOBs and their collations, I meant to write about BLOBs and their charsets, because BLOBs can only have charsets, not collations.
    2. Don't bother trying to look for the database aliases in the server. InterBase uses the physical path from the root of the filesystem in the host machine to identify the databases.
    3. Don't try to define referential constraints between databases, because they don't exist in InterBase. Also, if you think that referential constraints and triggers are exactly equivalent for checking consistency in the db, please read first http://www.cvalde.com/document/declaRefIntegVsTrig.htm to understand that some boundary conditions can defeat your checks made in triggers, so declarative integrity is the right way.
    4. Don't go mad creating indices on InterBase; you will only impact performance if you use more indices than the strictly necessary ones. Indexes are mainly to satisfy WHERE conditions against fields with good selectivity (several different values) but not for SORTING.
    5. InterBase automatically creates indexes on PK, Unique and FK declarations, see http://www.cvalde.com/features/f4/AutoCreation.htm to avoid creating duplicate indexes and so reducing performance. Remember, too that FK declarations require exclusive access to the database, namely, only one connection.