Converting data to Firebird by Writing "Native" External Files

22nd September 2006: Henrique Netzka describes a technique he has developed to convert data from another DBMS to Firebird using external files.

Abstract

When we need to convert data between two databases, being Firebird the destination, it's pretty common to use the Firebird's EXTERNAL FILE resource (here referenced as EF), in order to make the whole process quicker and easier. For example, if we wanted to convert a Clients data table from a dBase-format file into a Firebird database, we'd use a EF to intermediate the conversion: a software would be responsible for reading the dBase table and store the data in a plain-text file as fixed-length text fields (CHAR-like); then the file would be linked to a Firebird DB as an EF table; the data finally would be inserted into the real tables, the ones inside the DB and with the straight field types.

Although, those who work with this method know that it's far away from that easy. There's a big issue with this whole process, which would be the type casts and other treatments that the last step would require: you'd have each row (of the EF table) filled with CHAR values, but in your REAL table you'd have INTEGERs, VARCHARs, DATEs, NUMERICs and so on.

This article, then, will introduce you to a new way to do it all. We won't ignore the three steps described above (recording the data to a file; loading into a EF table; importing data to a real table), but we'll skip the hard and boring step of converting the fields from CHAR to other types. We'll write BINARY, native-formatted files (instead of plain-text files) and, by doing it, we'll reduce significantly the time of the conversion process.

The research and this article

For me to figure all this out, I've used the reverse-engineering method. I created a new and clear EF table, inserted one record and studied what it had really stored in the file. Also I've had to use some of the informations provided by the IBPhoenix and FirebirdSQL.org sites.

It's important for you to realize that the code presented here isn't as good as it can get. You shall consider that we're under didactic circumstances, and sometimes it isn't interesting to have a optimized code. Even so, there's a lot of optimizations, and the code as it is presented here works pretty fast.

There's also a couple of fields that isn't covered by this article. Although, feel free to implement it on your own. : )

The way

It's up to you to choose the better (and faster) way of storing a file. Here, the way presented is using the TStringList class, which already implements the SaveToFile procedure.

Nevertheless, all you'll need is to stack the records and store it into a physical file.

Using this file, you'll create your EF table inside a Firebird database, and then simply import the data into your definitive table. Also, because we'll use a StringList, we'll have to create a double-char column in the end of the table definition. It'll be responsible to store the CR and LF characters, added by the TStringList object, granting that the data will fit perfectly inside our EF table.

Understanding it and working with the TIBRecord class

To the conversion method, then, we'll define a new class: the TIBRecord. It's a quite simple class that works just like a stream: it has an internal buffer (but this one is static-sized) and an integer pointer to the actual position of it. As you append stuff to the record, it'll increase the pointer and store the new information.

Here, the record size may be up to 4096 (4k). If you'll need a bigger record (remember: record = row), all you have to do is increase the FLine's size.

TIBRecord = class
private
  FLine: array[1..4096] of Char;
  FPos: Integer;
  function GetLine: String;
protected
  procedure CheckOffset(ANextSize: Integer; AChar: Boolean = False);
public
  procedure AppendInteger(AValor: Integer);
  procedure AppendSmallInt(AValor: SmallInt);
  procedure AppendFloat(AValor: Int64); overload;
  procedure AppendFloat(AValor: Integer); overload;
  procedure AppendDate(AValor: TDate);
  procedure AppendTimeStamp(AValor: TDateTime);
  procedure AppendChar(AValor: String; ASize: Integer);
  procedure AppendVarChar(AValor: String; ASize: Integer);

  procedure Reset;

  constructor Create;

  property Line: String read GetLine;
  property Pos: Integer read FPos write FPos;
end;

The first thing you'll have to do in order to use the class is (obviously) declare a variable of its type and create it in your conversion process (as shown below):

Var
  IBRec: TIBRecord;
  strArquivo: TStringList;
begin
  {...}
  IBRec := TIBRecord.Create;
  strFile := TStringList.Create;
  {...}
end;

Then, to build a record, make use of the AppendXXXX functions. They'll do all the job for you, and all you'll have to do is pass the correct value to them. In the end, to get the record's value, read the Line property; to begin another one, use the Reset procedure.

{...}
while (not EOF) do
begin
   IBRec.Reset;
   IBRec.AppendInteger(dbTabela.FieldByName('Code').AsInteger);
   IBRec.AppendVarchar(dbTabela.FieldByName('Name').AsString, 40);

   strFile.Add(IBRec.Line);

   Next;
 end;

strFile.SaveToFile('\ExternalFileTable.Ext');
{...}

Once you have the file stored, create your EXTERNAL FILE table:

CREATE TABLE
  TableName
EXTERNAL FILE
  '\ExternalFileTable.Ext'
(
  Code INTEGER,
  Name VARCHAR(40),
  DELIMIT CHAR(2)
);

Remember

If you're using the TStringList method, you'll have to keep a CHAR(2) field in the end of the table (as shown above). They'll store the CR and LF chars.

Class implementation

Finally, here we get the implementation of each class' function, and some relevant comments (please, read the comments).

procedure TIBRecord.AppendChar(AValor: String; ASize: Integer);
var
  Tam: Integer;
begin
  CheckOffset(ASize, True);

  Tam := Min(Length(AValor), ASize);

  SetLength(AValor, ASize);
  FillChar(FLine[FPos], ASize, #32);
  CopyMemory(@FLine[FPos], PChar(AValor), Tam);

  Inc(FPos, ASize);
end;

procedure TIBRecord.AppendDate(AValor: TDate);
Var
  dAux: Integer;
  BaseDate: TDate;

  buf: array[1..4] of Byte;
begin
  CheckOffset(4);

  BaseDate := -15018; // = 17/11/1858

  dAux := Trunc(AValor - BaseDate);

  buf[1] := dAux and 255;
  buf[2] := ((dAux AND (255 SHL 8)) SHR 8);
  buf[3] := ((dAux AND (255 SHL 16)) SHR 16);
  buf[4] := ((dAux AND (255 SHL 24)) SHR 24);

  CopyMemory(@FLine[FPos], @buf, 4);
  Inc(FPos, 4);
end;

{ This function appends a NUMERIC field in the record.
  PAY ATTENTION: the value MUST be passed as an INTEGER.
  Consider, for example, a $1,99 price; you'll have to pass
  this value as “199”, and not 1,99. A easy way to get it
  would be using something like: Trunc(1,99 * 100) }

procedure TIBRecord.AppendFloat(AValor: Int64);
var
   buf: array[1..8] of Byte;
begin
  CheckOffset(8);

  buf[1] := AValor and 255;
  buf[2] := ((AValor AND (Int64(255) SHL 8)) SHR 8);
  buf[3] := ((AValor AND (Int64(255) SHL 16)) SHR 16);
  buf[4] := ((AValor AND (Int64(255) SHL 24)) SHR 24);
  buf[5] := ((AValor AND (Int64(255) SHL 32)) SHR 32);
  buf[6] := ((AValor AND (Int64(255) SHL 40)) SHR 40);
  buf[7] := ((AValor AND (Int64(255) SHL 48)) SHR 48);
  buf[8] := ((AValor AND (Int64(255) SHL 56)) SHR 56);

  CopyMemory(@FLine[FPos], @buf, 8);
  Inc(FPos, 8);
end;

procedure TIBRecord.AppFloat(AValor: Integer);
begin
  AppFloat(Int64(AValor));
end;

// Remember: Firebird's INTEGER is 32 bits, just as Delphi's

procedure TIBRecord.AppendInteger(AValor: Integer);
var
  buf : array[1..4] of Byte;
begin
  CheckOffset(4);

  buf[1] := (AValor and 255);
  buf[2] := ((AValor and (255 shl 8)) shr 8);
  buf[3] := ((AValor and (255 shl 16)) shr 16);
  buf[4] := ((AValor and (255 shl 24)) shr 24);

  CopyMemory(@FLine[FPos], @buf, 4);
  Inc(FPos, 4);
end;

// Smallint, also like Delphi's; 16 bits.

procedure TIBRecord.AppendSmallInt(AValor: SmallInt);
var
  buf: array[1..2] of Byte;
begin
  CheckOffset(2);

  buf[1] := (AValor and 255);
  buf[2] := ((AValor and (255 shl 8)) shr 8);

  CopyMemory(@FLine[FPos], @buf, 2);
  Inc(FPos, 2);
end;

// Timestamp fields; the HOUR part is NOT implemented

procedure TIBRecord.AppTimeStamp(AValor: TDateTime);
// base: 1858-11-17 01:24:08.456
Var
   BaseDate: TDateTime;
   dAux: Integer;

   buf: array[1..8] of Byte;
begin
  CheckOffset(8);

  BaseDate := -15018;//EncodeDateTime(1858, 11, 17, 1, 24, 08, 456);

  // date
  dAux := Trunc(AValor - BaseDate);

  buf[1] := dAux and 255;
  buf[2] := ((dAux AND (255 SHL 8)) SHR 8);
  buf[3] := ((dAux AND (255 SHL 16)) SHR 16);
  buf[4] := ((dAux AND (255 SHL 24)) SHR 24);

  ZeroMemory(@buf[5], 4);

  CopyMemory(@FLine[FPos], @buf, 8);
  Inc(FPos, 8);
end;

procedure TIBRecord.AppendVarChar(AValor: String; ASize: Integer);
var
  Tam: Integer;
begin
  Tam := Min(Length(AValor), ASize);

  CheckOffset(ASize + 2);

  SetLength(AValor, ASize);

  FLine[FPos] := Chr(Lo(Tam));
  FLine[FPos+1] := Chr(Hi(Tam));
  Inc(FPos, 2);

  FillChar(FLine[FPos], ASize, #0);
  CopyMemory(@FLine[FPos], PChar(AValor), Tam);

  Inc(FPos, ASize);
end;

{ Not everytime the next field begins right in the end of the last;
  eventually, some bytes must be nulled between two fields.
  This function will be responsible for that... }

procedure TIBRecord.CheckOffset(ANextSize: Integer; AChar: Boolean);
begin
  if ((FPos and 3) <> 1) then
  begin
    if (((FPos + ANextSize) and 3) <> 1) then
    begin
      if (ANextSize >= 4) then
      begin
        if not AChar and ((ANextSize = 4) or ((ANextSize and 7) = 0)) then
        begin
          while ((FPos and 3) <> 1) do // (Múltiplos de 4) + 1
            Inc(FPos);
        end
        else
          while ((FPos and 1) <> 1) do // (Múltiplos de 2) + 1
            Inc(FPos);
      end;
    end;
  end;
end;

constructor TIBRecord.Create;
begin
  Reset;
end;

function TIBRecord.GetLine: String;
begin
  SetLength(Result, Pred(FPos));
  CopyMemory(@Result[1], @FLine, Pred(FPos));
end;

procedure TIBRecord.Reset;
begin
  FPos := 1;
  ZeroMemory(@FLine, Length(FLine));
end;