Data Replication With InterBase
by Matt Hopkins, Dunstan Thomas (UK) Ltd.
Borland Developers Conference 1998
This paper will describe the basic concepts of data replication and how they can be implemented relatively simply using the built-in features of InterBase.
What is Data Replication?
Data replication is the copying of information to one or more databases in such a way that the information is consistent across all sites.
There are two basic types of replication: synchronous and asynchronous.
With synchronous replication, all copies or replicates of data are kept exactly synchronized and consistent. If any copy is updated the changes will be immediately applied to all other databases within the same transaction. Synchronous replication is appropriate when this exact consistency is important to the business application.
With asynchronous, or store-and-forward, replication, copies of data may become temporarily out of sync with each other. If one copy is updated, the change will be propagated and applied to the other copies as a second step, within separate transactions that may occur seconds, minutes, hours or even days later. Copies can therefore be temporarily out of sync, but over time the data should converge to the same values at all sites.
This paper will deal exclusively with asynchronous data replication.
What are the Necessary Components for Data Replication?
Capturing the Changes to the Source Database
All data replication strategies require a method for capturing changes in a replicated database. There are two main mechanisms - transaction logs and triggers.
The transaction log approach utilizes a technique called "log sniffing" which copies transactions flagged for replication to a staging area for transmission. This technique is argued to have the least impact on the database server because it requires little CPU overhead when reading from the in-memory log and writing to disk. It is implemented in replication products from vendors such as Informix, Sybase, and MS SQL/Server.
The second approach is to use database triggers to propagate the changes when they occur into a log database table for transmission. As the database procedural language is at your disposal for this method, it provides more flexibility in deciding what data is to be replicated. It is implemented in replication products from vendors such as Ingres and Oracle. It will also be the technique we utilize when replicating with InterBase.
Transmit Changes from the Source Database to All Target Databases
Transmitting changes for replication requires software which reads the change log, connects to each target database server located on a LAN or WAN, and applies the changes to that target database.
This software is known as a replication manager and must also handle error logs and update conflicts.
How can these Data Replication Components be Implemented in InterBase?
Logging the Data Changes
Okay, in order to create our replication engine we will first need two sample databases (source and target) containing identical tables. We start by creating the source:
CREATE DATABASE "source.gdb" PAGE_SIZE 1024
CREATE TABLE TEAMS (
TEAM_ID INTEGER NOT NULL,
TEAM_NAME VARCHAR(100),
TEAM_MGR VARCHAR(40),
PRIMARY KEY (TEAM_ID));
CREATE TABLE CHANGES (
CHANGE_ID INTEGER NOT NULL,
CHANGE_TYPE CHAR(1) NOT NULL,
SOURCE_KEY_FIELD VARCHAR(31) NOT NULL,
SOURCE_KEY INTEGER NOT NULL,
SOURCE_TABLE VARCHAR(31),
USERNAME VARCHAR(31),
PRIMARY KEY (CHANGE_ID));
CREATE GENERATOR NEW_TEAM_ID;
CREATE GENERATOR NEW_CHANGE_ID;
Now let's add a trigger to the CHANGES table to assign a unique key value from the InterBase generator:
CREATE TRIGGER CHANGES_NEWID FOR CHANGES
BEFORE INSERT
AS
BEGIN
NEW.CHANGE_ID = GEN_ID(NEW_CHANGE_ID,1);
END
Note
Note that we are not adding a trigger to generate a unique ID for the TEAMS table; the reason why is discussed later in this paper (see "Unique Keys & Generators").
Now that we have our sample database, we need to create a mechanism for logging all changes to our table.
As mentioned before, we will be using the trigger method for logging changes to our table for replication. The benefits are that triggers are readily available, easy to use and, unlike transaction logs, can include custom programming logic written in the InterBase procedural language.
As there are three types of changes that can occur to a table (Insert, Update and Delete), we will need three triggers for each table being replicated:
/* After an Insert */
CREATE TRIGGER TEAMS_REPL_INSERT FOR TEAMS
AFTER INSERT
AS
BEGIN
INSERT INTO CHANGES
(CHANGE_TYPE, SOURCE_KEY_FIELD,
SOURCE_KEY, SOURCE_TABLE,USERNAME)
VALUES
("I","TEAM_ID",NEW.TEAM_ID,"TEAMS",USER);
END;
/* After an Update */
CREATE TRIGGER TEAMS_REPL_UPDATE FOR TEAMS
AFTER UPDATE
AS
BEGIN
INSERT INTO CHANGES
(CHANGE_TYPE, SOURCE_KEY_FIELD,
SOURCE_KEY, SOURCE_TABLE,USERNAME)
VALUES
("U","TEAM_ID",NEW.TEAM_ID,"TEAMS",USER);
END;
/* After a Delete */
CREATE TRIGGER TEAMS_REPL_DELETE FOR TEAMS
AFTER DELETE
AS
BEGIN
INSERT INTO CHANGES
(CHANGE_TYPE, SOURCE_KEY_FIELD,
SOURCE_KEY, SOURCE_TABLE,USERNAME)
VALUES
("D","TEAM_ID",OLD.TEAM_ID,"TEAMS",USER);
END;
That's everything that's needed for logging data changes for replication in InterBase.
Notice that for the sake of simplicity we have required that replicated tables have a single unique integer key, and that this key is the only data element actually stored in the CHANGES table. We could store all the changed field values, but by storing just the key and the action we replicate a snapshot of the data: the replication engine reads the current values from the source table at replication time.
One more step before we move on to the replication engine - we need to create a destination database with all of the same metadata. This can be done by backing up the source database with the metadata-only flag and restoring it to "destination.gdb".
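For example, with the command-line gbak utility this could be done with a metadata-only backup followed by a restore; the exact invocation is not given in the paper, but it would look something like this (you may also need the -user and -password switches):

gbak -b -m source.gdb source.gbk
gbak -c source.gbk destination.gdb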
Replicating the Data Changes
Now we are going to create a replication server using 32-bit Delphi.
The idea is that a user can set replication to run on a timed cycle or invoke it manually by pressing the "Replicate Now" button. A log of all activity is displayed in the "Log" memo and any exceptions that occur are displayed in the "Errors" memo.
The source alias and target alias are passed as command line parameters like this:
C:\> REPLD SOURCEDB TARGETDB
The project source for ensuring that there are always two parameters looks something like this:
if (ParamStr(1) = '') or (ParamStr(2) = '') then
MessageDlg('Usage: "REPLD.EXE SOURCE_ALIAS TARGET_ALIAS"',mtError,[mbOK],0)
else
begin
Application.Initialize;
Application.CreateForm(TForm1, Form1);
Application.Run;
end;
All replication is handled in a "Replicate" procedure which will look like this:
procedure TForm1.Replicate;
var
qryChangeList : TQuery;
strChangeType : String;
begin
qryChangeList := TQuery.create(Application);
try
with qryChangeList do
begin
DatabaseName := 'SourceDB';
sql.add('select * from changes');
open;
while not eof do
begin
if (fieldByName('CHANGE_TYPE').asString = 'I') then
begin
strChangeType := 'Insert';
ReplicateInsert(fieldByName('SOURCE_TABLE').asString,
fieldByName('SOURCE_KEY_FIELD').asString,
fieldByName('SOURCE_KEY').asInteger);
end
else if (fieldByName('CHANGE_TYPE').asString = 'U') then
begin
strChangeType := 'Update';
ReplicateUpdate(fieldByName('SOURCE_TABLE').asString,
fieldByName('SOURCE_KEY_FIELD').asString,
fieldByName('SOURCE_KEY').asInteger);
end
else if (fieldByName('CHANGE_TYPE').asString = 'D') then
begin
strChangeType := 'Delete';
ReplicateDelete(fieldByName('SOURCE_TABLE').asString,
fieldByName('SOURCE_KEY_FIELD').asString,
fieldByName('SOURCE_KEY').asInteger);
end;
memoLog.lines.add( DateTimeToStr(Now)+' '+strChangeType+
' '+fieldByName('SOURCE_KEY').asString+
' on '+fieldByName('SOURCE_TABLE').asString);
DeleteFromChanges(fieldByName('CHANGE_ID').asInteger);
next;
end;
close;
end;
finally
qryChangeList.free;
end;
end;
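The DeleteFromChanges procedure called above (which removes a processed entry from the CHANGES table) is not listed in the paper; a minimal sketch, assuming the same SourceDB alias, might look like this:

procedure TForm1.DeleteFromChanges(const iChangeID: Integer);
var
qryDelete : TQuery;
begin
{ remove the processed change record from the source log table }
qryDelete := TQuery.create(Application);
try
with qryDelete do
begin
DatabaseName := 'SourceDB';
sql.add('delete from changes');
sql.add(' where (CHANGE_ID = '+IntToStr(iChangeID)+')');
execSQL;
end;
finally
qryDelete.free;
end;
end;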
And for each action (Insert, Delete, Update) a separate procedure is called which should look like this:
Replicate Inserts:
procedure TForm1.ReplicateInsert(
const strTableName, strKeyField: String; const iKey: Integer);
var
qrySource, qryTarget : TQuery;
i : SmallInt;
begin
qrySource := TQuery.create(Application);
try
with qrySource do
begin
DatabaseName := 'SourceDB';
with sql do
begin
add('select * from '+strTableName);
add(' where ('+strKeyField+' = '+IntToStr(iKey)+');');
end;
try
open;
while not eof do
begin
qryTarget := TQuery.create(Application);
try
with qryTarget do
begin
DatabaseName := 'TargetDB';
with sql do
begin
add('insert into '+strTableName);
add('(');
for i := 0 to qrySource.fieldCount-1 do
begin
if (i < qrySource.fieldCount-1) then
add(' '+qrySource.Fields[i].FieldName+',')
else
add(' '+qrySource.Fields[i].FieldName)
end;
add(')');
add('values');
add('(');
for i := 0 to qrySource.fieldCount-1 do
begin
if (i < qrySource.fieldCount-1) then
add(' :'+qrySource.Fields[i].FieldName+',')
else
add(' :'+qrySource.Fields[i].FieldName)
end;
add(')');
end;
prepare;
for i := 0 to qrySource.fieldCount-1 do
begin
qryTarget.Params[i].asString := qrySource.Fields[i].asString;
end;
execSQL;
end;
finally
qryTarget.free;
end;
next;
end;
close;
except
on e: Exception do
memoErrors.lines.add(DateTimeToStr(Now)+' '+e.message);
end;
end;
finally
qrySource.free;
end;
end;
Replicate Updates:
procedure TForm1.ReplicateUpdate(
const strTableName, strKeyField: String; const iKey: Integer);
var
qrySource, qryTarget : TQuery;
i : SmallInt;
begin
qrySource := TQuery.create(Application);
try
with qrySource do
begin
DatabaseName := 'SourceDB';
with sql do
begin
add('select * from '+strTableName);
add(' where ('+strKeyField+' = '+IntToStr(iKey)+');');
end;
try
open;
while not eof do
begin
qryTarget := TQuery.create(Application);
try
with qryTarget do
begin
DatabaseName := 'TargetDB';
with sql do
begin
add('update '+strTableName);
add('set');
for i := 0 to qrySource.fieldCount-1 do
begin
if (i < qrySource.fieldCount-1) then
add(' '+qrySource.Fields[i].FieldName+
' = :'+qrySource.Fields[i].FieldName+',')
else
add(' '+qrySource.Fields[i].FieldName+
' = :'+qrySource.Fields[i].FieldName)
end;
add(' where ('+strKeyField+' = '+IntToStr(iKey)+');');
end;
prepare;
for i := 0 to qrySource.fieldCount-1 do
begin
qryTarget.Params[i].asString := qrySource.Fields[i].asString;
end;
execSQL;
end;
finally
qryTarget.free;
end;
next;
end;
close;
except
on e: Exception do
memoErrors.lines.add(DateTimeToStr(Now)+' '+e.message);
end;
end;
finally
qrySource.free;
end;
end;
Replicate Deletes:
procedure TForm1.ReplicateDelete(
const strTableName, strKeyField: String; const iKey: Integer);
var
qryTarget : TQuery;
begin
qryTarget := TQuery.create(Application);
try
with qryTarget do
begin
DatabaseName := 'TargetDB';
sql.add('delete from '+strTableName);
sql.add(' where ('+strKeyField+' = '+IntToStr(iKey)+')');
try
execSQL;
except
on e: Exception do
memoErrors.lines.add(DateTimeToStr(Now)+' '+e.message);
end;
end;
finally
qryTarget.free;
end;
end;
Note
Note that this example does not support BLOB types. Additional code would be needed to handle them, but it is possible.
Now, link the SpinEdit value to the timer so that you can adjust the replication cycle:
procedure TForm1.SpinEdit1Change(Sender: TObject);
begin
Timer1.Interval := (SpinEdit1.Value * 1000);
end;
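The timer's OnTimer handler is not shown in the paper; presumably it simply calls Replicate, for example:

procedure TForm1.Timer1Timer(Sender: TObject);
begin
Replicate;
end;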
Run it. While REPLD is running, make some changes to the TEAMS table in the SOURCE database (remembering to supply a unique key value), then check the TARGET database after a replication cycle to see the changes replicated.
Bi-directional Replication
While in the TARGET database, check the CHANGES table. You should notice that the replicated changes from the SOURCE database also appear in this table.
The problem is that the mechanism for logging our changes does not distinguish between changes made by an end-user and changes made by the replication daemon. If we were to initiate bi-directional replication with REPLD without specifying a "replication user" we would create an infinite loop.
Therefore, we will need to designate a "replication user" - for this example we'll call it "REPLICATE" - on each InterBase server on which we will be running, and refuse to log changes made by that "user". This will mean that we need to alter the change logging triggers for each table needing replication as follows:
/* After an Insert */
ALTER TRIGGER TEAMS_REPL_INSERT
AFTER INSERT
AS
BEGIN
IF (USER <> "REPLICATE") THEN
INSERT INTO CHANGES
(CHANGE_TYPE, SOURCE_KEY_FIELD,
SOURCE_KEY, SOURCE_TABLE,USERNAME)
VALUES
("I","TEAM_ID",NEW.TEAM_ID,"TEAMS",USER);
END;
/* After an Update */
ALTER TRIGGER TEAMS_REPL_UPDATE
AFTER UPDATE
AS
BEGIN
IF (USER <> "REPLICATE") THEN
INSERT INTO CHANGES
(CHANGE_TYPE, SOURCE_KEY_FIELD,
SOURCE_KEY, SOURCE_TABLE,USERNAME)
VALUES
("U","TEAM_ID",NEW.TEAM_ID,"TEAMS",USER);
END;
/* After a Delete */
ALTER TRIGGER TEAMS_REPL_DELETE
AFTER DELETE
AS
BEGIN
IF (USER <> "REPLICATE") THEN
INSERT INTO CHANGES
(CHANGE_TYPE, SOURCE_KEY_FIELD,
SOURCE_KEY, SOURCE_TABLE,USERNAME)
VALUES
("D","TEAM_ID",OLD.TEAM_ID,"TEAMS",USER);
END;
Now we will run two instances of REPLD:
C:\> REPLD SOURCEDB TARGETDB
C:\> REPLD TARGETDB SOURCEDB
When we are asked to log in, we log in as REPLICATE. This now provides us with bi-directional replication.
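The REPLICATE user itself must exist on each server; the paper does not show how to create it, but it could be added with the InterBase Server Manager or with the gsec command-line utility, along these lines (the password shown is purely illustrative):

gsec -user SYSDBA -password masterkey -add REPLICATE -pw replpw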
Note that when you move from the master-slave replication model to a peer-to-peer model, several complexities arise which must be considered before implementing replication. These issues, and how to deal with some of them, are covered later in this paper.
Replicating as Changes Occur (near-synchronous replication)
If we need to replicate tables as they are changed, rather than relying on a timed "polling" cycle, we can utilise InterBase's event alerter mechanism. This requires two steps - generating (or posting) an event, and then responding to it.
To generate the event, we need to add a new trigger to the CHANGES table:
CREATE TRIGGER CHANGES_ALERT FOR CHANGES
AFTER INSERT
AS
BEGIN
POST_EVENT "NEW_CHANGE";
END;
To respond to the event, we will need to add an IBEventAlerter component to our REPLD application, register an interest in the "NEW_CHANGE" event by adding it into the Events property, and then link the OnEventAlert event to the replicate procedure:
procedure TForm1.IBEventAlerter1EventAlert(Sender: TObject;
EventName: string; EventCount: Longint; var CancelAlerts: Boolean);
begin
Replicate;
end;
Try it.
Replicating Compound Key Tables
For simplicity, in the above example we have been replicating tables with single integer keys only. The same technique, slightly modified, can be used to handle compound key tables.
The main difference between handling compound, mixed-type keys and single integer keys is in the way changes are logged. Rather than a single CHANGES table, compound keys require a separate "log" table for each base table we want to replicate, with one column per key field.
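For example, for a hypothetical ORDER_LINES table keyed on ORDER_ID and LINE_NO (these names are illustrative and not part of the sample databases above), the per-table log might look something like this:

CREATE TABLE ORDER_LINES_CHANGES (
CHANGE_ID INTEGER NOT NULL,
CHANGE_TYPE CHAR(1) NOT NULL,
ORDER_ID INTEGER NOT NULL,
LINE_NO INTEGER NOT NULL,
USERNAME VARCHAR(31),
PRIMARY KEY (CHANGE_ID));

The logging triggers on ORDER_LINES would then store the NEW (or, for deletes, OLD) values of both key fields, and the replication engine would build its WHERE clauses from both columns.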
Replicating across a WAN with RAS
Remote Access Services (RAS) on Windows NT and Dial-up Networking in Windows 95 allow most of the services which would be available on a network to be accessed over a modem link. This means that in order to modify our replication engine so that it replicates to remote sites via a modem, we need to add support for RAS.
The easiest way in Delphi to implement RAS is via a component. There are quite a few available in both shareware and freeware forms on the Internet. For this example, we will use TRas from Daniel Polistchuck which is available at www.delphi32.com.
As the component and the RAS API handle all of the complicated work for us, adding RAS to REPLD will only take a few lines of code.
First we will need to modify the "Replicate" procedure so that it will dial a RAS connection whenever there are changes to replicate.
procedure TForm1.Replicate;
begin
with qryChangeList do
begin
{ qryChangeList is now a form-level query, so reset its SQL each cycle }
DatabaseName := 'SourceDB';
if active then close;
sql.clear;
sql.add('select * from changes');
open;
if (recordCount > 0) then
begin
memoLog.lines.add(DateTimeToStr(Now)+' Dialing DT');
RAS1.EntryName := 'DT';
RAS1.Connect;
end;
end;
end;
As you can see, we have added a test to ensure that records exist and the call to the RAS component to dial a connection called "DT".
We have also moved the rest of what previously existed in this procedure into the "OnConnect" event of the RAS component and made the change query (qryChangeList) global to the form:
procedure TForm1.RAS1Connect(Sender: TObject);
var
strChangeType : String;
begin
with qryChangeList do
begin
memoLog.lines.add(DateTimeToStr(Now)+' Connected to DT');
try
while not eof do
begin
if (fieldByName('CHANGE_TYPE').asString = 'I') then
begin
strChangeType := 'Insert';
ReplicateInsert(fieldByName('SOURCE_TABLE').asString,
fieldByName('SOURCE_KEY_FIELD').asString,
fieldByName('SOURCE_KEY').asInteger);
end
else if (fieldByName('CHANGE_TYPE').asString = 'U') then
begin
strChangeType := 'Update';
ReplicateUpdate(fieldByName('SOURCE_TABLE').asString,
fieldByName('SOURCE_KEY_FIELD').asString,
fieldByName('SOURCE_KEY').asInteger);
end
else if (fieldByName('CHANGE_TYPE').asString = 'D') then
begin
strChangeType := 'Delete';
ReplicateDelete(fieldByName('SOURCE_TABLE').asString,
fieldByName('SOURCE_KEY_FIELD').asString,
fieldByName('SOURCE_KEY').asInteger);
end;
memoLog.lines.add( DateTimeToStr(Now)+' '+strChangeType+' '+
fieldByName('SOURCE_KEY').asString+
' on '+fieldByName('SOURCE_TABLE').asString);
DeleteFromChanges(fieldByName('CHANGE_ID').asInteger);
next;
end;
close;
finally
RAS1.Disconnect;
memoLog.lines.add(DateTimeToStr(Now)+' Disconnected from DT');
end;
end;
end;
This is all that is needed to connect to a remote site, replicate any changes, and then disconnect.
Is it that simple?
No. With replication comes a myriad of problems that need to be avoided through good design, adequate planning and sometimes a bit of software. Here are just a few:
Update Conflicts
Ensuring convergence in asynchronous replication environments is critical for nearly every application. However, what happens if the same data element (e.g. the same column in the same row) is updated at two sites at the same time or, to be more precise, within the same replication interval?
This is known as an update conflict. To ensure convergence, update conflicts must be detected and resolved so that the data element has the same value at every site. The more frequently you propagate your changes (that is the smaller your replication interval or lag time), the fewer update conflicts will occur.
Alternatively, update conflicts may be avoided by limiting "ownership" or the right to update a given data element to a single site. In fact, many believe that conflict resolution is a process issue rather than one for software developers. By eliminating the possibility for conflicts through design and procedure, the software would not need to resolve update conflicts as they will, in theory, never occur.
This view, however, is somewhat limited when customers are demanding that various options for conflict resolution be built into the data replication tool.
Unique Keys & Generators
Typically, when working with unique integer keys, a before-insert trigger is added which calls a number generator to supply new key values. When working with replication this can cause a problem.
The problem is in keeping two or more databases synchronized. Let's look at the following example:
At site one, the generator used for a certain table is set at 5. When a new record is added, a trigger increments the generator value by 1 and assigns 6 to the key field of the new record.
This record is replicated to an identical database at site two.
At site two, the generator value is set to 100. When site one's record is inserted into the corresponding table, a trigger replaces the key value of 6 with the newly incremented value of 101.
The solution has two parts. The first is to rely on the client to assign the key value rather than using triggers. The second is to assign blocks of numbers to the various sites so that each location "owns" certain key values. An InterBase integer can hold values up to around 2 billion, so the blocks can be divided into sets of a million if needed. The generators at each site can be initialized with the starting number of each block by using the SET GENERATOR command:
SET GENERATOR GEN_NAME TO 1000000;
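Assigning the key on the client simply means fetching the next generator value yourself before the insert. A minimal Delphi sketch, reusing the SourceDB alias and the NEW_TEAM_ID generator from the sample database (the function name is illustrative), might look like this:

function TForm1.GetNewTeamID: Integer;
var
qryGen : TQuery;
begin
{ fetch the next value from the generator on the server }
qryGen := TQuery.create(Application);
try
with qryGen do
begin
DatabaseName := 'SourceDB';
sql.add('select GEN_ID(NEW_TEAM_ID, 1) from RDB$DATABASE');
open;
Result := Fields[0].asInteger;
close;
end;
finally
qryGen.free;
end;
end;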
Initialization
If you replicate source data into targets, then you must initialize those targets so that they start off in sync with the source. Typically you can do this with a full copy of the source data: dump the source data and reload it into the target.
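Where the target is created from scratch, one way to do this (as with the metadata copy earlier, but this time without the metadata-only switch) is a full gbak backup and restore; the exact invocation is not given in the paper, but it would look something like this:

gbak -b source.gdb source.gbk
gbak -c source.gbk destination.gdb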
DDL Changes
What happens when you change the metadata? If you add a field to a table at one site, this change will also need to be made at all replicated sites.
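For example, adding a column to the sample TEAMS table would mean running the same DDL statement against both source.gdb and destination.gdb (the column name here is illustrative):

ALTER TABLE TEAMS ADD TEAM_CITY VARCHAR(40);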
Summary
Database replication is a hot topic and is being requested (or rather demanded) by more and more clients all over the world; in fact, more than fifty percent of projects started since December 1996 at Dunstan Thomas have had replication as a system requirement.
Replication has the potential to provide better throughput, faster response times and reduced communication costs and network bottlenecks to geographically dispersed end-users.
With InterBase and the power of Microsoft's RAS, data replication is not only possible but available today.