Fulltext Search for Firebird using DotLucene
by Dan Letecky
This article first appeared on DotNetFirebird
Introduction
In this article, we will talk about searching the data in a Firebird database using the DotLucene full-text search engine. We will focus on storing the index directly in the database (the source code of the solution is attached).
Firebird SQL
Firebird SQL (an excellent embedded database with good .NET support) has no built-in fulltext search support so far. Instead, you need to rely on third-party tools. Fortunately, there is a great search engine library available: DotLucene. It is an open-source .NET library (ported from Java) that can index any data (structured or unstructured) that you are able to convert to raw text.
Using an additional library for fulltext search doesn't look too elegant at first sight. However, it has some advantages. Let's compare it quickly with MySQL's integrated fulltext search:
MySQL Fulltext Search
MySQL fulltext search has these drawbacks (compared to DotLucene):
- You can use it only in MyISAM tables (i.e. no transactions)
- You can't browse the index (see Luke)
- You need to store transformed text in the database (i.e. for indexing HTML, you need to store another copy of the text with stripped HTML tags)
- It doesn't support highlighting of the query words in the result
- It is hard to modify the sources to make custom changes
- The license doesn't allow you to use it in a commercial application for free
- It is reported to be slow on large data sets
How to Index the Data?
For basics about using DotLucene to index your data, I recommend reading:
- DotLucene: Full-Text Search for Your Intranet or Website Using 37 Lines of Code (CodeProject Article)
- DotLucene Tutorial
- DotLucene Online Demo
The following applies when indexing data from a database:
- You are using a different source of data (obviously ;-). Instead of reading from the disk, you need to load it from the database.
- When indexing texts, you don't need to store them in the index in full; just keep them in the database.
- Create an additional Field that will contain the primary key of the indexed document (so you can later load it from the database).
- When indexing HTML, you need to strip the HTML tags (you need to supply raw text to DotLucene).
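The tag-stripping step from the last point can be sketched in plain C#. This is a minimal, regex-based sketch, not the code from the attached solution; the HtmlText class and its Strip method are hypothetical names introduced here for illustration. It assumes reasonably well-formed markup, so a real HTML parser may be a better choice for messy input.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of DotLucene): turns HTML into raw text for indexing.
public static class HtmlText
{
    // Script and style blocks contain no searchable text, so drop them with their content.
    private static readonly Regex ScriptOrStyle = new Regex(
        @"<(script|style)\b[^>]*>.*?</\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    private static readonly Regex Comment = new Regex(@"<!--.*?-->", RegexOptions.Singleline);
    private static readonly Regex Tag = new Regex(@"<[^>]+>");
    private static readonly Regex Whitespace = new Regex(@"\s+");

    public static string Strip(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        string text = ScriptOrStyle.Replace(html, " ");
        text = Comment.Replace(text, " ");
        // Replace each tag with a space so that adjacent words do not run together.
        text = Tag.Replace(text, " ");
        // Turn entities such as &amp; back into plain characters.
        text = WebUtility.HtmlDecode(text);
        // Collapse the whitespace left behind by the removed markup.
        return Whitespace.Replace(text, " ").Trim();
    }
}
```

For example, HtmlText.Strip("<p>Hello <b>Firebird</b> &amp; DotLucene</p>") returns "Hello Firebird & DotLucene". Assuming the DotLucene 1.4 API, the stripped text would typically go into an indexed but unstored field (Field.UnStored), and the primary key into a stored keyword field (Field.Keyword), so that each hit can be mapped back to its database row.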
Where to Store the Index
On a server, it's no problem to store the index in a separate directory (you can also load it to RAM to make your searches super fast - if you have enough RAM, of course). In a desktop application, it might be useful to store the index in a Firebird database.
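As a rough illustration of the server scenario, the sketch below opens an index from a directory, copies it into RAM, and runs a query against the in-memory copy. It assumes the DotLucene 1.4 API and an index that was built earlier; the index path and the field names "text" and "id" are hypothetical and have to match whatever was used at indexing time.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

public static class IndexLocationDemo
{
    public static void Main(string[] args)
    {
        // Hypothetical location of an existing index; pass your own path as the first argument.
        string indexPath = args.Length > 0 ? args[0] : "index";

        // Open the on-disk index (false = do not create or overwrite it).
        Directory onDisk = FSDirectory.GetDirectory(indexPath, false);

        // Copy the whole index into memory; only sensible if it fits in RAM.
        Directory inMemory = new RAMDirectory(onDisk);

        IndexSearcher searcher = new IndexSearcher(inMemory);
        // "text" is the assumed name of the field that holds the indexed content.
        Query query = QueryParser.Parse("firebird", "text", new StandardAnalyzer());
        Hits hits = searcher.Search(query);

        for (int i = 0; i < hits.Length(); i++)
        {
            // "id" is the assumed name of the stored field that holds the primary key.
            Console.WriteLine(hits.Doc(i).Get("id"));
        }

        searcher.Close();
        onDisk.Close();
    }
}
```

Searching the disk-based index directly works the same way: pass onDisk to the IndexSearcher constructor instead of the in-memory copy.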
DotLucene supports a mechanism for adding custom index storages. Every storage type (file system and RAM are built in) is implemented as a class derived from the abstract class Lucene.Net.Store.Directory. I have created a Directory implementation that stores the index directly in a Firebird SQL database.
FbDirectory class
All index reading/writing operations in DotLucene are done using a Directory class. The new FbDirectory class is based on FSDirectory. The filesystem operations are replaced with database operations. Here you can see what we need to implement:
using System;

namespace Lucene.Net.Store
{
    public abstract class Directory
    {
        public abstract String[] List();
        public abstract bool FileExists(String name);
        public abstract long FileModified(String name);
        public abstract void TouchFile(String name);
        public abstract void DeleteFile(String name);
        public abstract void RenameFile(String from, String to);
        public abstract long FileLength(String name);
        public abstract OutputStream CreateFile(String name);
        public abstract InputStream OpenFile(String name);
        public abstract Lock MakeLock(String name);
        public abstract void Close();
    }
}
Performance Tips:
- If performance is your main concern, use the standard FSDirectory instead to store the index on disk. My tests show that database storage is about twice as slow as the filesystem. Use the database storage only when you have no other choice.
- Use the compound index format (IndexWriter.SetUseCompoundFile(true);). This is the default in DotLucene 1.4, but in 1.3 you have to set it manually.
- Create the index in memory, optimize it, then save it on disk using FbDirectory.Copy(). This will only help you if you are rebuilding the whole index from scratch.
- If you are adding a document to the index from a desktop application, do it in the background (in a separate thread). You can still search while a new document is being added.
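The last tip can be sketched with a single worker thread that takes primary keys from a queue and indexes them one by one. This is a minimal sketch under stated assumptions, not the code from the attached solution: BackgroundIndexer is a hypothetical class, and the indexDocument callback stands in for whatever code loads the row from the database and adds it to the index.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper: runs all index updates on one background thread.
public sealed class BackgroundIndexer : IDisposable
{
    private readonly BlockingCollection<int> _pending = new BlockingCollection<int>();
    private readonly Action<int> _indexDocument;
    private readonly Thread _worker;

    // indexDocument receives the primary key of the row to index.
    public BackgroundIndexer(Action<int> indexDocument)
    {
        _indexDocument = indexDocument ?? throw new ArgumentNullException(nameof(indexDocument));
        _worker = new Thread(Run) { IsBackground = true, Name = "index-writer" };
        _worker.Start();
    }

    // Called from the UI thread; returns immediately.
    public void Enqueue(int primaryKey)
    {
        _pending.Add(primaryKey);
    }

    private void Run()
    {
        // Blocks while the queue is empty; ends after CompleteAdding() once the queue is drained.
        foreach (int id in _pending.GetConsumingEnumerable())
        {
            try
            {
                _indexDocument(id);
            }
            catch (Exception ex)
            {
                // One failed document should not stop the worker.
                Console.Error.WriteLine("Indexing " + id + " failed: " + ex.Message);
            }
        }
    }

    // Stops accepting new work and waits until the queued documents are indexed.
    public void Dispose()
    {
        _pending.CompleteAdding();
        _worker.Join();
        _pending.Dispose();
    }
}
```

A desktop application would create one BackgroundIndexer at startup, call Enqueue(id) after saving a record, and dispose the indexer on exit. Using a single worker also keeps index writes serialized, which fits the one-writer-at-a-time model of the index.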