Database optimization with FBOpt
FBOpt is a command-line tool for the optimization of Firebird databases. It was created by IBPhoenix and is distributed on the IBPhoenix Developer DVD starting from the 29th release.
Internal database structure
The internal database structure plays an important role in Firebird, because its efficiency is a foundation for the performance of the engine. The layout of internal structures has a direct impact on the complexity and speed of the operations needed to process them (which translates to CPU and memory utilization), while data distribution primarily affects the number of I/O operations and cache utilization. The layout of data structures is fixed (in code), but data distribution is variable and depends on the actual operations performed on the database.

In current Firebird implementations, the engine pays only minimal attention to storing and distributing data (both user and internal) in the most effective way. It focuses primarily on the speed of the currently performed operation rather than on overall balanced performance. This may lead to degradation of performance over time, as the internal database layout becomes less and less effective. Eventually the drop in performance requires the user's intervention, which under Firebird means a rebuild of the database from a logical backup (gbak).
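For reference, the standard rebuild is a plain gbak backup-and-restore cycle. A minimal sketch using gbak's documented -backup and -create switches; the paths and credentials are placeholders for your own environment:

```python
import subprocess

# Back up the database to an intermediate file, then restore it into a
# freshly built database file (the classic rebuild cycle).
subprocess.run(["gbak", "-backup", "-user", "SYSDBA", "-password", "masterkey",
                "/data/mydb.fdb", "/data/mydb.fbk"], check=True)
subprocess.run(["gbak", "-create", "-user", "SYSDBA", "-password", "masterkey",
                "/data/mydb.fbk", "/data/mydb_new.fdb"], check=True)
```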
However, a restore from backup fixes only some of these problems, namely:
- Row fragmentation: each row is stored on a single page (provided the row is not bigger than a database page).
- Table and index fragmentation: data and index pages are compacted to hold as much data as possible, and all pages that belong to a single table or index are stored consecutively.
It does not address the following problems:
- Index clustering (jumps to different data pages while walking through an index).
- Data density on data pages (a restore can only make all data pages highly packed, or give them all a space reserve for updates).
- Physical location of user data and index structures (all user data are stored first, in order of table creation, then all indices).
- Prevention of future data fragmentation.
FBOpt was designed and created to address all these problems.
Optimization with FBOpt
FBOpt is basically a database copy utility. Its basic operation is similar to piping a gbak backup into a gbak restore via standard output, except that FBOpt does not bother to create the intermediate backup data stream and can apply "tweaks" to the copied data. Because FBOpt uses the same low-level BLR interface for communication with the Firebird engine as gbak, it is at least as fast as gbak. However, FBOpt skips some steps that are required by gbak but unnecessary in a data copy utility, and it handles some operations more efficiently (especially big BLOBs and ARRAYs), so it is often faster than gbak. The performance gain is usually a few percent, but it can be significant (tens of percent) when the database contains many huge BLOB values. The real value, however, lies in the additional operations that FBOpt can perform on the newly created database.
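The gbak pipe that FBOpt's copy operation resembles can be reproduced with gbak's special stdout/stdin file names. A minimal sketch, again with placeholder paths and credentials:

```python
import subprocess

# gbak writes the backup stream to standard output ("stdout") and a
# second gbak reads it from standard input ("stdin"), so no backup
# file lands on disk; FBOpt skips even this intermediate stream.
backup = subprocess.Popen(
    ["gbak", "-backup", "-user", "SYSDBA", "-password", "masterkey",
     "/data/mydb.fdb", "stdout"],
    stdout=subprocess.PIPE)
subprocess.run(
    ["gbak", "-create", "-user", "SYSDBA", "-password", "masterkey",
     "stdin", "/data/mydb_new.fdb"],
    stdin=backup.stdout, check=True)
backup.stdout.close()
backup.wait()
```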
FBOpt allows you to:
- specify the physical order of rows in individual tables.
- specify a space reserve (or no reserve) for individual tables.
- specify the physical order of tables in the newly created database.
- specify clusters of tables and their indices, where index structures are stored together with their source tables.
Specifying the order of rows significantly improves performance for queries that read data in a range of key values or in a given key order. If there is an index with the same key, it has the highest possible "clustering factor": there are no unnecessary jumps to different data pages while walking through the index, so the performance loss that otherwise occurs when such an index is used to return data in key order (instead of an internal sort) is eliminated.
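As an illustration of the principle only (FBOpt works below the SQL level, so this is not its actual mechanism), reloading a table in key order with plain SQL produces a similar physical row order. A sketch using the Python fdb driver; the table and column names are hypothetical:

```python
import fdb  # Python driver for Firebird

con = fdb.connect(dsn="localhost:/data/mydb.fdb",
                  user="SYSDBA", password="masterkey")
cur = con.cursor()
# Copy rows into an identically structured empty table in key order,
# so the physical row order matches the index on ORDER_ID and that
# index gets the best possible clustering factor.
cur.execute("INSERT INTO ORDERS_SORTED SELECT * FROM ORDERS ORDER BY ORDER_ID")
con.commit()
con.close()
```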
Specifying the space reservation level for individual tables makes it possible to achieve a higher density of data in tables that do not change, while keeping the necessary space reserve for tables that are frequently updated. High data density saves disk space and increases query performance thanks to a decreased number of I/O operations.
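Note that the standard Firebird tools can set space reservation only for the whole database, which is exactly the limitation FBOpt's per-table setting removes. A sketch using gfix's documented -use switch, with placeholder paths:

```python
import subprocess

# "-use full" packs data pages completely (no reserve for updates);
# "-use reserve" restores the default. Either way the setting applies
# database-wide, unlike FBOpt's per-table control.
subprocess.run(["gfix", "-use", "full", "-user", "SYSDBA",
                "-password", "masterkey", "/data/archive.fdb"], check=True)
```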
Specifying the physical order of tables in the database allows tables that do not change to be stored before tables that are updated or extended, so that the volatile part of the database is at the end of the file. Together with table clustering, it is even possible to separate (mostly) static data from volatile data, which reduces the build-up of fragmentation over time.