Firebird for the Database Expert: Episode 6 - Why can't I shrink my databases

by Ann Harrison

New Firebird users often ask "Why doesn't the database get smaller when I delete records?" or "Where's the PACK function".

The usual answer is that releasing and reallocating space is more expensive than reusing it internally. That’s true, but it’s not the whole answer. The real issue is the relationships between pages, and to understand that, it helps to have some understanding of the structure of a Firebird database.

There’s a more complete description here, but, briefly, a Firebird database is a single file. The file contains data for all tables, indexes, and structural information that allows Firebird to allocate and release pages, locate tables and indexes, maintain generators, etc.

The database file is made up of pages. Pages are fixed length blocks within the file. Each page has a specific function. The most common are data pages, each holds records for a single table. When you store a record in a table, Firebird first tries to store it on a data page for that file that are already in the page cache. Then, it looks for other pages belonging to the table that have space. If there is no data page for that table in the file with space for the new record, then Firebird looks for free pages - pages that have been allocated to the file that are not currently used. Finally, if all those searches fail to find a place for the record, Firebird extends the file and allocates a page in the new space.

/images/doc_23_1.png

This diagram represents a section at the end of a database file. The red pages are data pages. Brown pages are index pages. The purple page is a page inventory page. The two white pages represent former data pages that are now empty. The first page cannot be released because file systems do not allow space to be removed from the middle of a file. In theory, the last page in the database could be released, by truncating the file slightly. However, the effect would be minimal unless a large number of the deleted records were on the last pages allocated. That situation is rare.

One common case of mass deletes is an application step in which records must be stored in a temporary table for processing before being inserted into their final location. In that case, pages allocated for the temporary table would precede the pages allocated for the permanent table, making truncation impossible. Another case is a rolling archive: an active table holds records for a period of time, after which they are archived to a different table or database. In that case, the deleted records would be stored before the most recent records, again preventing significant truncation. In fact, it is difficult to think of an application that stores a large number of records, and then deletes them without storing or modifying other data, aside from test databases.

/images/doc_23_2.png

One might imagine that the database with empty pages could be compacted by sliding the pages together. That thought gravely underestimates the internal linkages in a Firebird database. Pointer pages, index root pages, transaction inventory pages, and generator pages are located through a table called RDB$PAGES which would have to be updated with their new location. Pointer pages are arrays of page numbers, all of which would need to be updated to reflect the new locations of pages containing data for the tables. And those are the easy cases.

Page inventory pages - the purple page in the diagrams - occur at fixed intervals and cannot be moved. A page inventory page is an array of bits that indicate whether the corresponding page is in use. Since the correspondence is by page number, page inventories would have to be updated to reflect the new location of the pages. Within a data page, records identify their back versions and fragments by page number. Because here is no pointer back from the fragment or back version, if a page containing a fragment is moved, the system would need to search the whole table to find the record that owns the fragment and fix its pointer. Indexes pages point to their left and right neighbors by page number, and upper levels reference lower levels by pages number. At the bottom level, the index indicates the location of records by page number. Moving data or index pages would invalidate the whole index.

/images/doc_23_3.png

To summarize, there is no simple way to release all free space in a database to the operating system because free pages do not typically congregate at the end of the database file. The internal structure of the database file is so complex that any effort to compact the file would require taking the database off line for longer than a backup and restore, with less satisfactory results.