Interview with Jim Starkey from InterBase World
Sunday, February 09 2003 by Marina Novikova.
"Computers were my first obsession; relational database became my second," said Jim Starkey in his interview with InterBase World.
Marina Novikova: What is your education? Where did you like to work most?
Jim Starkey: I started messing around with computers in about 10th grade. A local college had a government grant to teach FORTRAN to high school kiddies. From my first clunk on a keypunch (look it up on Google), I was hooked. My first machine was an IBM 7040 which filled a very large room and had all of 32K memory. I designed my first computer language, a simulated machine language and assembler, the next year when I was 15. I haven't stopped since.
My degree, however, is in pure mathematics. My formal computer science background consists of a single special projects course.
My first "real" job was on a research project to build a database machine for the ARPAnet, the precursor of the Internet, at a place called Computer Corporation of America (big name for a 7-person company). While there, I ran into Codd's early papers defining the relational database. Pure, simple, and elegant once stripped of the ludicrous academic language. Computers were my first obsession; relational database became my second.
My second real job was with DEC. DEC was a great place to work. Near total anarchy. I put out a very successful product called Datatrieve, which, due to a stupid political ploy by a manager, got cancelled. It didn't mean a thing. I got to write a monthly report saying "Problems: The project has been cancelled. If this isn't rectified, it could affect the schedule." It really didn't make any difference. The second version shipped on schedule, still cancelled. You've got to admire a company that succeeds despite the best efforts of its management.
But when Gordon Bell left DEC, it was time to go.
Marina Novikova: You are involved in a very interesting project, Netfrastructure. Is it right that Netfrastructure is an SMP-optimized SQL server with SuperServer architecture and built-in Java support?
Jim Starkey: The essential idea behind Netfrastructure is separation of content, presentation, and logic. Content lives in a "content store", a JDBC compliant, multi-user, SQL-based, transaction-based "content store." We don't call it a database because I hate the database market. So it's a "content store". Got it? Content store. Presentation is handled by a really nifty page generation engine and logic by an integrated Java Virtual Machine. All components share a common role-based security scheme.
The database (er, content store) architecture is radically different from Interbase/Firebird, reflecting the radical changes in computing platforms. I designed and implemented the first version of Interbase on an Apollo DN320, which was fast, cost a bundle, and maxed out at 2 megabytes. Now, the cheapest machine at the local mall is 128 MB and a couple of hundred bucks more takes it up to a gigabyte. If you're smart, you don't use a gigabyte of real memory the same way you use 2 MB. The engine is a single process, multi-threaded, and interlocked for SMP.
Another interesting aspect of Netfrastructure is that application topology is inverted. The application runs inside the database engine, something like stored procedures on steroids. So a JDBC method invocation is a half dozen machine instructions rather than a thread switch, a context switch, a trip down the protocol stack, a trip up the protocol stack, a call from a server layer to the database engine, validating and probably translating arguments at each step, and then back. So we're talking the difference between a couple of dozen nanoseconds and probably 10 milliseconds. That's a lot of powers of 10. And that's before any data is moved. And, of course, to make this work in a security intensive environment, the application language has to play nicely in a sandbox, hence Java. And yes, I did write my own Java Virtual Machine.
Marina Novikova: People say that you helped to implement ideas of multi-generational architecture originally used in Interbase/Firebird and realized them in Netfrastructure from scratch. In other words you clearly demonstrated to the next generation of Firebird developers how one should do small and very fast servers. Do you agree with these words?
Jim Starkey: I didn't help implement the idea of multi-generational architecture, I invented it. But that was at DEC, before I started Interbase. DEC didn't want to run with it, so I did it by myself.
The big difference between Netfrastructure and Interbase/Firebird is that Netfrastructure is multi-generational in memory but not on disk, while Interbase/Firebird carries generations on disk. There are two main reasons for the change. First, memory is cheap and plentiful. Second, Interbase was designed for clusters that didn't share memory, so disk was the only way to go. And, while it's nice that Netfrastructure is at least 10 times faster than Firebird, the really important thing is that I don't have to explain about sweep.
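The idea of keeping record generations in memory can be sketched roughly as below. This is a toy illustration, not Netfrastructure or Interbase code: the `Version` and `Record` classes, the `committed` set, and the rule that a reader sees versions written by committed transactions with a lower transaction id are all simplifying assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    value: str
    txn_id: int                         # transaction that wrote this generation
    older: Optional["Version"] = None   # previous generation, if any


class Record:
    """A record as an in-memory chain of generations, newest first (hypothetical)."""

    def __init__(self):
        self.newest = None

    def write(self, value, txn_id):
        # A writer never overwrites; it pushes a new generation on the chain.
        self.newest = Version(value, txn_id, self.newest)

    def read(self, reader_txn, committed):
        # Walk back to the newest generation this reader is allowed to see:
        # its own write, or one from a committed, older transaction.
        v = self.newest
        while v is not None:
            if v.txn_id == reader_txn or (v.txn_id in committed and v.txn_id < reader_txn):
                return v.value
            v = v.older
        return None

    def prune(self, oldest_active_txn, committed):
        # Generations behind one that every active transaction can see
        # are unreachable by any reader and can be dropped.
        v = self.newest
        while v is not None:
            if v.txn_id in committed and v.txn_id < oldest_active_txn:
                v.older = None
                return
            v = v.older


rec = Record()
committed = set()
rec.write("v1", 1); committed.add(1)   # transaction 1 writes and commits
rec.write("v2", 3); committed.add(3)   # transaction 3 writes and commits

seen_by_2 = rec.read(2, committed)     # started before 3, still sees "v1"
seen_by_4 = rec.read(4, committed)     # sees "v2"
rec.prune(4, committed)                # no active transaction older than 4
```

In this sketch readers never block writers, and old generations disappear as soon as no active transaction can reach them, without a separate pass over the disk.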
Marina Novikova: Does, in your opinion, a multi-generational architecture of Interbase/Firebird hinder the server from development? If there are too many updates, the garbage collection becomes slower and unpredictable.
Jim Starkey: Not at all. If you compare the disk write economy of Interbase/Firebird with the traditional transaction log, multi-generational wins hands down - taken as an aggregate it has lower disk overhead. There are things that blockheads can do to create bottlenecks, but Interbase/Firebird provides fewer of these than competing systems.
Marina Novikova: Do you think that Java is the best choice for developers of database web-applications?
Jim Starkey: With the right architecture, absolutely. Java is a superb language, small, simple, and robust, but it isn't for everything. It is a very good programming language for expressing application semantics - small, simple, elegant, and robust. The sandbox execution model allows it to run where application code normally doesn't belong, like the innards of a database (er, content store) system. On the other hand, the performance of Java string handling is dreadful and the thread synchronization primitives, two state mutexes, are pitiful. Netfrastructure uses Java for application logic, leaving content management and page generation to C++, and the implementation sings.
Marina Novikova: Why are Interbase/Firebird indexes only uni-directional? Why are there used not balanced but simple trees for Interbase/Firebird indexes (as we can see in current versions with sources available)? Is it a restriction of Interbase/Firebird architecture?
Jim Starkey: It all goes back to a single personal principle: I hate tuning. Computers and software should be smart enough that they don't need people to tell them how to optimize themselves.
The issue was clustered versus non-clustered indexes. Traditional database implementations walk indexes bouncing between index pages and database pages. Developers learned very quickly that this effectively pessimizes disk activity. So they invented clustering so that the physical ordering of records corresponded to the index order. This led to space management problems, overflow pages, backoff strategies, optimizer problems, and general befuddlement. People had to plan physical structure, logical structures, access path strategies, and index design. And when they inevitably got it wrong, the database guys blamed the users for bad design.
So I developed an alternative index technology, combining btrees with Datacomputer inversions. The general idea is very simple. Rather than bouncing between index and data pages, the index is scanned first, setting bits in a sparse bit vector to indicate selected records, then processing records in bit order, which is also physical order on disk. This has a number of big wins. First, all indexes behave like well tuned clustered indexes. Second, index buckets aren't subject to two phase locking. Third, Boolean "and" and "or" operations can be performed on intermediate bitmaps at virtually no cost, eliminating the need for the optimizer to choose between alternative indexes.
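The scan-first, fetch-second approach described above can be sketched roughly as follows. This is a toy model under stated assumptions, not Interbase code: the function names, the sample table, and the use of a plain Python integer in place of a sparse bit vector are all hypothetical.

```python
# Toy model of "scan the index first, then fetch records in bit order".
# A Python int stands in for the sparse bit vector (assumption).

def index_scan(index, lo, hi):
    """Return a bitmap with bit n set for each record number n whose key is in [lo, hi]."""
    bitmap = 0
    for key, record_number in index:    # (key, record number) pairs in key order
        if lo <= key <= hi:
            bitmap |= 1 << record_number
    return bitmap


def records_in_bit_order(bitmap):
    """Yield the set record numbers in ascending order, i.e. physical order."""
    n = 0
    while bitmap:
        if bitmap & 1:
            yield n
        bitmap >>= 1
        n += 1


# Record number -> (age, salary); the list position plays the role of disk position.
table = [(25, 50), (40, 90), (31, 70), (58, 65), (33, 40)]
age_index = sorted((row[0], n) for n, row in enumerate(table))
salary_index = sorted((row[1], n) for n, row in enumerate(table))

by_age = index_scan(age_index, 30, 45)
by_salary = index_scan(salary_index, 60, 100)

both = by_age & by_salary      # "and" of two index scans
either = by_age | by_salary    # "or" of two index scans

# Fetch in physical order, then sort afterwards if an ordering is wanted.
fetched = [table[n] for n in records_in_bit_order(both)]
ordered = sorted(fetched, key=lambda row: row[1])
```

Here both indexes contribute to the result through a single bitwise operation, so nothing has to pick one index over the other, and every fetch proceeds in ascending record number regardless of the order of keys in either index.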
So it doesn't matter in the least whether an index is ascending or descending. Indexes are scanned first and records fetched second. If you want your records ordered, it is ALWAYS faster to fetch records in physical order and sort them than rattle the disk arm with random accesses. And the more records involved, the more a sort wins.
The only time that index walking makes sense is when you ask for a million records and only want the first few. This, in fact, was the default case during dBase emulation. So we added smarts to the binary access language and optimizer to recognize the case and do index walking.
Interbase/Firebird indexes are ordinary btrees, which are naturally balanced. I didn't write the code for recombination of emptying buckets because I thought that data demographics were usually stable, so space would be reused, and I had more important things to do. Deej, I believe, ran out of important things to do and added bucket recombination.
Marina Novikova: Is Interbase/Firebird a completely relational DBMS? After all, transactions are not necessarily saved after each commit if the Forced Writes option is not active. The letter D in ACID means Durability, so if a transaction is committed, it should not be lost in case of a small hardware failure. In Interbase/Firebird you have to enable Forced Writes, but this decreases the speed of operations. What can you say about this?
Jim Starkey: Yup. Interbase was designed to run on honest operating systems with reliable disks, but on Unix this meant a tradeoff between performance and absolute reliability. I don't like tradeoffs, but there it is. A serial write log would pretty much eliminate the tradeoff, but is impossible to implement in classic. Netfrastructure is architected for a serial write log. But Linux with battery backup never, ever crashes, so I don't lose a lot of sleep over my tardy implementation.
Marina Novikova: After the Interbase 6 code was opened, it turned out that there were many interesting things, which unfortunately had not been realized, for example, expression indexes, bi-directional cursors, XNET and others. Why do you think the developers had not completed these ideas? Was this because of architecture restrictions, lack of time or anything else? And what do you think about Open Source DBMS? Linux is on the rise now, but will open databases occupy considerable part of the DBMS market in future?
Jim Starkey: The Achilles heel of open source projects is decision making. Most depend on consensus, and consensus on innovation is very, very difficult. So open source projects tend to be standards driven. Linux, for better or worse, is based on Posix and Unix. OK, they've been obsolete for 20 years, but hey, they work. SQL, on the other hand, isn't a standard but a theme. The actual standard is next to useless for any purpose, let alone interoperability. Certainly nothing to organize a project around.
There are bunches of important and interesting problems that database (and content store) systems should address. For example, the basis of the web is search. What database systems support open context search? One, as far as I know. What databases automatically filter data based on user profiles? One, as far as I know, and it calls itself a content store.
The Firebird guys have been doing a good job fixing bugs, stabilizing the system, plugging holes, and smoothing over lumps. The system needed a good facelift, and now it has one. But I don't see them reclaiming Interbase's traditional role of world leader in database technology. We were the first database with heterogeneous connectivity, two phase commit, cascading triggers, user defined functions, event alerters, blob filters, array support, bidirectional multi-vendor gateways, etc. But then I don't see any open source projects fostering innovation.
I know this is a little hard, but I never could have developed Interbase as an open source project. I'm an engineer, not a politician. I'm quite sure that I couldn't have built a consensus around a radical new database architecture. I believe in innovation, and innovation requires a consistent vision. I just don't see vision in open source. But I do know one thing. Unless Firebird separates the classic and superserver code bases, neither is going to go anywhere. Each is stifling the development of the other.
Marina Novikova: Are you satisfied with the lot of your brain-child? You have realized your dreams and ideas in Netfrastructure but how do you estimate development of Interbase and Firebird? Is it interesting for you to watch this and is it a pleasure for you to get to know about serious improvements such as changes in architecture, main bug fixes, expansion of DSQL, performance improvement, etc?
Jim Starkey: I've got a bunch of brain-children and I love them all. VAX Datatrieve outlived the VAX, escaped the claws of Oracle, outlived DEC, outlived Compaq, and lives still as an HP product. I like that. PDP-11 Datatrieve, for reasons I can't fathom, is also still available, but that just scares me. It's interesting to watch developers get their arms around various parts of Firebird, but I keep waiting for someone to run with it. Maybe somebody will.
If I were actively involved in Firebird, this is what I'd do:
- Separate the Siamese-twins classic and superserver so each can have a life.
- Maintain BLR as a legacy interface while moving SQL into the engine where it belongs. Dump DSQL.
- Introduce a two level name space.
- Implement intellectually defensible, useful security for superserver.
- Embed a Java Virtual Machine for triggers, stored procedures, and UDFs.
- Implement a serial write log for superserver; back off on careful write.
Marina Novikova: Do you think that some of your ideas in the original Interbase architecture were incorrectly understood or wrongly realized by the following developers?
Jim Starkey: Neither. The ideas and architecture were completely appropriate for the platforms available for the decade following its inception. If I had to make another database system for an Apollo DN320 I'd make it natively SQL with a multi-level name space but otherwise pretty much the same.
If I had to do all over on modern machines, I'd write Netfrastructure all over again. But this time I'd make system tables case insensitive. And listen to Ann about index keys.
Marina Novikova: What would you like to wish the whole Interbase/Firebird/Yaffil community?
Jim Starkey: Learn from the past; design for the future. It isn't 1984 any more. Deal with it.