Exception Handling - Functional Specification

Description

It has become evident that the engine must be protected from badly behaved code, both in user functions and in the engine itself. As customers migrate from Classic server to Super Server, they are noticing one of the fundamental differences between the two. In the Classic model, if the server crashed, only the user who caused the crash was disconnected, while in Super Server all users are dropped and the server core dumps/GPFs.

The real problem is that when a user writes a badly behaved UDF and it crashes the server, they conclude that our server is NOT stable. Using C++ language exception handling we are able to trap hardware exceptions such as SEGV / Access Violation. By doing this we will be able to notify the user, through the InterBase log file, that the crash occurred in their code and not in the engine. We cannot exit gracefully from this, since it is possible that our stack is now corrupt, and thus unstable. Therefore the only recourse we have left is to abort.

User Interface/Usability

On UNIX we have the ability to define special signal handlers for thread-specific signals such as SIGSEGV, SIGFPE, SIGILL, and SIGBUS. On Windows we can trap many different exceptions, such as ACCESS_VIOLATION, STACK_OVERFLOW, ARRAY_BOUNDS_EXCEEDED, DATATYPE_MISALIGNMENT, FLT_DENORMAL_OPERAND, FLT_DIVIDE_BY_ZERO, FLT_INEXACT_RESULT, FLT_INVALID_OPERATION, FLT_OVERFLOW, FLT_STACK_CHECK, FLT_UNDERFLOW, INT_DIVIDE_BY_ZERO, and INT_OVERFLOW. Using signal handlers on UNIX and try-except blocks on Windows, we are able to trap any of these signals/exceptions.
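As an illustration of the UNIX side only, the following is a minimal sketch of installing one handler for the four signals named above, assuming a POSIX system. It is not taken from the engine source. The names g_log_fd, log_raw, fatal_signal_handler, install_fatal_handlers, and terminating_signal_of_child are hypothetical, and standard error stands in for the log file. The handler simply reports the signal and aborts.

```cpp
#include <cstddef>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical: descriptor of an already-open log file; stderr stands in here.
static int g_log_fd = STDERR_FILENO;

static const char* signal_name(int sig) {
    switch (sig) {
        case SIGSEGV: return "SIGSEGV";
        case SIGFPE:  return "SIGFPE";
        case SIGILL:  return "SIGILL";
        case SIGBUS:  return "SIGBUS";
        default:      return "unknown signal";
    }
}

// write() is async-signal-safe; its result is deliberately ignored.
static void log_raw(const char* text, std::size_t length) {
    ssize_t rc = write(g_log_fd, text, length);
    (void)rc;
}

// Handler: report the signal, then abort. Only async-signal-safe calls are used.
extern "C" void fatal_signal_handler(int sig) {
    static const char prefix[] = "fatal signal trapped: ";
    const char* name = signal_name(sig);
    std::size_t length = 0;
    while (name[length] != '\0') ++length;
    log_raw(prefix, sizeof prefix - 1);
    log_raw(name, length);
    log_raw("\n", 1);
    abort();  // no recovery is attempted
}

// Install the same handler for each signal of interest.
bool install_fatal_handlers() {
    struct sigaction action = {};
    action.sa_handler = fatal_signal_handler;
    sigemptyset(&action.sa_mask);
    action.sa_flags = 0;
    const int signals[] = {SIGSEGV, SIGFPE, SIGILL, SIGBUS};
    for (int sig : signals) {
        if (sigaction(sig, &action, nullptr) != 0) return false;
    }
    return true;
}

// Usage sketch: trigger the signal in a child process and return the signal
// that terminated the child (0 if it exited normally, -1 on error).
int terminating_signal_of_child(int sig) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        if (!install_fatal_handlers()) _exit(2);
        raise(sig);
        _exit(1);  // not reached when the handler aborts
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

With the handlers installed, a child that raises SIGSEGV ends with SIGABRT rather than SIGSEGV, which shows that the handler ran before the process died. A handler that must survive stack overflow would also need an alternate signal stack (sigaltstack together with the SA_ONSTACK flag), which this sketch omits.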

However, the question we are now faced with is: Once we know that someone has done something VERY bad, what do we do about it?

We have determined that it is never 100% safe to recover from an exception. It is always possible that before the exception was detected some part of memory, however small, was corrupted. Therefore we will only put in place a mechanism for detecting the exception, logging it, and finally aborting. This serves a dual purpose.

  1. If the exception came from user code, the user will be able to determine exactly which one of their functions caused the problem, and the exact nature of it. This will assist them in rectifying the situation, or at the very least it will indicate to them that it was their code which forced the server to shut down.
  2. If the exception was within the engine, then we will get a clue as to the nature of the problem in the log file, and we will still have a core file (at least on UNIX) to use for debugging purposes.

Requirements and Constraints

The constraint for this feature is that it is still possible that the server will crash. We are not implementing this to prevent crashes. It only allows us to identify that the user may have caused the crash, if indeed that is the case. This behavior may not be considered very friendly by most users; however, as discussed above, this is the ONLY thing we believe we can currently do safely.

Migration Issues

As far as user code is concerned, this marks the third behavior users will have seen since InterBase 5.0. In InterBase 5.0 and earlier, if there was a signal/exception in user code, we would crash. In InterBase 5.5, we added the necessary signal/exception handling to trap this situation, report it to the user, and continue executing. This looked very good to users, but it was determined that it was too risky to continue executing without knowing the extent of the damage done by the signal/exception. Therefore, we are now proposing to trap the signal/exception, log it to our log file, and abort execution of the server. This is not as well behaved as InterBase 5.5, but it is believed to be the safest approach.