Exception Handling - Design Specification

by Marco Romanini 15th Sep 1998

High Level Design

In order to be notified of any critical problems which occur in our code, we need to use signals on UNIX and exceptions on NT. In the case of UNIX we can set a signal handler at the time we launch a thread, and reset it to the default (SIG_DFL) at the end, so that the handler is active for the life of the thread. Once an abnormal situation occurs the OS will send a signal to the handler. This handler can attempt to remedy the situation, if possible, and then either error out by way of a longjmp back up the stack, or reset the handler to the default and return. In the latter case the OS will retry the offending operation, and most likely fail again, in which case the default handler will cause a core dump. This method is not nestable, since we currently do not have anywhere to store the old signal handler so as to allow us to restore it at the end.
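
As an illustration of the nesting limitation only, the following minimal sketch shows one way the old handler could be kept, by having each protected region store the previous disposition in its own stack frame. It is not part of the proposed change. All names (on_sync_signal, handler_push, handler_pop, run_protected) are hypothetical, the sketch uses sigaction() and sigsetjmp() rather than our SETJMP machinery, and it keeps the recovery point in a plain static, which a threaded server would have to make per-thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical names throughout; none of these exist in the code base. */
static sigjmp_buf *recover_point;   /* innermost active recovery point */
static int inner_result;            /* used by nested() below */

static void on_sync_signal (int sig_num)
{
/* Error out by jumping back up the stack to the innermost
   recovery point. */
(void) sig_num;
siglongjmp (*recover_point, 1);
}

/* Install the handler and hand the previous disposition to the caller. */
static int handler_push (struct sigaction *old)
{
struct sigaction act;

act.sa_handler = on_sync_signal;
sigemptyset (&act.sa_mask);
act.sa_flags = 0;
return sigaction (SIGSEGV, &act, old);
}

/* Put back whatever was active before the matching handler_push(). */
static int handler_pop (const struct sigaction *old)
{
return sigaction (SIGSEGV, old, NULL);
}

/* Run fn() under the handler: 0 if it finished, 1 if it faulted. */
static int run_protected (void (*fn) (void))
{
struct sigaction old;               /* saved per call, so calls nest */
sigjmp_buf here;
sigjmp_buf *outer = recover_point;
int faulted;

if (handler_push (&old) != 0)
    return -1;
recover_point = &here;
/* Second argument 1: save the signal mask, so that siglongjmp()
   unblocks the signal that was blocked while the handler ran. */
if (sigsetjmp (here, 1) == 0)
    {
    fn ();
    faulted = 0;
    }
else
    faulted = 1;
recover_point = outer;
handler_pop (&old);
return faulted;
}

static void well_behaved (void) { }
static void ill_behaved (void) { raise (SIGSEGV); }
static void nested (void) { inner_result = run_protected (ill_behaved); }
```

Calling run_protected (nested) exercises the nested case: the inner region recovers from the fault and restores the outer region's handler, so the outer region completes normally.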

For Windows we will use try-except, a Microsoft extension to the C language (the __try and __except keywords). This construct has three parts: the try block, the exception filter, and the except block. The try block is the section of code which is protected by the exception handling. If an exception occurs within that block, which includes any functions called from that block, the exception filter is called. The exception filter is an expression, typically a function call, which determines what exception happened, whether it wishes to handle it, and what to do afterwards. If the filter returns EXCEPTION_CONTINUE_EXECUTION, it is telling the OS that the exception was handled, and that all should now be fine, in which case the OS will re-execute the offending instruction. If the filter returns EXCEPTION_CONTINUE_SEARCH, it is telling the OS that the filter will not handle the exception, and that it should be passed up the stack to the next higher exception filter. Note that there is always an outermost exception filter at the OS level which causes the program to terminate and the error (for example, an Access Violation) to be displayed. If the filter returns EXCEPTION_EXECUTE_HANDLER, it is telling the OS that this section of code cannot recover, and that the stack should be unwound until the except block is found. At this point the except block is executed, usually consisting of cleanup code and logging actions, and execution resumes after the except block.

In our case we will not make use of the except block, because of its unwinding nature; rather, we will handle all exceptions in the exception filter.

High Level Algorithm

In InterBase we have a mechanism for detecting in which type of thread we are executing. This mechanism uses the THDD data structure, and the thdd_type member of that structure. It should be noted that for the current implementation of the exception handling we are only concerned with exceptions occurring in JRD (THDD_TYPE_TDBB). Strictly speaking, therefore, we do NOT need to know which type of thread caused a signal/exception. However, a signal/exception handler written only for that one case would not be very flexible, or expandable.

I propose that we implement a generic signal/exception handler, which will be called for all signals/exceptions, from all thread types. Within this handler we can determine the specific exception, and the thread type causing the exception, and use this information to determine whether we wish to handle it or not. Writing such a generic handler is only slightly more complex, but it makes the code much easier to read, and to expand if in the future we determine that there is another signal/exception in a different thread type which we want to handle.

Data Structures/Class Hierarchy

We need to add a new element to the THDD structure to point to the error handling routine for that particular thdd type. The new element will be:

FPTR_VOID  thdd_error_poster;
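
A minimal sketch of how such a per-type pointer might be used is given below. It is an illustration under stated assumptions, not the actual declaration: the struct is a cut-down stand-in for THDD, FPTR_VOID is assumed to be a pointer to a function taking no arguments and returning void, the THDD_TYPE_ values are placeholders, and jrd_error_poster and post_error are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*FPTR_VOID) (void);   /* assumed shape of FPTR_VOID */

/* Simplified stand-in for the real THDD structure. */
typedef struct thdd {
    int        thdd_type;
    FPTR_VOID  thdd_error_poster;   /* the proposed new element */
} *THDD;

#define THDD_TYPE_TDBB  1           /* placeholder values */
#define THDD_TYPE_TSQL  2

static int jrd_posts;               /* counts calls, for illustration */

/* Hypothetical error poster for the JRD thread type. */
static void jrd_error_poster (void)
{
jrd_posts++;
}

/* Call the subsystem's error poster if it registered one.
   Returns 1 if a poster ran, 0 if there was none to call. */
static int post_error (THDD thdd)
{
if (thdd == NULL || thdd->thdd_error_poster == NULL)
    return 0;
(*thdd->thdd_error_poster) ();
return 1;
}
```

A thread type that has no recovery path simply leaves the pointer NULL, and the caller falls back to logging only.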

Interfaces

No interface change is required, except for the addition of several new error messages. These error messages have already been added to 5.5 and are therefore available to 6.0.

Detailed Design

Detailed Algorithm

There are currently 6 different subsystems which use our threading model, and thus our THDD data structure, and these are:

  • THDD_TYPE_TGBL: used by backup/restore
  • THDD_TYPE_TSQL: used by DSQL
  • THDD_TYPE_TDBB: used in engine (JRD)
  • THDD_TYPE_TRDB: used in remote interface
  • THDD_TYPE_TDBA: used in DBA utility (GSTAT)
  • THDD_TYPE_TIDB: used by interprocess server (IPServer)

Using thdd_type we are able to write a single generic signal/exception handler which detects which subsystem caused the signal/exception, and determines whether it should be handled or not. This can be done with the following code in isc_sync.c:

#ifdef UNIX
#ifdef _ANSI_PROTOTYPES_
ULONG ISC_exception_post (
    ULONG  sig_num,
    siginfo_t *siginfo,
    ucontext_t *uap)
#else
ULONG ISC_exception_post (sig_num, siginfo, uap)
    ULONG  sig_num;
    siginfo_t *siginfo;
    ucontext_t *uap;
#endif
{
/**************************************
 *
 * I S C _ e x c e p t i o n _ p o s t ( U N I X )
 *
 **************************************
 *
 * Functional description
 *     When we get a sync exception, formulate the error code
 * and do a ERR_post.
 *
 **************************************/
THDD thdd;

if (!SCH_thread_enter_check ())
    THREAD_ENTER;

/* we need to call PLATFORM_GET_THREAD_DATA because we are
   getting the GET_THREAD_DATA from jrd which does a lot
   of error checking in DEV tree to make sure that the
   tdbb is OK.  But if we get here we could have any one
   of the types of thdd, so we can not check the tdbb. */
thdd=(THDD) PLATFORM_GET_THREAD_DATA;

switch (thdd->thdd_type)
    {
    case THDD_TYPE_TDBB:
        switch (sig_num)
            {
            case SIGILL :
               if (siginfo->si_code==ILL_BADSTK)
                  {
                  gds__log("Stack overflow");
                  ERR_post(isc_exception_sigill,0);
                  break;
                  }
               /* any other SIGILL falls through */
            case SIGSEGV:
               if (((TDBB) thdd)->tdbb_in_user_code)
                  {
                  gds__log("Signal in user code: BAD UDF");
                  break;
                  }
               /* not in user code: fall through */
            case SIGBUS :
            case SIGFPE :
            default :
               gds__log("Unexpected exception in JRD");
               break;
           }
       break;
   case THDD_TYPE_TGBL:
      gds__log("Unexpected exception in GBAK/Restore");
      break;
   case THDD_TYPE_TSQL:
      gds__log("Unexpected exception in DSQL");
      break;
   case THDD_TYPE_TRDB:
      gds__log("Unexpected exception in Remote Interface");
      break;
   case THDD_TYPE_TDBA:
      gds__log("Unexpected exception in DBA Utility");
      break;
   case THDD_TYPE_TIDB:
      gds__log("Unexpected exception in IPC");
      break;
   default:
      gds__log("Unexpected exception");
      break;
   }

if (SCH_thread_enter_check ())
    THREAD_EXIT;

ISC_sync_signals_reset();
return 0L;
}
#endif /* UNIX */

#ifdef WIN_NT
#ifdef _ANSI_PROTOTYPES_
ULONG ISC_exception_post (
    ULONG  except_code)
#else
ULONG ISC_exception_post (except_code)
    ULONG  except_code;
#endif
{
/**************************************
 *
 * I S C _ e x c e p t i o n _ p o s t ( N T )
 *
 **************************************
 *
 * Functional description
 *     When we get a sync exception, formulate the error code
 * and do a ERR_post.
 *
 **************************************/
THDD thdd;

if (!SCH_thread_enter_check ())
    THREAD_ENTER;

/* we need to call PLATFORM_GET_THREAD_DATA because we are
   getting the GET_THREAD_DATA from jrd which does a lot
   of error checking in DEV tree to make sure that the
   tdbb is OK.  But if we get here we could have any one
   of the types of thdd, so we can not check the tdbb. */
thdd=(THDD) PLATFORM_GET_THREAD_DATA;


switch (thdd->thdd_type)
    {
    case THDD_TYPE_TDBB:
        switch (except_code)
            {
            case EXCEPTION_STACK_OVERFLOW:
               gds__log("Stack overflow");
               ERR_post(isc_exception_sigill,0);
               break;
            case EXCEPTION_ACCESS_VIOLATION:
               if (((TDBB) thdd)->tdbb_in_user_code)
                  {
                  gds__log("Signal in user code: BAD UDF");
                  break;
                  }
               /* not in user code: fall through */
            default :
               gds__log("Unexpected exception in JRD");
               break;
           }
       break;
   case THDD_TYPE_TGBL:
      gds__log("Unexpected exception in GBAK/Restore");
      break;
   case THDD_TYPE_TSQL:
      gds__log("Unexpected exception in DSQL");
      break;
   case THDD_TYPE_TRDB:
      gds__log("Unexpected exception in Remote Interface");
      break;
   case THDD_TYPE_TDBA:
      gds__log("Unexpected exception in DBA Utility");
      break;
   case THDD_TYPE_TIDB:
      gds__log("Unexpected exception in IPC");
      break;
   default:
      gds__log("Unexpected exception");
      break;
   }

if (SCH_thread_enter_check ())
    THREAD_EXIT;

return EXCEPTION_CONTINUE_SEARCH;
}
#endif /* WIN_NT */

#ifdef SUPERSERVER
#ifdef UNIX
#ifdef _ANSI_PROTOTYPES_
void ISC_sync_signals_set (void)
#else
void ISC_sync_signals_set ()
#endif
{
/**************************************
 *
 * I S C _ s y n c _ s i g n a l s _ s e t( U N I X )
 *
 **************************************
 *
 * Functional description
 * Set all the synchronous signals for a particular thread
 *
 **************************************/
struct sigaction act;

/* Install ISC_exception_post as a three-argument (SA_SIGINFO)
   handler.  sa_handler and sa_sigaction may share storage, so
   only sa_sigaction is set.  The cast is needed because
   ISC_exception_post is declared to return ULONG. */
act.sa_sigaction=(void (*)(int, siginfo_t *, void *)) ISC_exception_post;
sigemptyset (&act.sa_mask);
act.sa_flags=SA_SIGINFO;

sigaction (SIGILL,&act,NULL);
sigaction (SIGFPE,&act,NULL);
sigaction (SIGBUS,&act,NULL);
sigaction (SIGSEGV,&act,NULL);
}

#ifdef _ANSI_PROTOTYPES_
void ISC_sync_signals_reset (void)
#else
void ISC_sync_signals_reset ()
#endif
{
/**************************************
 *
 * I S C _ s y n c _ s i g n a l s _ r e s e t ( U N I X )
 *
 **************************************
 *
 * Functional description
 * Reset all the synchronous signals for a particular thread
 * to default.
 *
 **************************************/

sigset (SIGILL, SIG_DFL);
sigset (SIGFPE, SIG_DFL);
sigset (SIGBUS, SIG_DFL);
sigset (SIGSEGV, SIG_DFL);
}
#endif /* UNIX */
#endif /* SUPERSERVER */

ISC_exception_post() is our generic signal/exception handler. It will be called by the operating system when a signal/exception occurs. Our interest in the signals/exceptions is registered through the use of the following macros, placed in ibsetjmp.h.

#ifdef UNIX
#ifdef SUPERSERVER
/* note these can not be nested */
#define START_CHECK_FOR_EXCEPTIONS ISC_sync_signals_set();
#define END_CHECK_FOR_EXCEPTIONS        ISC_sync_signals_reset();
#endif /* SUPERSERVER */
#endif /* UNIX */

#ifdef WIN_NT
#ifdef SUPERSERVER
#include <excpt.h>
#define START_CHECK_FOR_EXCEPTIONS __try {
#define  END_CHECK_FOR_EXCEPTIONS } __except ( ISC_exception_post(GetExceptionCode())) { }
#endif /* SUPERSERVER */
#endif /* WIN_NT */

/* generic macros */
#ifndef START_CHECK_FOR_EXCEPTIONS
#define START_CHECK_FOR_EXCEPTIONS
#endif

#ifndef END_CHECK_FOR_EXCEPTIONS
#define  END_CHECK_FOR_EXCEPTIONS
#endif

Finally we modify thread() in server.c as follows:

#ifdef _ANSI_PROTOTYPES_
static int THREAD_ROUTINE thread (
    void  *flags)
#else
static int THREAD_ROUTINE thread (flags)
    void *flags;
#endif
{
/**************************************
 *
 * t h r e a d
 *
 **************************************
 *
 * Functional description
 * Execute requests in a happy loop.
 *
 **************************************/
REQ request, next, *req_ptr;
PORT port, parent_port;
SLONG value;
EVENT ptr;
SCHAR *thread;
USHORT inactive_count;
USHORT timedout_count;
struct trdb thd_context, *trdb;
JMP_BUF  env;

#ifdef SUPERSERVER
ISC_enter(); /* Setup floating point exception handler once and for all. */
#endif

#ifdef WIN_NT
if (!((SLONG) flags & SRVR_non_service))
    thread=CNTL_insert_thread();
#endif

inactive_count=0;
timedout_count=0;
THREAD_ENTER;

START_CHECK_FOR_EXCEPTIONS

for (;;)
    {
    .   /* the contents of the for loop are unchanged and
    .    * are omitted for the sake of clarity. */
    .
    }

END_CHECK_FOR_EXCEPTIONS

THREAD_EXIT;

#ifdef WIN_NT
if (!((SLONG) flags & SRVR_non_service))
    CNTL_remove_thread (thread);
#endif

return 0;
}

This will protect all code in, or called by, thread() from signals/exceptions. Therefore if an ill-behaved UDF causes a SEGV/AccessViolation, the execution will stop immediately, and the signal/exception handler will be called.

Once we execute ISC_exception_post() we will determine what signal/exception occurred, and log the appropriate message. In the case of STACK_OVERFLOW, we will call ERR_post(). This will post the error to the status vector and ERR_punt() out, which will LONGJMP us right back to the most recent SETJMP. Once this process is started, we know that each level will clean up, post an error if necessary, and punt back to the next higher level. This will continue until we reach the top of the subsystem. In this case, we will arrive in jrd.c, where at last we return an error. At this point the request has failed, and we continue to return the error until we arrive in server.c, which was the original caller, and which will send_response() and finish.
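
The punt chain described above can be pictured with the following minimal sketch. It uses plain setjmp()/longjmp() and hypothetical stand-ins (status_vector, current_env, post, punt, inner, middle, top) in place of the real status vector, the SETJMP/LONGJMP macros, ERR_post(), ERR_punt() and the actual call levels; the error code 42 is a placeholder.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-ins for the status vector and the ERR_ routines. */
static long     status_vector;      /* last posted error code */
static jmp_buf *current_env;        /* most recent setjmp point */
static int      cleanups;           /* number of levels that cleaned up */

/* Jump back to the most recent recovery point. */
static void punt (void)
{
longjmp (*current_env, 1);
}

/* Record the error, then punt. */
static void post (long code)
{
status_vector = code;
punt ();
}

/* Innermost level: the work that fails. */
static void inner (void)
{
post (42L);                         /* placeholder error code */
}

/* Middle level: sets its own recovery point, cleans up, punts again. */
static void middle (void)
{
jmp_buf env;
jmp_buf *outer = current_env;

current_env = &env;
if (setjmp (env))
    {
    cleanups++;                     /* release this level's resources */
    current_env = outer;
    punt ();                        /* hand the error to the next level */
    }
inner ();
current_env = outer;
}

/* Top of the subsystem: turns the punt into an ordinary error return. */
static long top (void)
{
jmp_buf env;

status_vector = 0;
current_env = &env;
if (setjmp (env))
    {
    cleanups++;
    current_env = NULL;
    return status_vector;
    }
middle ();
current_env = NULL;
return 0;
}
```

Here top() plays the role of the entry point in jrd.c: the error posted three levels down comes back to its caller as a plain return value, after each intermediate level has had its chance to clean up.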

It is important to note that all of the above code is already in place, and has been there for a very long time. We are only proposing to add ISC_exception_post() which would take advantage of the existing error handling and cleanup code.

If, in the future, we determine that other exceptions in other thread types can be safely recovered from, we will ONLY have to change ISC_exception_post() to behave accordingly. Assuming that the thread type in question has been designed to handle errors, and to clean up its memory on exit, all will be fine.

Note: In order for the stack overflow checking to be of any use, we will probably want to remove the limitation of 750 recursions of stored procedures, seeing as this is one of the most likely causes of the stack overflow.

New/Affected Modules

The following modules need to be modified to clean up the SET_THREAD_DATA macro:

jrd/cch.c
jrd/idx.c
jrd/met.e
jrd/rng.c
jrd/sdw.c
jrd/tra.c
jrd/jrd.c
jrd/jrd.h

The following modules need to be modified to add the SCH_thread_enter_check function which determines if we have to do a THREAD_ENTER.

jrd/sch.c

The following modules need to be modified to remove existing, nonfunctioning exception handling code:

jrd/fun.e
jrd/blf.e

Lastly, the following modules need to be modified to add sigset/try-except code:

remote/server.c

Testing Considerations

We will be able to write ill-behaved UDFs and Blob filters to exercise one particular case of this exception handling. As for the stack overflow signal/exception, we in R&D will have to assess the particular situations in which we can overflow our stack, and outline ways to make this happen.
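
As a starting point for the UDF case on UNIX, the following minimal sketch shows an ill-behaved function and a small harness which confirms that it really does raise the expected signal. It is a standalone illustration, not an actual UDF: the names bad_udf, good_udf and signal_raised_by are hypothetical, the function is not declared to the engine, and the harness runs it in a child process so that the fault cannot take the test program down. The signal delivered for a null-pointer write is SIGSEGV on common platforms, but this is platform dependent.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical ill-behaved UDF: writes through a null pointer.  The
   pointer is volatile so the compiler must perform the store. */
static int bad_udf (int arg)
{
int * volatile p = NULL;

*p = arg;
return arg;
}

/* Hypothetical well-behaved UDF, for comparison. */
static int good_udf (int arg)
{
return arg + 1;
}

/* Run udf in a child process.  Returns the number of the signal that
   killed the child, 0 if it exited normally, -1 if the harness failed. */
static int signal_raised_by (int (*udf) (int))
{
pid_t pid;
int status;

pid = fork ();
if (pid < 0)
    return -1;
if (pid == 0)
    {
    /* Make sure the default action applies in the child. */
    signal (SIGSEGV, SIG_DFL);
    (*udf) (1);
    _exit (0);
    }
if (waitpid (pid, &status, 0) != pid)
    return -1;
return WIFSIGNALED (status) ? WTERMSIG (status) : 0;
}
```

Once a function like bad_udf is confirmed to fault, the same body can be built as a real UDF and called from a query, at which point the server should log the "BAD UDF" message rather than crash silently.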