Java UDF Functional Specification
Note
Whilst InterBase V6.0 was in beta, work was started on the implementation of a design for supporting Java UDFs in the next release of InterBase. This document contains the pertinent information related to this work.
Table of Contents
- Introduction
- Description
- User Interface/Usability
- Requirements and Constraints
- Migration Issues
- Open Issues
- Syntax Conventions
- Reference Documents
Introduction
The term native UDF is used to distinguish UDFs which use the standard C linkage conventions. C, C++, and Delphi compilers can create UDF libraries (DLLs or Unix shared libraries) which use the C calling convention.
Native UDFs are deployed natively as shared UDF libraries (.dll or .so files), loaded dynamically as needed. Java UDFs are not deployed as a native shared library. Since native UDFs are encoded natively, native InterBase type representations are passed directly to/from native UDFs.
For example, a language such as Delphi can represent the C data type ISC_QUAD* directly using the Delphi syntax:
type
  ISC_QUAD = record
    isc_quad_high : Integer;
    isc_quad_low : Cardinal;
  end;
  PISC_QUAD = ^ISC_QUAD;
Such a pointer may be passed on the stack to the Delphi UDF in exactly the same way as it would be passed to a C UDF. So executing native UDFs for any compiled language supporting C data type representations and C calling conventions requires no additional support within the InterBase engine.
Java, on the other hand, is not compiled to native code. It has no pointers or structures, and therefore no language support for C data type representations, and it does not support native C calling conventions. So executing Java UDFs requires communication with a Java Virtual Machine (JVM), and Java objects will need to be created within the InterBase engine in a portable way, independently of the object representation used by any particular JVM implementation.
The Java Native Interface (JNI) provides a mechanism for creating and manipulating portable Java objects as internal C structures and pointers. So for example, the representation for an 8-byte C date structure (ISC_QUAD*) can be converted by the engine to a java.sql.Date object before being passed to a Java UDF method. The java.sql.Date object representation would be created within the engine in a portable way via the JNI, but the actual underlying C representation for the object would be peculiar to whatever JVM implementation happens to be embedded. Similarly, the representation for an InterBase C string (char*) can be converted by the engine to a Java Unicode java.lang.String object before being passed to a Java UDF method.
Note
Beyond the lack of language support for C data type representation, in fact, the Java Language Specification does not even define how objects are laid out in memory. Java does not expose the structure of objects. So while a string in C has a known format as a sequence of character bytes in memory, this is not the case for a java.lang.String object. The JNI API provides a means to create and access Java objects in C without exposing the internal memory layout (structures) of Java objects which can vary from one VM port to another.
So there are three key requirements for supporting Java UDFs from within InterBase:
- Provide support for describing Java UDF structures in so far as Java UDF descriptions differ from native UDF descriptions.
- Provide support for converting InterBase native datatypes to/from JNI representations within the engine before passing/returning such representations to/from a Java UDF.
- Provide a means to embed and configure a Java Virtual Machine (JVM) for interpreting Java UDFs during SQL execution.
The functional implications of these requirements are detailed in section User Interface/Usability, but may be summarized briefly as follows:
- A DDL syntax to distinguish Java UDFs from native UDFs. Java UDF invocations (DML) will be syntactically identical to native UDF invocations, but must be distinguishable semantically. Furthermore, Java UDF descriptions are different from native UDF descriptions. For example, unlike native UDFs, Java UDFs are not known by a MODULE_NAME and ENTRY_POINT. Instead, Java UDF DDL must describe a Java UDF's class and method name. In addition, Java UDF DDL will use Java UDF datatype names, and will not employ descriptors such as FREE_IT and BY VALUE, which are only meaningful in the context of native UDFs and pointer-based memory management.
- A system table for describing Java UDFs and associated metadata extract.
- A set of Java UDF datatypes and associated Java classes.
- An extended mechanism for InterBase error handling in support of recoverable Java exceptions.
- JVM configuration options, including the specification of a Java class path.
- An installation which deploys a Java Runtime Environment (JRE).
This specification assumes an understanding of native UDFs, and does not attempt to describe the meaning of UDFs in general, except where the behavior of Java UDFs differs from that of native UDFs.
Also note that Sun changed the name of Java 1.2 to Java 2, and JDK 1.2 to Java 2 SDK. These terms are now used synonymously.
Description
Support for Java UDFs (User Defined Functions) will allow for an external library of Java classes and methods to be utilized anywhere that a SQL function may be used. This provides for runtime SQL execution to perform data manipulation tasks by communicating directly with a Java Virtual Machine (JVM) local to the InterBase server. Through the use of the Java Native Interface (JNI) we can embed and use a Java VM in a standardized way that will work with any VM implementation supporting the JNI. All future evolutions of the JNI will maintain complete binary compatibility.
The recursive evaluation (execution) of SQL containing a Java UDF invocation will perform all necessary conversions of the UDF arguments and UDF return values from the InterBase native datatype structures to the corresponding data and character representations used by Java. For example, InterBase DATEs, TIMEs, and TIMESTAMPs will be converted to Java objects of class java.sql.Date, java.sql.Time, and java.sql.Timestamp respectively. Blob UDFs will be supported through the use of a customized public Java Blob library which invokes callbacks to the InterBase engine for performing gets and puts on Blob segments.
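As an illustration of the date and time conversion described above, here is a minimal Java sketch. It rests on assumptions that this specification does not state: that the date half of the 8-byte structure holds a day count starting at 17 November 1858, that the time half holds units of 1/10000 second since midnight, and that the value is interpreted as UTC. The class and method names are hypothetical, and the real conversion would be performed inside the engine via the JNI.

```java
import java.sql.Timestamp;

// Hypothetical helper; the names and the encoding assumptions are illustrative only.
final class IbDateSketch {
    // Assumed: day 0 of the InterBase date encoding is 17 November 1858,
    // which is 40587 days before the Java epoch of 1 January 1970.
    static final long DAYS_FROM_IB_EPOCH_TO_JAVA_EPOCH = 40587L;
    static final long MILLIS_PER_DAY = 86_400_000L;

    // Convert an (assumed) day count and a time of day in 1/10000 s units
    // to milliseconds since the Java epoch, treating the value as UTC.
    static long toEpochMillis(int ibDays, int ibTimeFraction) {
        long dayMillis = (ibDays - DAYS_FROM_IB_EPOCH_TO_JAVA_EPOCH) * MILLIS_PER_DAY;
        long timeMillis = ibTimeFraction / 10L;
        return dayMillis + timeMillis;
    }

    // Wrap the converted value in the Java class named for TIMESTAMP.
    static Timestamp toTimestamp(int ibDays, int ibTimeFraction) {
        return new Timestamp(toEpochMillis(ibDays, ibTimeFraction));
    }
}
```

The reverse direction (a returned java.sql.Timestamp back to the engine representation) would apply the same arithmetic in the opposite order.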
User Interface/Usability
When defining a Java UDF, there are two declarations to consider. The first declaration is for the actual Java Method in some external Java class library. The second declaration is for the UDF itself as declared to a database. Although the two declarations must correspond by the typing of their arguments and return value, they are nonetheless distinct declarations, and they will be referred to as the Java Method Declaration and the Java UDF Declaration.
For native UDF declarations, UDF type names, rather than C type names, are used to denote the types of UDF arguments and return values. For example,
DECLARE EXTERNAL FUNCTION foo
BLOB,
// C type is a blob structure pointer *Blob
CSTRING(n),
// C type is char*, InterBase type is CHAR or VARCHAR
NUMERIC(n),
// C type is ISC_QUAD*
int or short,
//InterBase type is INT64, INTEGER, or SMALLINT.
RETURNS TIMESTAMP
// C type is ISC_QUAD* from the ibase.h file
ENTRY_POINT "C-function-name"
MODULE_NAME "udflib.dll";
Similarly for Java UDFs, an extended class of UDF type names is used to denote the types of UDF arguments and UDF return values, rather than the actual Java type names used in the Java Method Declaration. For example,
DECLARE EXTERNAL JAVA FUNCTION foo
BLOB,
// Java type is class com.borland.interbase.Blob
JSTRING(n),
// Java type is java.lang.String, InterBase type is CHAR or VARCHAR
NUMERIC(n),
// Java type is java.math.BigInteger
RETURNS TIMESTAMP
// Java type is java.sql.Timestamp
CLASS "class-name"
METHOD "method-name";
Although Java type names are not used in the Java UDF declaration, each UDF datatype corresponds strictly with a Java type or class.
Note
JSTRING is a new UDF type name to be introduced in support of Java UDFs. The example above is meant to be introductory and illustrative only; the meaning of the Java UDF declaration syntax will be described later.
When a Java UDF is invoked by InterBase, arguments are provided whose engine native types must be converted to the corresponding Java types. Because of the necessary datatype conversions from InterBase native structures to Java representations, Java UDF invocations must be distinguishable from native UDF invocations. Given a Java UDF declaration, it must also be possible [for the user] to infer the Java types of the corresponding method arguments and method return value. Therefore the SQL syntax for declaring a UDF must provide a means to indicate that the UDF is a Java UDF, as well as provide a means to indicate the Java UDF types of the UDF arguments and UDF return value.
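To make the distinction between the two declarations concrete, here is a hypothetical Java Method Declaration, with a Java UDF Declaration that could correspond to it shown in the leading comment. The class name, method name, and UDF name are invented for illustration, and the DDL simply follows the shape of the introductory example above.

```java
// Hypothetical Java Method Declaration for a Java UDF.
// A corresponding Java UDF Declaration might read (illustrative only):
//
//   DECLARE EXTERNAL JAVA FUNCTION left_pad
//     JSTRING(40), INTEGER
//     RETURNS JSTRING(40)
//     CLASS "StringUdfs"
//     METHOD "leftPad";
final class StringUdfs {
    // JSTRING corresponds to java.lang.String and INTEGER to int,
    // so the UDF arguments arrive as (String, int).
    public static String leftPad(String value, int width) {
        StringBuilder padded = new StringBuilder();
        for (int i = value.length(); i < width; i++) {
            padded.append(' ');
        }
        return padded.append(value).toString();
    }
}
```

Reading the UDF declaration alone, a user can infer that the method takes a String and an int and returns a String, which is the property this section asks of the syntax.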
Java UDF Datatypes
The correspondence between the datatyping of Java UDFs and their corresponding Java Methods is as follows:
Java UDF Declared Type | Java Method Declared Type | Description |
---|---|---|
JSTRING | java.lang.String | The UDF type JSTRING indicates that the Java UDF expects an object of class java.lang.String to be passed or returned. This is analogous to CSTRING for native UDFs, except that native UDFs perform no implicit character conversions, and no character encoding is enforced on the passed C strings (C strings are passed byte-for-byte). InterBase CHAR or VARCHAR fields of any character set may be passed to Java UDFs by an implicit conversion to a Java Unicode String. Any native InterBase character set which is convertible to and from intermediary Unicode FSS by the InterBase engine will be supported. Design note: The conversion from the intermediary Unicode FSS to a 2-byte Unicode representation is handled by the Java UDF implementation. The user will not be aware of the intermediary Unicode FSS representation. |
TIMESTAMP, TIME, or DATE | java.sql.Timestamp, java.sql.Time, or java.sql.Date | Dates and Times may be passed to and from Java UDFs by converting InterBase dates and times to and from Java dates and times as described by the JDBC java.sql interfaces. Design note: This conversion code currently exists in InterClient, and may be reused. |
BLOB | com.borland.interbase.Blob | The UDF type BLOB indicates that the Java UDF method expects an object of class com.borland.interbase.Blob to be passed or returned. A Blob class is introduced to encapsulate an InterBase Blob and provide the necessary methods for getting and putting segments to the Blob. |
NUMERIC(p,s) or DECIMAL(p,s) | java.math.BigInteger | Exact numerics could be passed to and from Java UDFs by converting the InterBase INT64, INTEGER, and SMALLINT representations to and from Java longs, ints, and shorts respectively, depending on precision p. But, as with native UDFs, this would not provide scale, and the user would therefore be burdened with having to know the scale of the field ahead of time, and with making any required scale adjustments within the UDF. Using java.math.BigInteger is a better choice as this provides the exact value of the numeric pre-adjusted for the scale. Note: As with native UDFs, precisions greater than 18 are not supported, but could be accommodated by java.math.BigInteger in some future release of InterBase. java.math.BigInteger is the standard JDBC class used for large numerics. |
DOUBLE PRECISION | double | The Java UDF method expects a Java double to be passed or returned. |
INTEGER | int | The Java UDF method expects a Java int to be passed or returned. |
SMALLINT | short | The Java UDF method expects a Java short to be passed or returned. |
So, for example, a Java UDF may be declared to the database using a type name of BLOB, but the actual Java Method invoked by the UDF must be declared using class com.borland.interbase.Blob.
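The scale problem mentioned for exact numerics can be illustrated with a short sketch. It assumes a NUMERIC(9,2) field whose stored integer representation is 12345: a UDF that receives only that integer must know the scale of 2 in order to recover 123.45. The sketch uses java.math.BigDecimal purely to show the adjustment; it is not the type proposed in the table above, and the class name is hypothetical.

```java
import java.math.BigDecimal;

// Hypothetical helper showing the adjustment a UDF author would otherwise
// have to make by hand when only the stored integer is passed.
final class NumericScaleSketch {
    // Combine a stored integer with a known scale to obtain the exact value.
    static BigDecimal applyScale(long storedValue, int scale) {
        return BigDecimal.valueOf(storedValue, scale);
    }
}
```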
Java UDF Declaration Syntax and Semantics
The Java UDF declaration syntax (DDL) must be supported by DSQL, ISQL (which, as a design note, happens to be built on top of DSQL), and GPRE. The Java UDF invocation syntax (DML) will be identical to the native UDF invocation syntax. Which Java method is actually executed as a result of a Java UDF invocation depends on three settings: class name, method name, and classpath.
The proposed syntax for declaring Java UDFs will follow. Please see the section Syntax Conventions for a description of the extended BNF notation used below. The LALR(1) syntax for Java UDFs is deferred as a design consideration.
DECLARE EXTERNAL JAVA FUNCTION udf-name
[ java-udf-datatype .,..]
[ RETURNS { java-udf-datatype
| PARAMETER argument-position } ]
CLASS "class-name"
METHOD "method-name";
java-udf-datatype ::=
JSTRING (maximum-character-length)
| NUMERIC(p,s) | NUMERIC(p) | DECIMAL(p,s) | DECIMAL(p)
| DATE | TIME | TIMESTAMP
| BLOB
| DOUBLE PRECISION
| INTEGER
| SMALLINT
The semantics of java-udf-datatype have already been described under Java UDF Datatypes above. For native UDFs, type CSTRING requires a maximum-byte-length qualifier. For Java UDFs, type JSTRING requires a maximum-character-length qualifier. The semantics of the other syntactic components are, briefly, as follows.
- udf-name is the string token representing the invocable UDF name. This is the name that is used when invoking the UDF in an SQL expression, and is not necessarily the same as the method-name.
- RETURNS PARAMETER argument-position is used to indicate that the return value is stored in the parameter identified by argument-position.
- CLASS "class-name" is used to indicate the class name containing the Java method for the defined function.
- METHOD "method-name" is used to indicate the static Java method name for the defined function.
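The way a CLASS and METHOD pair, together with the declared argument types, identifies one static Java method can be sketched with Java reflection. This is only an illustration: the engine itself would perform the lookup through the JNI, and the class name, map, and method below are hypothetical. BLOB is left out of the map because com.borland.interbase.Blob is not available outside InterBase.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

final class UdfResolverSketch {
    // Declared UDF type name -> Java type, following the datatype table.
    static final Map<String, Class<?>> JAVA_TYPES = new HashMap<>();
    static {
        JAVA_TYPES.put("JSTRING", String.class);
        JAVA_TYPES.put("TIMESTAMP", java.sql.Timestamp.class);
        JAVA_TYPES.put("TIME", java.sql.Time.class);
        JAVA_TYPES.put("DATE", java.sql.Date.class);
        JAVA_TYPES.put("NUMERIC", java.math.BigInteger.class);
        JAVA_TYPES.put("DECIMAL", java.math.BigInteger.class);
        JAVA_TYPES.put("DOUBLE PRECISION", double.class);
        JAVA_TYPES.put("INTEGER", int.class);
        JAVA_TYPES.put("SMALLINT", short.class);
    }

    // Find the static method named by CLASS / METHOD whose parameter
    // types match the declared UDF argument types.
    static Method resolve(String className, String methodName, String... udfTypes)
            throws ClassNotFoundException, NoSuchMethodException {
        Class<?>[] parameterTypes = new Class<?>[udfTypes.length];
        for (int i = 0; i < udfTypes.length; i++) {
            Class<?> javaType = JAVA_TYPES.get(udfTypes[i]);
            if (javaType == null) {
                throw new IllegalArgumentException("Unknown UDF type: " + udfTypes[i]);
            }
            parameterTypes[i] = javaType;
        }
        Method method = Class.forName(className).getMethod(methodName, parameterTypes);
        if (!Modifier.isStatic(method.getModifiers())) {
            throw new NoSuchMethodException(methodName + " is not static");
        }
        return method;
    }
}
```

For example, resolving class "java.lang.Math", method "abs", with a single INTEGER argument selects Math.abs(int) rather than the long, float, or double overloads, which shows why the declared argument types take part in the lookup.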
Note
The number of parameters to a native UDF is limited to 10. There is no such limit to the number of parameters to a Java UDF.
Note
For simplicity, java-udf-datatype is used for both input-parameter datatypes and return-parameter datatypes. However, certain limitations are imposed on the syntactic rules. In particular, for both native UDFs and Java UDFs, a BLOB may not be used as a return-parameter datatype; instead, a RETURNS PARAMETER n clause must be used.
Note
Memory management of returned values from Java UDFs does not need to be explicitly controlled by the user, as it is with native UDFs via BY VALUE and FREE_IT. Internally, Java objects created within the engine will have their references destroyed after use. So if a user's Java UDF code maintains no references to the returned data, that data is eligible for garbage collection by the VM.
Note
The setting of a classpath will be a JVM configuration, and not a Java UDF setting.
com.borland.interbase.Blob
The user interface, or class signature, provided in support of type BLOB is as follows:
package com.borland.interbase;
/**
* This class represents a Blob as passed to a Java UDF.
* A Blob UDF cannot open or close a Blob,
* but instead invokes Blob methods to perform Blob access.
* A UDF that returns a Blob does not actually define a return value.
* Instead, the return-Blob must be passed as the last
* input parameter to the UDF.
**/
public class Blob
{
/**
* Read a Blob segment into a buffer, and return the number
* of bytes read.
**/
public int getSegment (byte[] buffer);
/**
* Write a Blob segment of bytesToPut bytes from a buffer.
**/
public void putSegment (byte[] buffer, int bytesToPut);
/**
* Returns the total number of segments in the Blob.
**/
public long numberOfSegments ();
/**
* The size, in bytes, of the largest single segment in the Blob.
**/
public int maxSegmentLength ();
/**
* Returns the actual total size, in bytes, of the Blob.
**/
public long size ();
}
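A Blob UDF would typically loop over the segments using the methods listed above. The following sketch shows one way to do that. Because com.borland.interbase.Blob is not available outside InterBase, the sketch declares a hypothetical stand-in interface (BlobLike) that mirrors three of the proposed methods, plus an in-memory implementation used only to exercise the loop. It also assumes that each getSegment call returns the next unread segment, which the class signature above does not state explicitly.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-in mirroring three methods of the proposed Blob class.
interface BlobLike {
    int getSegment(byte[] buffer);
    long numberOfSegments();
    int maxSegmentLength();
}

final class BlobReadSketch {
    // Read every segment in order and return the concatenated bytes.
    // Assumes each getSegment call returns the next unread segment.
    static byte[] readAll(BlobLike blob) {
        byte[] buffer = new byte[blob.maxSegmentLength()];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long segments = blob.numberOfSegments();
        for (long i = 0; i < segments; i++) {
            int bytesRead = blob.getSegment(buffer);
            out.write(buffer, 0, bytesRead);
        }
        return out.toByteArray();
    }
}

// In-memory implementation used only to exercise the sketch.
final class InMemoryBlob implements BlobLike {
    private final byte[][] segments;
    private int next = 0;

    InMemoryBlob(byte[]... segments) {
        this.segments = segments;
    }

    public int getSegment(byte[] buffer) {
        byte[] segment = segments[next++];
        System.arraycopy(segment, 0, buffer, 0, segment.length);
        return segment.length;
    }

    public long numberOfSegments() {
        return segments.length;
    }

    public int maxSegmentLength() {
        int max = 0;
        for (byte[] segment : segments) {
            max = Math.max(max, segment.length);
        }
        return max;
    }
}
```

Sizing the buffer from maxSegmentLength() means no segment can overflow it, and bounding the loop with numberOfSegments() avoids relying on any end-of-Blob return value.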
JVM Configuration
A JVM may be shared by the InterBase server and all its connections (users). The JVM is thread-safe and therefore may be shared by concurrent query threads. The JVM must be configured when the JVM is initialized, so the JVM may only be configured once after the InterBase server is started, and the configuration must be at the server level. If the JVM is to be reconfigured, the InterBase server must be shut down and restarted.
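One practical consequence of sharing a single JVM among concurrent query threads is that any static state held by a UDF class is shared by every connection, so UDF code that keeps such state must itself be thread-safe. A minimal sketch, using a hypothetical UDF class and a small simulation of concurrent query threads:

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CallCounterUdf {
    // Shared by every connection, because all query threads run in one JVM.
    private static final AtomicInteger CALLS = new AtomicInteger();

    // Hypothetical UDF method: returns how many times it has been invoked.
    public static int nextCallNumber() {
        return CALLS.incrementAndGet();
    }

    // Simulate several query threads invoking the UDF concurrently and
    // return how many calls were counted during the simulation.
    static int simulate(int threads, int callsPerThread) throws InterruptedException {
        int before = CALLS.get();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    nextCallNumber();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return CALLS.get() - before;
    }
}
```

With a plain int field in place of the AtomicInteger, concurrent increments could be lost, which is the kind of defect that only appears once several connections invoke the same UDF at once.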
Let's consider the functional requirements for a configurable JVM.
Functional Requirements For A Server-Wide Configurable JVM
First off, we'll need to have a way to configure the server to enable Java UDF support. This could be an option in the ibconfig file such as:
LOAD_JAVA_VIRTUAL_MACHINE TRUE
or it could be a system environment variable of the same name. The default for LOAD_JAVA_VIRTUAL_MACHINE must be FALSE since InterBase installations in general will not deploy Java UDFs.
When the JVM is initialized, the classpath for all user-defined Java classes must be supplied. The classpath indicates the location of all Java class libraries for the Java UDFs and must be local to the InterBase server. By default, the classpath for all user-defined Java functions could be:
<interbase-dir>/java_udfs
where <interbase-dir> is the InterBase installation directory. The default classpath of <interbase-dir>/java_udfs could be modified manually by setting a startup configuration parameter such as JAVA_UDF_CLASSPATH. Like LOAD_JAVA_VIRTUAL_MACHINE, JAVA_UDF_CLASSPATH could also be a server-side system environment variable set before the InterBase server starts.
All directories and jar files in the classpath setting are separated by semi-colons according to the standard Java conventions for setting classpath on Windows. Here is an example setting:
JAVA_UDF_CLASSPATH c:\interbase\java_udfs;d:\fredsUdfs\mathUdfs.jar
For native UDFs, a library module (.dll file) is specified along with the UDF declaration. However, specifying a java archive (.jar file) along with a Java UDF declaration is not possible because all Java class libraries must be known in advance of initializing the JVM. Therefore, Java archive files (.jar files) and Java class library directory locations containing Java UDFs should be appended to the JAVA_UDF_CLASSPATH variable in the ibconfig file. See section Requirements and Constraints for more details.
For Java UDFs which utilize native libraries via JNI, the directory location of the native libraries (.dll files) must also be provided when the JVM is initialized. By default, it is assumed that Java UDFs are written in pure Java. However, if Java UDFs are utilized which call into native libraries, these libraries must be specified in the JAVA_UDF_NATIVE_LIBRARY_PATH variable of the ibconfig startup file. All directories in the path setting are separated by semi-colons according to the standard Java conventions for setting path on Windows:
JAVA_UDF_NATIVE_LIBRARY_PATH d:\fredsUdfNativeLibs
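The configuration handling described above can be sketched as follows. The sketch assumes that an ibconfig line has the form "NAME value", as in the examples in this section, and that the two path settings would be handed to the JVM as the standard java.class.path and java.library.path system properties. Whether the engine would pass them in exactly this way is an assumption, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class JvmConfigSketch {
    // Return the value part of an ibconfig-style line "NAME value",
    // or null when the line does not set the given name.
    static String settingValue(String line, String name) {
        String trimmed = line.trim();
        if (!trimmed.startsWith(name)) {
            return null;
        }
        String rest = trimmed.substring(name.length());
        if (rest.isEmpty() || !Character.isWhitespace(rest.charAt(0))) {
            return null;
        }
        return rest.trim();
    }

    // Split a semicolon-separated path setting into its entries.
    static List<String> entries(String pathSetting) {
        List<String> result = new ArrayList<>();
        for (String entry : pathSetting.split(";")) {
            if (!entry.trim().isEmpty()) {
                result.add(entry.trim());
            }
        }
        return result;
    }

    // Build JVM option strings for the two path settings (assumed mapping).
    static List<String> jvmOptions(String classpath, String nativeLibraryPath) {
        List<String> options = new ArrayList<>();
        options.add("-Djava.class.path=" + classpath);
        if (nativeLibraryPath != null) {
            options.add("-Djava.library.path=" + nativeLibraryPath);
        }
        return options;
    }
}
```

Splitting the classpath into entries is useful mainly for validation at startup, for example to log any directory or jar file that does not exist before the JVM is created.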
There is a secondary option of when to create the JVM. The JVM could be created when the InterBase server starts up (accepted), or alternatively, it could be created upon invocation of the first Java UDF (rejected). Which choice is taken would affect the design under JDK 1.1 because of threading issues. So an ancillary design issue is addressed here.
Design note: In JDK 1.1, the main thread which created the JVM must be maintained for the life of the embedding application, and only this main thread may destroy the JVM (thereby releasing JVM resources). Therefore, in JDK 1.1, a transient query thread cannot be used to create the JVM, as would be tempting to do if the JVM is created on the first invocation of a Java UDF during SQL execution. If the JVM has not yet been created, then the first transient query thread to invoke a Java UDF must yield to the dedicated main thread to create the JVM. This main thread must also destroy the JVM at server shutdown time. If the JVM is already started, any transient query thread may "attach" itself to the JVM before invoking it, and "detach" itself from the JVM before being returned to the internal pool of InterBase query threads. These design requirements have changed in the JDK 1.2 version of the JNI, in which any thread may destroy the JVM.
The idea of loading the JVM upon Java UDF invocation, rather than at server startup, has been rejected. Here's a quote from Mark Duquette which best explains why:
If we opt for a single JVM implementation, then I would vote for loading the JVM on server startup. It is usually acceptable for an application to take a little time to load as opposed to an element being accessed (e.g. performing a select after a large delete took forever because of garbage collection).
Functional Requirements For Multiple Connection-Wide JVMs (Rejected Alternative)
This alternative is academic, being that it is actually not possible given the current JVM implementations, and would probably not be a desirable alternative even if it were possible, but it is included here for completeness.
As an alternative to a single server-wide JVM, separate JVMs could be created for each connection which requests a JVM. This gives control over the configuration of the JVM to the user connection and does not require a server restart for a new JVM configuration to accommodate some new connection.
The JNI provides a mechanism for creating multiple JVMs to facilitate thread isolation in multi-threaded programming environments. One simple way to allocate JVMs is to create a dedicated JVM for each connection which needs Java UDF support. In this case, a JVM may be created and configured when a connection which requests Java UDF support is established to a database. A connection requesting a JVM may specify a server-side classpath, as well as a server-side native library path if necessary. Other ways of distributing multiple JVMs are possible, such as one JVM per query (way too costly), but one JVM per requesting connection is probably the most logical if one opted for multiple JVMs.
Multiple connection-wide JVMs could be configured in the same way as a single server-wide JVM is configured via the ibconfig file. However, this does not allow for differing JVM configurations between connections. One way to allow for connection-level JVM configurations is through the use of Database Parameter Block (DPB) options. The DPB parameters would be analogous to the ibconfig parameters in the server-wide JVM scenario:
isc_dpb_load_java_virtual_machine
isc_dpb_java_udf_classpath
isc_dpb_java_udf_native_library_path
SQL support would also need to be surfaced by extending the syntax of the CONNECT statement. For example:
CONNECT "employee.gdb" LOAD_JAVA_VIRTUAL_MACHINE JAVA_UDF_CLASSPATH "d:\java_udfs";
Alternatively, we could eliminate the need for isc_dpb_load_java_virtual_machine and LOAD_JAVA_VIRTUAL_MACHINE by first having connection requests check the system tables for any Java UDF entry in the database. Then, create a JVM if and only if a Java UDF is found in the system tables. The overhead in checking for Java UDFs in the system tables may be minimized by using an in-memory DBB flag to indicate existence of Java UDFs in the database. This flag would be set only at the first attachment of a client to the database, and would be used by subsequent attachments.
Because each JVM maintains its own object memory, using multiple JVMs would present some difficulties if static class variables were modified by a UDF. Because of this and the amount of resources that would be required by multiple JVMs, a single server-wide JVM is undoubtedly our best option under the super-server model. In fact, I asked a JavaSoft JNI engineer the following question to get an idea of the intended usage of multiple JVMs:
Question: Regarding JNI_CreateJavaVM(). Since the JVM is multi-threaded, under what circumstances would an application ever want more than one instance of the JVM?
Answer: For isolation (say different System.out's). The API for multiple VMs was added but never implemented and is not clear whether it will ever be.
Here's a further comment giving another reason for rejecting this design alternative:
IMHO, a single JVM which can be shared by concurrent query threads is the best option. Not only would this be less resource intensive, but also would allow a single point of contact for our InterBase server. We would not have to add complexity of managing multiple JVM's invoked by each query thread. Also, this enhances the speed of query execution by avoiding a JVM start for each query.
A System Table For Java UDFs
Rather than introduce a new system table for Java UDFs (e.g. RDB$JAVA_FUNCTIONS), the existing system table for native UDFs (RDB$FUNCTIONS) will be extended to accommodate Java UDFs. Reusing the RDB$FUNCTIONS system table will have the least effect on existing middleware and application products, and will help to ensure that function names are unique. Here's the proposed new schema for RDB$FUNCTIONS.
Column Name | Datatype | Length | Description |
---|---|---|---|
RDB$FUNCTION_NAME | CHAR | 31 | Unique name for a native function or Java UDF. |
RDB$FUNCTION_TYPE | SMALLINT | | Prior to V7 this was reserved for future use. For V7 this field is used to indicate whether the function is a native UDF or a Java UDF. 0 indicates native, 1 indicates Java. |
RDB$QUERY_NAME | CHAR | 31 | Alternative name for the function that can be used in ISQL. |
RDB$DESCRIPTION | BLOB | 80 | Subtype Text: Contains a user-written description of the function being defined. |
RDB$MODULE_NAME | VARCHAR | 253 | For native UDFs, this names the function library where the executable function is stored. For Java UDFs, this field is NULL. This field is nullable. |
RDB$ENTRYPOINT | CHAR | 31 | For native UDFs, this is the entry point within the function library for the function being defined. For Java UDFs, this field is NULL. This field is nullable. |
RDB$RETURN_ARGUMENT | SMALLINT | | Position of the argument returned to the calling program; this position is specified in relation to other arguments. |
RDB$SYSTEM_FLAG | SMALLINT | | Indicates whether the function is user-defined (value of 0) or system-defined (value of 1). |
RDB$CLASS_NAME | VARCHAR | 253 | For Java UDFs, this is the Java class name containing the Java method for the defined function. For native UDFs, this is NULL. |
RDB$METHOD_NAME | VARCHAR | 253 | For Java UDFs, this is the Java method name for the defined function. For native UDFs, this is NULL. |
Table RDB$FUNCTION_ARGUMENTS should not need to be modified as this describes the InterBase types of UDF arguments.
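The relationship between RDB$FUNCTION_TYPE and the nullable columns of the proposed schema can be expressed as a small consistency check. This is a hypothetical sketch, not part of the proposed implementation: it treats a row as consistent when a native UDF carries a module name and entry point only, and a Java UDF carries a class name and method name only.

```java
final class FunctionRowSketch {
    static final int NATIVE = 0;  // RDB$FUNCTION_TYPE value for native UDFs
    static final int JAVA = 1;    // RDB$FUNCTION_TYPE value for Java UDFs

    // Check that the nullable columns agree with RDB$FUNCTION_TYPE:
    // native rows carry module/entry point, Java rows carry class/method.
    static boolean isConsistent(int functionType, String moduleName, String entryPoint,
                                String className, String methodName) {
        if (functionType == NATIVE) {
            return moduleName != null && entryPoint != null
                    && className == null && methodName == null;
        }
        if (functionType == JAVA) {
            return className != null && methodName != null
                    && moduleName == null && entryPoint == null;
        }
        return false;
    }
}
```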
Exception Handling
Testing has shown that the Java VM will crash with a segmentation violation upon UDF invocation if the Java UDF Declaration and the Java Method Declaration signatures do not match. Unlike native UDFs, Java UDF exceptions include both abnormal terminations of the Java VM and normal Java exceptions thrown from within the UDF Java method itself, and the engine will trap both kinds. For an abnormal termination such as a segmentation violation, an appropriate error message will be logged and the server will exit gracefully. For a normal Java exception, the server should not exit. Rather, the server should log a message for the Java exception (by way of the status vector) and abort the associated query.
Design note: The implementation could leverage the work done for UDF exception handling in which the server does not terminate. This is not currently in force for 6.0 since it's unsafe to continue the server after a segmentation violation.
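The recoverable case, a normal Java exception thrown by the UDF method, can be sketched in Java as follows. The sketch uses reflection in place of the JNI calls the engine would actually make, and the Outcome class is a hypothetical stand-in for the status vector. It does not model an abnormal termination of the VM, which cannot be caught from Java code.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

final class UdfInvokeSketch {
    // Outcome of one UDF call: either a value or an error message.
    static final class Outcome {
        final Object value;
        final String error;   // null when the call succeeded

        Outcome(Object value, String error) {
            this.value = value;
            this.error = error;
        }
    }

    // Invoke a static UDF method. A Java exception thrown by the method is
    // turned into an error outcome, so only the current query is aborted.
    static Outcome invoke(Method udf, Object... args) {
        try {
            return new Outcome(udf.invoke(null, args), null);
        } catch (InvocationTargetException e) {
            // The UDF method itself threw; report the underlying exception.
            return new Outcome(null, "Java UDF exception: " + e.getCause());
        } catch (IllegalAccessException | IllegalArgumentException e) {
            // The call could not be made, e.g. mismatched argument types.
            return new Outcome(null, "Java UDF invocation failed: " + e);
        }
    }
}
```

For instance, invoking Integer.parseInt with a non-numeric string yields an error outcome that names NumberFormatException, while the caller continues to run and can serve further queries.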
Deploying The Java Runtime
In order for end users to use Java UDFs, they'll need to have a Java runtime environment installed on their server. The Java 2 SDK software can serve as a runtime environment. However, we shouldn't assume all users have the Java 2 SDK software installed, and the Java 2 SDK software license doesn't allow us to redistribute SDK software files.
To solve this problem, Sun provides the Java 2 runtime environment as a free, redistributable runtime environment, available for Win32 and Solaris systems. By distributing the JRE with InterBase, we can ensure that customers will have the correct version of the Java platform for running our software.
The Java Runtime Environment (JRE) is the minimum standard Java platform for running applications written in the Java programming language. It contains the Java virtual machine, Java core classes, and supporting files. The JRE does not contain any of the development tools (such as appletviewer or javac) or classes that pertain only to a development environment.
The Win32 version comes with a built-in installation program suitable for end-users. Solaris versions require the developer to provide installation support. This means the InterBase install could invoke the Sun JRE installation exe for Win32 if desired, but must install the JRE files manually on Solaris.
The Java 2 runtime environment for Win32 is available both with and without international support. The non-international version is much smaller, but is suitable only for English-speaking users.
We also must make sure that our installation procedure never overwrites an existing JRE installation, unless the existing runtime environment is an older version.
The Win32 installation program records program information in the Windows Registry. This registry information includes the software version, which we will need to compare with the Java 2 runtime environment version compatible with our InterBase software.
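The version check implied by the two paragraphs above might look like the following sketch. It assumes that versions are plain dotted numbers such as "1.2" or "1.2.2"; real JRE version strings may carry suffixes that this sketch does not handle. The class and method names are hypothetical.

```java
final class JreVersionSketch {
    // Compare dotted version strings numerically, component by component.
    // Returns a negative, zero, or positive value, like Comparable.compareTo.
    static int compare(String left, String right) {
        String[] a = left.split("\\.");
        String[] b = right.split("\\.");
        int length = Math.max(a.length, b.length);
        for (int i = 0; i < length; i++) {
            int x = i < a.length ? Integer.parseInt(a[i]) : 0;
            int y = i < b.length ? Integer.parseInt(b[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // Install only when nothing is installed or the installed JRE is older.
    static boolean shouldInstall(String installedVersion, String bundledVersion) {
        return installedVersion == null || compare(installedVersion, bundledVersion) < 0;
    }
}
```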
One approach is to install the Java 2 runtime environment files manually into our own InterBase directory or any other directory specified by the installer. If we choose this approach, we must redistribute the JRE in its entirety except for some optional files which we may choose not to redistribute. The files that are optional are listed in the JRE README. They are mostly for functionality such as internationalization and localization which we may or may not need. The Java 2 Runtime Environment software can only be redistributed if all required files are included. Arbitrary subsetting of the Java 2 runtime environment is not allowed. See the JRE LICENSE file for specifics. We will also have to include license provisions in our InterBase license file.
The Java 2 runtime environment includes bin and lib subdirectories which both must reside in the same parent JRE directory. The bin directory contains about two dozen dlls and exes. The lib directory contains various jars and associated files. These files are too numerous to enumerate here.
In the case of the Win32 Java 2 runtime environment, the native C runtime library, msvcrt.dll, should be copied to the Windows system directory. The location of this directory varies on different operating systems, but is usually
- winnt\system32 on Windows NT
- windows98\system on Windows 98
- windows\system on Windows 95
Although InterBase already distributes this file, it is stated here for the record that this file should be included in redistributions of the Win32 version of the Java 2 runtime environment.
MetaData Extract Utility For Java UDFs
For each Java UDF declared to the database, extract out
Standard Java UDF Library
A custom Java UDF library comparable to our FreeUDF library, or Gregory Deatz' or MER System's native UDF libraries is not really necessary because most of these functions already exist in the core Java class libraries. One of the advantages of providing support for Java UDFs is that the core Java class libraries already provide a wealth of built-in methods ready for use. This also means it is unnecessary to port the standard InterBase native UDF library. However, a standard set of Blob UDFs, especially for converting String data to and from Blobs, would be a useful add-on Java UDF library.
Linking With Unknown Java Virtual Machines
The ability to embed a JVM from within a native application such as InterBase requires us to link with a Java virtual machine implementation. How we link with a Java virtual machine depends on whether we intend to deploy with only one particular virtual machine implementation or a variety of virtual machine implementations from different vendors. Because the JNI does not specify the name of the native library that implements a Java virtual machine, we should be prepared to work with Java virtual machine implementations that are shipped under different names. In general, different vendors may name their virtual machine implementations differently. For example, on Win32, Sun's virtual machine is shipped as javai.dll in the JDK release 1.1, and as jvm.dll in the Java 2 SDK, and Microsoft's virtual machine will go by yet some other name.
The solution is to use programmatic run-time dynamic linking to load the particular virtual machine library specified in the ibconfig variable JAVA_VIRTUAL_MACHINE_LIBRARY. This variable could hold just the name of the library, such as "jvm", or the absolute path to the library, such as "c:\jdk1.2\jre\bin\classic\jvm.dll". The absolute path is preferable, since relying on LoadLibrary() or dlopen() to search for jvm.dll makes InterBase susceptible to configuration changes, such as additions to the PATH environment variable:
JAVA_VIRTUAL_MACHINE_LIBRARY "c:\jdk1.2\jre\bin\classic\jvm.dll"
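As an illustration of how the engine might read this setting, here is a minimal sketch in C. It assumes the value follows the key after whitespace and may be enclosed in double quotes, as in the line above; the actual ibconfig parsing rules are not described in this document. The function names parse_jvm_library and is_absolute_path are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define JVM_KEY "JAVA_VIRTUAL_MACHINE_LIBRARY"

/* Hypothetical helper: extract the library name or path from one ibconfig
   line of the form   JAVA_VIRTUAL_MACHINE_LIBRARY "value"
   The quoting rule is an assumption based on the example line above.
   Returns 1 and fills out on success, 0 otherwise. */
static int parse_jvm_library(const char *line, char *out, size_t out_size)
{
    size_t key_len = strlen(JVM_KEY);
    const char *p, *end;
    size_t len;

    assert(line != NULL && out != NULL);
    while (isspace((unsigned char) *line)) line++;
    if (strncmp(line, JVM_KEY, key_len) != 0) return 0;
    p = line + key_len;
    if (!isspace((unsigned char) *p)) return 0;  /* key must end here */
    while (isspace((unsigned char) *p)) p++;
    if (*p == '"') {
        p++;                                     /* quoted value */
        end = strchr(p, '"');
        if (end == NULL) return 0;
    } else {
        end = p + strcspn(p, " \t\r\n");         /* bare value */
    }
    len = (size_t) (end - p);
    if (len == 0 || len >= out_size) return 0;
    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}

/* Hypothetical helper: 1 if path starts with '/' (Unix) or with a drive
   letter, colon, and separator (Win32); 0 for a bare name such as "jvm". */
static int is_absolute_path(const char *path)
{
    assert(path != NULL);
    if (path[0] == '/') return 1;
    return isalpha((unsigned char) path[0]) && path[1] == ':' &&
           (path[2] == '\\' || path[2] == '/');
}
```

With such a check in place, the server could log a warning at startup when only a bare library name is configured, since the library search order would then decide which JVM is loaded.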
Design Note: Linking in this way, we would not need to make statically linked JNI function calls from within the InterBase engine code, and we would therefore not need to link the engine with jvm.lib for entry point information. Rather, the exported function JNI_CreateJavaVM would be found by name in the dynamically loaded JVM library. Other invocation functions, such as AttachCurrentThread, are not exported symbols; they are reached through the JavaVM interface pointer that JNI_CreateJavaVM returns. For example, the following Win32 code finds the function entry point for JNI_CreateJavaVM given a virtual machine library:
#include <windows.h>

// Return a pointer to the exported JNI function
// "JNI_CreateJavaVM" in a variable JVM library.
void *findCreateJavaVM (const char *jvmLibrary)
{
    HINSTANCE hVM = LoadLibrary (jvmLibrary);
    if (hVM == NULL) return NULL;
    return (void *) GetProcAddress (hVM, "JNI_CreateJavaVM");
}
The Solaris version is:
#include <dlfcn.h>

// Return a pointer to the exported JNI function
// "JNI_CreateJavaVM" in a variable JVM library.
void *findCreateJavaVM (const char *jvmLibrary)
{
    void *libVM = dlopen (jvmLibrary, RTLD_LAZY);
    if (libVM == NULL) return NULL;
    return dlsym (libVM, "JNI_CreateJavaVM");
}
Requirements and Constraints
The Java analog to the native UDF module name (.dll file) is the Java classpath. The classpath must be known when the JVM is initialized, so it is specified at server startup time rather than at UDF declaration time. This has the disadvantage that the classpath is fixed for the life of the server, but the advantage that if Java libraries are renamed, the classpath can be reconfigured without redeclaring the UDF to the database. With native UDFs, by contrast, the .dll module name is hardwired in the system tables for the UDF, so if a .dll module name changes, the native UDF must be redeclared to the database.
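To make the startup-time classpath concrete: under the Java 2 JNI, the classpath is handed to the JVM as a "-Djava.class.path=..." option string in the initialization arguments passed to JNI_CreateJavaVM. The sketch below only shows how such an option string might be assembled. The function name build_classpath_option is hypothetical, and where the server obtains the classpath value from is left open here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the JVM startup option that carries the
   classpath. Returns 1 on success, 0 if the result does not fit in out. */
static int build_classpath_option(const char *classpath,
                                  char *out, size_t out_size)
{
    int n;

    assert(classpath != NULL && out != NULL);
    n = snprintf(out, out_size, "-Djava.class.path=%s", classpath);
    return n >= 0 && (size_t) n < out_size;
}
```

Because the option is consumed once, when the JVM is created, a changed classpath takes effect only after the server is restarted, which matches the constraint described above.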
Like native UDFs, the invocation of a Java UDF will release the engine thread lock by performing a thread-exit before transferring control to the Java runtime (invoking the UDF). When the Java UDF returns, a thread-enter will be performed to regain the thread lock on the engine.
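The thread-exit and thread-enter sequence can be sketched as follows. This is an illustration only: engine_thread_enter, engine_thread_exit, and invoke_udf are hypothetical names, a POSIX mutex stands in for the engine thread lock, and a plain C function pointer stands in for the call into the Java runtime.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the engine-wide thread lock. */
static pthread_mutex_t engine_lock = PTHREAD_MUTEX_INITIALIZER;
static int engine_lock_held = 0;      /* bookkeeping for illustration only */
static int lock_held_during_udf = -1; /* recorded by sample_udf below */

static void engine_thread_enter(void)
{
    pthread_mutex_lock(&engine_lock);
    engine_lock_held = 1;
}

static void engine_thread_exit(void)
{
    engine_lock_held = 0;
    pthread_mutex_unlock(&engine_lock);
}

typedef int (*udf_fn)(int);

/* Call a UDF without holding the engine lock.
   The caller must hold the lock on entry and holds it again on return. */
static int invoke_udf(udf_fn udf, int arg)
{
    int result;

    assert(engine_lock_held);
    engine_thread_exit();    /* release the lock before leaving the engine */
    result = udf(arg);       /* control is in the UDF (Java runtime) here */
    engine_thread_enter();   /* regain the lock before touching engine state */
    return result;
}

/* Sample UDF standing in for a Java method; it records whether the
   engine lock was held while it ran. */
static int sample_udf(int x)
{
    lock_held_during_udf = engine_lock_held;
    return x * 2;
}
```

Releasing the lock for the duration of the call means a slow UDF does not block other engine threads, at the cost that the UDF itself must not touch engine state.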
The JVM port must provide native Java thread support for the deployed platform. Therefore initial support for Java UDFs will be for Win32 only. Sun's JNI FAQ (http://java.sun.com/products/jdk/faq/jnifaq.html) states:
"The Solaris Java VM shipped with JDK 1.1 is not suitable for embedding into certain native applications. Because it depends on a user-level thread package to implement Java threads, the VM overrides a number of system calls in order to support non-blocking I/O. This may have undesirable affects on the hosting native application. In addition, the Invocation API function AttachCurrentThread is not supported on Solaris.
We plan to fix these problems in the near future by releasing a Java VM directly supported by Solaris native threads."
This non-native Java thread implementation was known as green threads.
More recent information is available in the new Java 2 JNI FAQ (http://java.sun.com/products/jdk/faq/jni-j2sdk-faq.html#nativethreads):
"8. (Solaris) Has support for native threads gotten any better? Yes. As of JDK/JRE 1.1.3 you could download a Solaris Native Threads Pack which was fully supported. In the Java 2 SDK, native threads is integrated into the release. If you use the invocation API on Solaris to embed the JVM into your application, we recommend the use of the native threads VM (see also Q11 on linker issues on Solaris).
Lest you ask, we have always used only native threads on the Win32 platforms."
This will need to be tested directly on Solaris for confirmation. Note that we must embed the native threads VM since green threads and native threads don't mix, and of course InterBase already links with -lthread -lc which is required for access to Solaris native threads.
Because of significant differences in the JNI API between Java 1 and Java 2, only Java 2 and above will be supported.
The JVM port must support the Java 2 JNI interfaces.
The blr and dyn generation for Java UDFs is deferred as a design consideration.
Thread-safety of Java UDFs is up to the author of the Java UDF class library. However, this is a relatively easy task in Java.
Performance of Java UDFs will be inferior to that of native UDFs because of the necessary internal conversions from native InterBase datatypes to Java objects.
Migration Issues
Although RDB$ENTRYPOINT and RDB$MODULE_NAME are nullable fields, the modified system table for RDB$FUNCTIONS could affect applications which do not allow for a null RDB$ENTRYPOINT or RDB$MODULE_NAME.
Open Issues
- Any way to force garbage collection of JVM by InterBase?
- Server-wide UDFs - this is a desirable feature for both native and Java UDFs.
Syntax Conventions
The syntax diagram conventions mostly follow BNF, with a few variations to enhance readability. Here is a description of the general rules for specifying syntax in this extended BNF. Please be aware that BNF is a high-level specification syntax, and is not a low-level LALR(1) syntax as used by parser generators such as YACC. The LALR(1) syntax for Java UDFs is deferred as a design consideration. These rules are taken from the book "SQL Instant Reference" by Martin Gruber, Sybex Publishing.
- The symbol ::= means "is defined as". It is used to further clarify parts of a statement's syntax diagram.
- Keywords appear in all uppercase letters. These reserved words are literals that are actually written as part of the statement.
- Place holders for specific values, such as domain-name in the CREATE DOMAIN domain-name statement, appear in italic type. These place holders identify the type of value that should be used in a real statement; they are not literals to be written as part of the statement. This is not a standard BNF convention.
- Optional portions of a statement appear in square brackets ([ and ]).
- A vertical bar (|) indicates that whatever precedes it may optionally be replaced by whatever follows it.
- Braces ({ and }) indicate that everything within them is to be regarded as a whole for the purpose of grouping.
- Ellipses (...) indicate that the preceding portion of the statement may be repeated any number of times.
- Ellipses with an interposed comma (.,..) indicate that the preceding portion may be repeated any number of times, with the individual occurrences separated by commas. The final occurrence should not be followed by a comma. Note: This is not a standard BNF convention; it is used for clarity and simplicity.
- Parentheses () used in syntax diagrams are literals. They indicate that parentheses are to be used in forming the statement. They do not specify a way of grouping the diagram as braces or square brackets do.
Reference Documents
- http://java.sun.com/products/jdk/faq/jnifaq.html
- http://java.sun.com/products/jdk/1.2/docs/guide/jni/spec/jniTOC.doc.html
- http://java.sun.com/products/jdk/1.2/docs/guide/jni/jni-12.html
- http://java.sun.com/products/jdk/1.2/jre/
- http://java.sun.com/products/jdk/1.2/jre/LICENSE
- http://java.sun.com/products/jdk/1.2/jre/README
- http://java.sun.com/products/jdk/1.2/runtime.html
- http://java.sun.com/products/jdk/faq/jni-j2sdk-faq.html
- Solaris Native Threads Pack
- The Java Native Interface: Programmer's Guide and Specification
- InterBase Data Definition Guide, Programmer's Guide, and Language Reference
- JNI Specification