Java UDF Design Notes
Note
Whilst InterBase V6.0 was in beta, work was started on the implementation of a design for supporting Java UDFs in the next release of InterBase. This document contains the pertinent information related to this work.
Build Script Modifications
Propose a temporary build flag JAVA_UDFS to enable Java UDF functionality in the build until such time as the functionality has stabilized.
Examples:
build_lib -DJAVA_UDFS -DDEV > build.log 2>&1
cd dsql
make -DDEV -DJAVA_UDFS -DCLIENT -fmakefile.lib dsql.rsp
make -DDEV -DJAVA_UDFS -fmakefile.lib dsql.rsp
cd jrd
make -DDEV -DJAVA_UDFS -fmakefile.lib eng32.dll (modified rule target name)
make -DDEV -DJAVA_UDFS -DCLIENT -fmakefile.lib gds32.dll
cd remote
make -DDEV -DJAVA_UDFS -fmakefile.lib nt_server.exe
cd jrd
make -DDEV -DJAVA_UDFS -fmakefile.lib install_server
make -DDEV -DJAVA_UDFS -DCLIENT -fmakefile.lib install_client
For prototype purposes, I am currently linking Sun's jvm.lib (the library file for jvm.dll) into eng32.dll and nt_server.exe. This allows jvm.dll to be loaded as needed at runtime, and allows for explicit calls to JNI functions as needed in the engine code. jvm.lib does not need to be linked to the client library gds32.dll.
The prototype currently adds C++ (.cpp) files in jrd, and adds C++ object targets to the jrd makefile (GDS_CPP_FILES and GDS_CPP_OBJS vs. GDS_FILES and GDS_OBJS) to accommodate a make rule for .cpp files. extern "C" {...} is used in the C++ header files so that jrd functions can call into the C++ code using C calling conventions. Modifications to existing jrd files still use C code.
The jrd compile requires two header files from the Java 2 SDK: jni.h and jni_md.h. These files and jvm.lib, being necessary for a build, are being put into a separate directory (component) in the InterBase development tree named "java". We may decide to move the new files in jrd to the java component as well.
JVM Configuration
The following ibconfig configuration variables are proposed:
- JAVA_LOAD_VIRTUAL_MACHINE [ TRUE | FALSE ]
- JAVA_VIRTUAL_MACHINE_LIBRARY <absolute path to JVM shared library, e.g. "c:\jdk1.2\jre\bin\classic\jvm.dll">
- JAVA_UDF_CLASSPATH <"path conventions as dictated by the VM">
- JAVA_UDF_NATIVE_LIBRARY_PATH <"path conventions as dictated by the VM">
- JAVA_JIT_COMPILER <"path conventions as dictated by the VM">
If JAVA_LOAD_VIRTUAL_MACHINE is FALSE then all the other variables are ignored.
The current prototype works as follows: ISCCFG_*_KEY defines (cfgtbl table indices) have been added for each variable to the gds.h and ibase.h files, as well as some corresponding ISCCFG_* values in isc.h for specifying the ibconfig variable names and defaults in isc.c:ISC_def_cfg_tbl. The control logic in isc.c for setting configuration values is extended to conditionally create and initialize a JVM with the given configuration settings, and remote/window.c is modified to pass an extended WND_hdrtbl to accommodate the new table entries.
Question: What changes are necessary to accommodate the service API?
Once initialized, the JVM is thread-safe and may be used globally throughout InterBase code regardless of thread context. However, JNI functions are always invoked in the context of a JNIEnv structure. The JNIEnv contains some internal virtual machine data structures, as well as a function table of all the JNI functions. In other words, the JNIEnv maintains an array of pointers to JNI functions, and every JNI function invocation is actually a method invocation on a JNIEnv object. A JNIEnv object is only valid in the thread associated with it, so one cannot pass a JNIEnv pointer from one thread to another, or cache it and use it in multiple threads. Any local references to objects created by the JVM via the JNI, such as those passed to and from a Java UDF, are valid only in the thread that created them. Unlike the JNIEnv pointer, the JavaVM pointer remains valid across multiple threads, so it can be cached in a global variable.
Therefore, each Java UDF thread obtains a JNIEnv instance by invoking the AttachCurrentThread() method on a global JavaVM instance. Java UDF invocations do not operate in the thread context of the InterBase query thread which invokes the UDF. After use, each Java UDF thread will also free any local object references made within the context of the thread, and release any monitors held by the thread by using DetachCurrentThread().
Note that the JVM we embed in InterBase must support the native thread model. Virtual machine implementations that employ a user thread package will not work properly when embedded in InterBase.
A JVM is declared globally:
static JavaVM *jvm;
// denotes a Java VM, unlike the JNIEnv, the jvm is thread-safe
Generic code to create and initialize a JVM at server startup follows:
{
JNIEnv *env;
// pointer to native method interface, only
//valid for current thread, can't be static
SLONG success = 0;
// The following JNI code is compatible with
// JDK 1.2 (Java 2) and above only.
// This program must be linked with file jvm.lib.
// This program must run with jvm.dll in your PATH.
// Of course, this must be compiled with _MT and/or /MDd
JavaVMInitArgs vm_args; // JDK 1.2 VM initialization arguments
JavaVMOption options[4]; // four options are set below (indices 0..3)
options[0].optionString = "-Djava.compiler=<jit compiler>";
// or NONE to disable JIT
//
options[1].optionString = "-Djava.class.path=<class path>";
// user classes, location of Java UDF class library
//
options[2].optionString = "-verbose:jni";
// print JNI-related messages
//
// The following is for the user's native libraries, not
// the location of java libraries
options[3].optionString = "-Djava.library.path=<native library path>";
// set native library path
vm_args.version = JNI_VERSION_1_2;
// Must be 1.2
//
vm_args.options = options;
// pointer to VM options[] array
//
vm_args.nOptions = 4;
// size of the options[] array
//
vm_args.ignoreUnrecognized = JNI_TRUE;
// CreateJavaVM() will ignore unrecognized options if True
//
// Load and initialize a Java VM, return a JNI interface pointer in env.
// Note that in JDK 1.2, there is no longer any need to call
// JNI_GetDefaultJavaVMInitArgs().
if (JNI_CreateJavaVM (&jvm, (void **) &env, &vm_args) < 0)
success = 0;
else {
success = 1;
}
}
At server shutdown time, in JDK 1.2 any thread may unload the virtual machine and reclaim resources by calling DestroyJavaVM(). This call will block until the current thread is the only remaining attached thread before it attempts to unload the virtual machine instance.
Question: If a Java UDF loops indefinitely, then we won't be able to shut down the server, because DestroyJavaVM() would block forever. So perhaps we should not call DestroyJavaVM() on shutdown?
Invoking Java Methods Using The JNI
Here's a simplified example of invoking a hypothetical Java method Foo.bar(100) where the method signature int bar(int) indicates a primitive Java int as input and a primitive Java int as output.
{
JNIEnv *env;
// pointer to a valid native method interface for this thread
jint result;
jvm->AttachCurrentThread ((void **) &env, NULL);
// obtain a pointer to a valid JNI env
jclass cls = env->FindClass ("Foo");
jmethodID mid = env->GetStaticMethodID (cls, "bar", "(I)I");
result = env->CallStaticIntMethod (cls, mid, 100);
jvm->DetachCurrentThread ();
return result;
}
Passing and returning Java objects, rather than primitives, is more complicated, and will be described later.
The above example uses CallStaticIntMethod(), but in general, arguments will be passed into CallStatic<Type>MethodV() as a variable number of arguments in an ANSI C va_list. The "V" at the end of the method name indicates that arguments are passed in as a variable arguments list (va_list). This will be done by using the family of JNI functions:
<NativeType> CallStatic<Type>MethodV (
JNIEnv *env, jclass clazz, jmethodID methodId, va_list args);
This family of functions consists of ten members:
CallStatic<Type>MethodV | <NativeType> |
---|---|
CallStaticVoidMethodV | void |
CallStaticObjectMethodV | jobject |
CallStaticBooleanMethodV | jboolean |
CallStaticByteMethodV | jbyte |
CallStaticCharMethodV | jchar |
CallStaticShortMethodV | jshort |
CallStaticIntMethodV | jint |
CallStaticLongMethodV | jlong |
CallStaticFloatMethodV | jfloat |
CallStaticDoubleMethodV | jdouble |
Notice that a method signature "(I)I" is used in the call to GetStaticMethodID in the example above.
jmethodID mid = env->GetStaticMethodID (
cls, "bar", "(I)I");
The method is determined by its name and signature. The JNI uses signature strings to denote the types of a method's arguments and return value. For example, "(ID)V" denotes a method that takes two arguments of primitive type int and double and has return type void; note that argument types are concatenated with no separator. Object types are denoted using "Lclassname;". The JNI uses the Java VM's representation of type signatures as follows:
Type Signature | Java Type |
---|---|
Z | boolean |
B | byte |
C | char |
S | short |
I | int |
J | long |
F | float |
D | double |
L fully-qualified-class ; | fully-qualified-class |
[ type | type[] |
( arg-types ) ret-type | method type |
For example, the Java method:
long f (int n, String s, int[] arr);
has the following type signature:
(ILjava/lang/String;[I)J
So given a Java UDF declaration, we must be able to generate the appropriate method signature using the rules of the table above.
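The signature rules in the table above can be sketched as a small string builder. This is only an illustration under stated assumptions: the java_type descriptor, the JT_* tags, and the function names are hypothetical and are not part of InterBase or the JNI; the real implementation would derive the types from the UDF declaration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type tags for illustration; not InterBase or JNI names. */
typedef enum {
    JT_VOID, JT_BOOLEAN, JT_BYTE, JT_CHAR, JT_SHORT,
    JT_INT, JT_LONG, JT_FLOAT, JT_DOUBLE, JT_OBJECT
} java_type_kind;

typedef struct {
    java_type_kind kind;
    int array_depth;         /* 0 for a scalar, 1 for T[], 2 for T[][] ... */
    const char *class_name;  /* slash-separated name, used only for JT_OBJECT */
} java_type;

/* Append one type descriptor to the NUL-terminated string in buf.
   Returns 0 on success, -1 if buf is too small or the type is invalid. */
static int append_type(char *buf, size_t size, const java_type *t)
{
    static const char codes[] = "VZBCSIJFD";  /* indexed by java_type_kind */
    size_t len = strlen(buf);
    int i;

    for (i = 0; i < t->array_depth; i++) {
        if (len + 2 > size) return -1;
        buf[len++] = '[';
        buf[len] = '\0';
    }
    if (t->kind == JT_OBJECT) {
        size_t n;
        if (t->class_name == NULL) return -1;
        n = strlen(t->class_name);
        if (len + n + 3 > size) return -1;    /* 'L' + name + ';' + NUL */
        buf[len++] = 'L';
        memcpy(buf + len, t->class_name, n);
        len += n;
        buf[len++] = ';';
    } else {
        if (len + 2 > size) return -1;
        buf[len++] = codes[t->kind];
    }
    buf[len] = '\0';
    return 0;
}

/* Build "(arg-types)ret-type" into buf. Returns 0 on success, -1 on failure. */
int build_method_signature(char *buf, size_t size,
                           const java_type *args, size_t nargs,
                           const java_type *ret)
{
    size_t i;

    assert(buf != NULL && ret != NULL);
    if (size < 3) return -1;
    strcpy(buf, "(");
    for (i = 0; i < nargs; i++)
        if (append_type(buf, size, &args[i]) != 0) return -1;
    if (strlen(buf) + 2 > size) return -1;
    strcat(buf, ")");
    return append_type(buf, size, ret);
}
```

Called with the argument types int, java/lang/String, and int[] and a return type of long, the builder reproduces the signature of the method f shown above.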
Note
GetMethodID will also cause an uninitialized class to be initialized.
Performance will be increased by caching method IDs and classes to be reused on successive UDF invocations. Method IDs computed by multiple threads for the same method will necessarily be the same. The current prototype does not cache method IDs, and UDF invocations on multiple database records will result in multiple AttachCurrentThread() calls.
Creating Java Objects
InterBase native objects (DATEs, CSTRINGs, INT64s, etc.) need to be converted to Java objects before invoking the Java method corresponding to a Java UDF.
Java UDF Declared Type | Java Method Declared Type | Description |
---|---|---|
JSTRING | java.lang.String | The UDF type JSTRING indicates that the Java UDF expects an object of class java.lang.String to be passed or returned. This is analogous to CSTRING for native UDFs, except that native UDFs perform no implicit character conversions, and no character encoding is enforced on the passed C strings (C strings are passed byte-for-byte). InterBase CHAR or VARCHAR fields of any character set may be passed to Java UDFs by an implicit conversion to a Java Unicode String. Any native InterBase character set which is convertible to and from intermediary Unicode FSS by the InterBase engine will be supported. Design note: The conversion from the intermediary Unicode FSS to a 2-byte Unicode representation is handled by the Java UDF implementation. The user will not be aware of the intermediary Unicode FSS representation. |
TIMESTAMP, TIME, or DATE | java.sql.Timestamp, java.sql.Time, or java.sql.Date | Dates and Times may be passed to and from Java UDFs by converting InterBase dates and times to and from Java dates and times as described by the JDBC java.sql interfaces. Design note: This conversion code currently exists in InterClient, and may be reused. |
BLOB | com.borland.interbase.Blob | The UDF type BLOB indicates that the Java UDF method expects an object of class com.borland.interbase.Blob to be passed or returned. A Blob class is introduced to encapsulate an InterBase Blob and provide the necessary methods for getting and putting segments to the Blob. |
NUMERIC(p,s) or DECIMAL(p,s) | java.math.BigInteger | Exact numerics could be passed to and from Java UDFs by converting the InterBase INT64, INTEGER, and SMALLINT representations to and from Java longs, ints, and shorts respectively, depending on precision p. But, as with native UDFs, this would not provide scale, and the user would therefore be burdened with having to know the scale of the field ahead of time, and with making any required scale adjustments within the UDF. Using java.math.BigInteger is a better choice as this provides the exact value of the numeric pre-adjusted for the scale. Note: As with native UDFs, precisions greater than 18 are not supported, but could be accommodated by java.math.BigInteger in some future release of InterBase. java.math.BigInteger is the standard JDBC class used for large numerics. |
DOUBLE PRECISION | double | The Java UDF method expects a Java double to be passed or returned. |
INTEGER | int | The Java UDF method expects a Java int to be passed or returned. |
SMALLINT | short | The Java UDF method expects a Java short to be passed or returned. |
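To make the scale burden mentioned for NUMERIC and DECIMAL concrete, the following sketch shows the adjustment a UDF author would otherwise have to perform by hand on the raw integer. It assumes the engine's convention of storing an exact numeric as an unscaled integer plus a non-positive scale (so 123.45 in a NUMERIC(9,2) field is the integer 12345 with scale -2); the function name format_scaled is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Render an unscaled integer and a non-positive scale as decimal text.
   Returns 0 on success, -1 if the scale is out of range or buf is too small. */
int format_scaled(long long value, int scale, char *buf, size_t size)
{
    char digits[32];
    unsigned long long magnitude;
    int ndigits, frac, intlen;
    char *p = buf;

    assert(buf != NULL);
    if (scale > 0 || scale < -18) return -1;  /* precision is limited to 18 */
    frac = -scale;                            /* number of fractional digits */

    /* Work on the magnitude so the most negative value is handled too. */
    magnitude = value < 0 ? 0ULL - (unsigned long long) value
                          : (unsigned long long) value;

    /* Zero-pad so that at least one digit precedes the decimal point. */
    ndigits = sprintf(digits, "%0*llu", frac + 1, magnitude);
    intlen = ndigits - frac;

    /* Room for an optional sign, an optional point, and the NUL. */
    if ((size_t) ndigits + 3 > size) return -1;

    if (value < 0) *p++ = '-';
    memcpy(p, digits, (size_t) intlen);
    p += intlen;
    if (frac > 0) {
        *p++ = '.';
        memcpy(p, digits + intlen, (size_t) frac);
        p += frac;
    }
    *p = '\0';
    return 0;
}
```

Passing the scaled value to the UDF as an object removes the need for each UDF to carry logic of this kind.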
Converting the primitive types double, int, and short is trivial. The following table describes the Java primitive types and their machine-dependent native equivalents.
Java Type | Native Type | Description |
---|---|---|
boolean | jboolean | unsigned 8 bits |
byte | jbyte | signed 8 bits |
char | jchar | unsigned 16 bits |
short | jshort | signed 16 bits |
int | jint | signed 32 bits |
long | jlong | signed 64 bits |
float | jfloat | 32 bits |
double | jdouble | 64 bits |
void | void | N/A |
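The arguments passed to the Java method can each be represented as a value of type jvalue. The jvalue union type is used as the element type in argument arrays, which is the form accepted by the CallStatic<Type>MethodA() variants of the invocation functions; the CallStatic<Type>MethodV() variants take the same values as a variable arguments list (va_list). It is declared as follows: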
typedef union jvalue {
jboolean z;
jbyte b;
jchar c;
jshort s;
jint i;
jlong j;
jfloat f;
jdouble d;
jobject l;
} jvalue;
jobject is a reference (pointer) type, and all Java objects represented in the JNI are subclasses of jobject.
The JNI includes a number of standard reference types that correspond to different kinds of reference types in the Java programming language. JNI reference types are organized in the hierarchy shown here.
Reference Type Hierarchy
Note
The JNI include files enforce this subclass hierarchy by using empty class declarations such as:
class _jobject {};
class _jstring: public _jobject {};
etc...
Objects, e.g. of class java.sql.Date or java.lang.String, must be created using JNI constructor functions such as NewObject(). For example, here is the JNI code to construct a java.lang.String object from Unicode characters stored in a C buffer:
jstring makeJavaString (
JNIEnv *env, jchar *chars, jint len)
{
jclass stringClass;
jmethodID constructorId;
jcharArray elementArray;
jstring result;
stringClass = (*env)->FindClass (
env, "java/lang/String");
if (stringClass == NULL) return NULL;
// Get the method ID for the String(char[]) constructor.
// Constructors are always looked up under the name "<init>".
constructorId = (
*env)->GetMethodID (env, stringClass, "<init>", "([C)V");
if (constructorId == NULL) return NULL;
// Create a Java char[] that holds the string characters
elementArray = (
*env)->NewCharArray (env, len);
if (elementArray == NULL) return NULL;
(*env)->SetCharArrayRegion (
env, elementArray, 0, len, chars);
// Construct a java.lang.String object
result = (*env)->NewObject (
env, stringClass, constructorId, elementArray);
// Free local references
(*env)->DeleteLocalRef (env, elementArray);
(*env)->DeleteLocalRef (env, stringClass);
return result;
}
Notice that object references should be deleted after use.
The engine must ensure that C strings (of any character set type) passed to Java UDFs are always converted to a C buffer of 2-byte Unicode characters before invoking the makeJavaString() function. Conversion routines currently exist in the engine to convert to UTF-8. Likewise, any Java strings returned from a Java UDF must also be converted from Unicode characters to the declared character set type of the InterBase field.
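The step from the engine's intermediate form to the 2-byte buffer expected by makeJavaString() could look roughly like the sketch below. It assumes the intermediate form is UTF-8 restricted to sequences of at most three bytes (that is, characters that fit in 16 bits); the type unichar16 is a stand-in for the JNI's jchar so the sketch compiles without jni.h, and the function name is hypothetical. It does not reject overlong encodings or surrogate code points.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short unichar16;  /* stand-in for jchar (unsigned 16 bits) */

/* Decode UTF-8 sequences of up to three bytes into 16-bit characters.
   Returns the number of characters written, or -1 on malformed input,
   on a 4-byte sequence, or when dst is too small. */
long utf8_to_ucs2(const unsigned char *src, size_t src_len,
                  unichar16 *dst, size_t dst_cap)
{
    size_t i = 0, n = 0;

    assert(src != NULL && dst != NULL);
    while (i < src_len) {
        unsigned int lead = src[i];
        unsigned int code;
        size_t extra, k;

        if (lead < 0x80)                { code = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { code = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { code = lead & 0x0F; extra = 2; }
        else return -1;                 /* stray continuation or 4-byte lead */

        if (i + extra >= src_len) return -1;      /* truncated sequence */
        for (k = 1; k <= extra; k++) {
            unsigned int cont = src[i + k];
            if ((cont & 0xC0) != 0x80) return -1; /* not a continuation byte */
            code = (code << 6) | (cont & 0x3F);
        }

        if (n >= dst_cap) return -1;
        dst[n++] = (unichar16) code;
        i += extra + 1;
    }
    return (long) n;
}
```

The returned count and the filled buffer correspond to the len and chars arguments of makeJavaString().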
Calling the GetMethodID() function every time we need a constructor ID to construct a java.sql.Date, java.lang.String, java.math.BigInteger, or com.borland.interbase.Blob object can be a performance hit. Therefore, the above code to construct a java.lang.String can be rewritten to cache the constructorId after it is first computed as follows:
jstring makeJavaString (
JNIEnv *env, jchar *chars, jint len)
{
jclass stringClass;
static jmethodID constructorId = NULL;
jcharArray elementArray;
jstring result;
stringClass = (*env)->FindClass (
env, "java/lang/String");
if (stringClass == NULL) return NULL;
// Get the method ID for the String(char[]) constructor
if (constructorId == NULL) {
// Only call GetMethodID if it is not already set
constructorId = (
*env)->GetMethodID (env, stringClass, "<init>", "([C)V");
if (constructorId == NULL) return NULL;
}
// Create a Java char[] that holds the string characters
elementArray = (
*env)->NewCharArray (env, len);
if (elementArray == NULL) return NULL;
(*env)->SetCharArrayRegion (
env, elementArray, 0, len, chars);
// Construct a java.lang.String object
result = (*env)->NewObject (
env, stringClass, constructorId, elementArray);
// Free local references
(*env)->DeleteLocalRef (env, elementArray);
(*env)->DeleteLocalRef (env, stringClass);
return result;
}
Notice that constructorId is now declared as static. There is an obvious race condition in this code. Namely, if multiple threads call makeJavaString() at the same time and compute the method ID concurrently, one thread may overwrite the static variable constructorId computed by another thread. Although this race condition can lead to duplicated work in multiple threads, it is otherwise harmless. Method IDs computed by multiple threads for the same method will necessarily be the same.
This same approach for caching method IDs for object constructors can also be applied to the Java UDF methods themselves. However, the current prototype does not cache method IDs.
Accessing Java Objects
TBD
LALR(1) Syntax and Parse Tree Generation
Preparing Java UDF DDL
When Java UDF DDL is prepared by dsql/dsql.c:prepare(), yyparse() is called and a parse tree is generated with root node labeled nod_def_java_udf. This root node has e_java_udf_count=5 node arguments which may be indexed by:
#define e_java_udf_name 0
#define e_java_udf_class 1
#define e_java_udf_method 2
#define e_java_udf_args 3
#define e_java_udf_return_value 4
prepare() then generates a BLR request by calling dsql/ddl.c:generate_dyn(), which in turn switches on the node labeled nod_def_java_udf to generate a DYN request for the Java UDF as follows:
gds__dyn_def_java_function node->nod_arg[e_java_udf_name]
gds__dyn_java_func_class node->nod_arg[e_java_udf_class]
gds__dyn_java_func_method node->nod_arg[e_java_udf_method]
gds__dyn_java_func_return_argument node->nod_arg[e_java_udf_return_value]
gds__dyn_java_function_name node->nod_arg[e_java_udf_name]
For each argument in the node->nod_arg[e_java_udf_args] subtree, the DYN verb gds__dyn_def_java_function_arg is generated with the proper argument.
Executing Java UDF DDL
When Java UDF DDL is executed by dsql/dsql.c:execute_request(), jrd/dyn.c:DYN_ddl() is called which switches on the DYN request verb gds__dyn_def_java_function. This in turn calls jrd/dyn_def.e:DYN_define_java_function() which executes the DYN by storing a record into RDB$FUNCTIONS with fields specified by the DYN sub-verbs already specified above.
Note
DYN_define_java_function() may actually be merged with DYN_define_function(), since the code for each will be very similar. Records are also stored into RDB$FUNCTION_ARGUMENTS as usual for each argument specified by gds__dyn_def_java_function_arg.
The field RDB$FUNCTION_TYPE is currently unused. This will now hold the value 1 for a Java UDF, and the value 0 for a native UDF.
Other DYN verbs which may or may not be necessary are gds__dyn_delete_java_function and gds__dyn_def_java_filter, as well as the parse node label nod_del_java_udf. See DYN_define_filter().
Question: The database block (dbb) field dbb_functions, which maintains a list of FUN* structures, is only used by dsql/metd.e:METD_get_function(). Do we need to do something here for Java UDFs? The database block (dbb) field dbb_modules is set but never used.
Preparing Java UDF DML
As with native UDFs, when Java UDF DML is prepared by dsql/dsql.c:prepare(), an execution tree node labeled nod_function is generated for the function reference with e_fun_length=2 node arguments which may be indexed by:
#define e_fun_function 0
#define e_fun_args 1
The first node argument node->nod_arg[e_fun_function] is populated with a FUN* pointer to an extended fun structure as follows:
typedef struct fun {
struct blk fun_header;
STR fun_exception_message;
/* message containing the exception error message */
struct fun *fun_homonym;
/* Homonym functions */
struct sym *fun_symbol;
/* Symbol block */
int (*fun_entrypoint)();
/* Function entrypoint */
jmethodID fun_java_method_id;
/* JNI method ID for the Java method */
jclass fun_java_class;
/* JNI class for the Java method */
USHORT fun_count;
/* Number of arguments including return
(size of repeating arg descriptor array below) */
USHORT fun_args;
/* Number of input arguments */
USHORT fun_return_arg;
/* Return argument (position of return arg in
repeating arg descriptor array below) */
USHORT fun_type;
/* Type of function */
ULONG fun_temp_length;
/* Temporary space required */
struct fun_repeat
{
DSC fun_desc;
/* Datatype info */
FUN_T fun_mechanism;
/* Passing mechanism */
} fun_rpt [1];
} *FUN;
If the function reference has never been parsed before, then the function name will not show up as a hashed symbol in dbb->dbb_hash_table. So for the first reference only, a FUN *function object is allocated and assigned values from the system tables. This is done as follows.
An RDB$FUNCTIONS entry that matches the function name is found, and its field values are assigned to the function fields. For native UDFs, function->fun_entrypoint is assigned the result of ISC_lookup_entrypoint(RDB$MODULE_NAME, RDB$ENTRY_POINT). ISC_lookup_entrypoint() looks in the global list of loaded modules (FLU_modules), and inserts the module (MOD*) if it's not already loaded. Once the module is loaded using LoadLibrary(), the entrypoint is found using GetProcAddress() and the function pointer is assigned to function->fun_entrypoint. The allocated function object is then put into the database symbol table using the function name as the hash symbol key. Any future references to the function name will extract the function object from the database symbol table.
For Java UDFs, a function is also allocated and inserted into the symbol table, but function->fun_entrypoint is set to NULL.
For Java UDFs, function->fun_java_method_id must be computed by first computing a method signature. The method signature is computed by looping through the RDB$FUNCTION_ARGUMENTS and constructing a signature string based on the type of the arguments. See table Java VM Type Signatures above.
function->fun_java_method_id is then computed from the signature string and the values of RDB$CLASS_NAME and RDB$METHOD_NAME using JNI functions as described earlier.
function->fun_java_class is computed from the value of RDB$CLASS_NAME.
The appropriate invocation method will be determined during execution by looking at the return type of the function. See table Variable Argument JNI Method Invocation Functions above. We may want to just save a function pointer to it here during the prepare in function->fun_java_invocation_method.
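Selecting the invocation function from the return type amounts to looking at the character that follows the closing parenthesis of the method signature. The sketch below returns the name of the matching member of the CallStatic<Type>MethodV family rather than a function pointer, so that it compiles without jni.h; the function name invocation_function_for is hypothetical, and an actual implementation would more likely store the function pointer itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map a method signature such as "(I)I" to the name of the
   CallStatic<Type>MethodV function that matches its return type.
   Returns NULL if the signature has no recognizable return type. */
const char *invocation_function_for(const char *signature)
{
    const char *close;

    assert(signature != NULL);
    close = strrchr(signature, ')');   /* the return type follows the ')' */
    if (close == NULL) return NULL;

    switch (close[1]) {
    case 'V': return "CallStaticVoidMethodV";
    case 'Z': return "CallStaticBooleanMethodV";
    case 'B': return "CallStaticByteMethodV";
    case 'C': return "CallStaticCharMethodV";
    case 'S': return "CallStaticShortMethodV";
    case 'I': return "CallStaticIntMethodV";
    case 'J': return "CallStaticLongMethodV";
    case 'F': return "CallStaticFloatMethodV";
    case 'D': return "CallStaticDoubleMethodV";
    case 'L':                          /* class instance */
    case '[':                          /* arrays are objects as well */
        return "CallStaticObjectMethodV";
    default:  return NULL;
    }
}
```

Doing this lookup once during the prepare, rather than on every invocation, is the motivation for the proposed fun_java_invocation_method field.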
Executing Java UDF DML
jrd/evl.c:EVL_expr() switches on the execution tree node labeled nod_function generated during the prepare, and calls FUN_evaluate() with the node arguments. Remember that the first node argument (node->nod_arg[e_fun_function]) is the FUN* function object described earlier.
EVL_expr() is called recursively on each of the function arguments (node->nod_arg[e_fun_args]). The procedure for storing results in VLU structures is not described but should be similar to how values are stored when executing native UDFs. The results of the recursive argument evaluations must be converted to local Java objects via the JNI, and accumulated into a variable argument list (va_list) of jvalue objects.
These objects are then passed to the appropriate Java invocation method corresponding to the Java UDF as determined by:
function->fun_java_method_id
function->fun_java_class
function->fun_return_arg OR function->fun_java_invocation_method
com.borland.interbase.Blob
package com.borland.interbase;
/**
* This class represents a Blob as passed to a Java UDF.
* A Blob UDF cannot open or close a Blob, but
* instead invokes Blob methods
* to perform Blob access.
* A UDF that returns a Blob does not actually define
* a return value.
* Instead, the return-Blob must be passed as the last
* input parameter to the UDF.
**/
public class Blob
{
/**
* Read a Blob segment into a buffer, and return
* the number of bytes read.
**/
public int getSegment (byte[] buffer);
/**
* Write a Blob segment of bytesToPut bytes from a buffer.
**/
public void putSegment (byte[] buffer, int bytesToPut);
/**
* Returns the total number of segments in the Blob.
**/
public long numberOfSegments ();
/**
* The size, in bytes, of the largest single
* segment in the Blob.
**/
public int maxSegmentLength ();
/**
* Returns the actual total size, in bytes, of the Blob.
**/
public long size ();
}
How these methods will call back to the engine is TBD.
Exceptions
There are various JNI functions that deal with exceptions. This is TBD.
Sun's Java 2 Runtime Environment (JRE)
The exact files to distribute on install need to be listed here. This is a non-flat directory structure containing a couple dozen files.
gdef -extract
TBD
Standard Blob Library
Future.
Linking With Unknown Java Virtual Machines
Rather than linking with a particular JVM library at build time, we will make all references to the JVM within the engine dynamic by programmatically loading the appropriate DLL via LoadLibrary(), and finding the exported invocation function JNI_CreateJavaVM via GetProcAddress(). Note that the other JNI functions, such as AttachCurrentThread(), are not exported by name from the JVM library; they are reached through the function tables behind the JavaVM and JNIEnv pointers that JNI_CreateJavaVM returns, so only the exported entry points need to be looked up.
// Return a function pointer to the exported
// function "JNI_CreateJavaVM"
// in a variable JVM library.
void *findCreateJavaVM (char *jvmLibrary)
{
HINSTANCE hVM = LoadLibrary (jvmLibrary);
if (hVM == NULL) return NULL;
return (void *) GetProcAddress (hVM, "JNI_CreateJavaVM");
}
The Solaris version is:
// Return a function pointer to the exported function
// "JNI_CreateJavaVM" in a variable JVM library.
void *findCreateJavaVM (char* jvmLibrary)
{
void *libVM = dlopen (jvmLibrary, RTLD_LAZY);
if (libVM == NULL) return NULL;
return dlsym (libVM, "JNI_CreateJavaVM");
}