Berkeley DB's Java API
$Id: README,v 12.2 2006/08/24 14:46:10 bostic Exp $

Berkeley DB's Java API is now generated with SWIG
(http://www.swig.org). This document describes how SWIG is used -
what we trust it to do, and what we needed to work around.


Overview
========

SWIG is a tool that generates wrappers around native (C/C++) APIs for
various languages (mainly scripting languages) including Java.

By default, SWIG creates an API in the target language that exactly
replicates the native API (for example, each pointer type in the API
is wrapped as a distinct type in the language). Although this
simplifies the wrapper layer (type translation is trivial), it usually
doesn't result in a natural API in the target language.

A further constraint for Berkeley DB's Java API was backwards
compatibility. The original hand-coded Java API is in widespread use,
and included many design decisions about how native types should be
represented in Java. As an example, callback functions are
represented by Java interfaces that applications using Berkeley DB
could implement. The SWIG implementation was required to maintain
backwards compatibility for those applications.


Running SWIG
============

The simplest use of SWIG is to run it with a C include file as input.
SWIG parses the file and generates wrapper code for the target
language. For Java, this includes a Java class for each C struct and
a C source file containing the Java Native Interface (JNI) function
calls for each native method.

The s_swig shell script in db/dist runs SWIG, and then post-processes
each Java source file with the sed commands in
libdb_java/java-post.sed. The Java sources are placed in
java/src/com/sleepycat/db, and the native wrapper code is in a single
file in libdb_java/db_java_wrap.c.

The post-processing step modifies code in ways that are difficult
with SWIG (given my current level of knowledge).
This includes changing some access modifiers to hide some of the
implementation methods, selectively adding "throws" clauses to
methods, and adding calls to "initialize" methods in Db and DbEnv
after they are constructed (more below on what these calls do).

In addition to the source code generated by SWIG, some of the Java
classes are written by hand, and constants and code to fill
statistics structures are generated by the script dist/s_java.

The native statistics code is in libdb_java/java_stat_auto.c, and is
compiled into the db_java_wrap object file with a #include directive.
This allows most functions in that object to be static, which
encourages compiler inlining and reduces the number of symbols we
export.


The Implementation
==================

For the reasons mentioned above, Berkeley DB requires a more
sophisticated mapping between the native API and Java, so additional
SWIG directives are added to the input. In particular:

* The general intention is for db.i to contain the full DB API (just
  like db.h). As much as possible, this file is kept Java
  independent so that it can be updated easily when the API changes.
  SWIG doesn't have any built-in rules for how to handle function
  pointers in a struct, so each DB method must be added in a SWIG
  "%extend" block which includes the method signature and a call to
  the method.

* SWIG's automatically generated function names happen to collide
  with Berkeley DB's naming convention. For example, in a SWIG class
  called __db, a method called "open" would result in a wrapper
  function called "__db_open", which already exists in DB. This is
  another reason why making these static functions is important.

* The main Java support starts in db_java.i - this file includes all
  Java code that is explicitly inserted into the generated classes,
  and is responsible for defining object lifecycles (handling
  allocation and cleanup).

* Methods that need to be wrapped for special handling in Java code
  are renamed with a trailing zero (e.g., close becomes close0).
  This is invisible to applications.

* Most DB classes that are wrapped have method calls that imply the
  cleanup of any native resources associated with the Java object
  (for example, Db.close or DbTxn.abort). These methods are wrapped
  so that if the object is accessed after the native part has been
  destroyed, an exception is thrown rather than a trap that crashes
  the JVM.

* Db and DbEnv initialization is more complex: a global reference is
  stored in the corresponding struct so that native code can
  efficiently map back to Java code. In addition, if a Db is created
  without an environment (i.e., in a private environment), the
  initialization wraps the internal DbEnv to simplify handling of
  various Db methods that just call the corresponding DbEnv method
  (like err, errx, etc.). It is important that the global references
  are cleaned up before the DB and DB_ENV handles are closed, so the
  Java objects can be garbage collected.

* In the case of DbLock and DbLsn, there are no such methods. In
  these cases, there is a finalize method that does the appropriate
  cleanup. No other classes have finalize methods (in particular,
  the Dbt class is now implemented entirely in Java, so no
  finalization is necessary).

* Overall initialization code, including the System.loadLibrary call,
  is in java_util.i. This includes looking up all class, field and
  method handles once so that execution is not slowed down by
  repeated runtime type queries.

* Exception handling is in java_except.i. The main non-obvious
  design choice was to create a db_ret_t type for methods that return
  an error code as an int in the C API, but return void in the Java
  API (and throw exceptions on error).

* The only other odd case with exceptions is DbMemoryException - this
  is thrown as normal when a call returns ENOMEM, but there is
  special handling for the case where a Dbt with DB_DBT_USERMEM is
  not big enough to handle a result: in this case, the Dbt handling
  code calls the method update_dbt on the exception that is about to
  be thrown to register the failed Dbt in the exception.

* Statistics handling is in java_stat.i - this mainly just hooks into
  the automatically-generated code in java_stat_auto.c.

* Callbacks: the general approach is that Db and DbEnv maintain
  references to the objects that handle each callback, and have a
  helper method for each call. This is primarily to simplify the
  native code, and it also performs better than more complex native
  code.

* One difference with the new approach is that the implementation is
  more careful about calling DeleteLocalRef on objects created for
  callbacks. This is particularly important for callbacks like
  bt_compare, which may be called repeatedly from native code.
  Without the DeleteLocalRef calls, the Java objects that are created
  cannot be collected until the original call returns.

* Most of the rest of the code is in java_typemaps.i. A typemap is a
  rule describing how a native type is mapped onto a Java type for
  parameters and return values. These handle most of the complexity
  of creating exactly the Java API we want.

* One of the main areas of complexity is Dbt handling. The approach
  taken is to accept whatever data is passed in by the application,
  pass that to native code, and reflect any changes to the native DBT
  back into the Java object. In other words, the Dbt typemaps don't
  replicate DB's rules about whether Dbts will be modified or not -
  they just pass the data through.

* As noted above, when a Dbt is "released" (i.e., no longer needed in
  native code), one of the checks is whether a DbMemoryException is
  pending, and if so, whether this Dbt might be the cause.
  In that case, the Dbt is added to the exception via the
  "update_dbt" method.

* Constant handling has been simplified by making DbConstants an
  interface. This allows the Db class to inherit the constants, and
  most can be inlined by javac.

* The danger here is if applications are compiled against one version
  of db.jar, but run against another. This danger existed
  previously, but was partly ameliorated by a separation of constants
  into "case" and "non-case" constants (the non-case constants were
  arranged so they could not be inlined). The only complete solution
  to this problem is for applications to check the version returned
  by DbEnv.get_version* versus the Db.DB_VERSION* constants.


Application-visible changes
===========================

* The new API is around 5x faster for many operations.

* Some internal methods and constructors that were previously public
  have been hidden or removed.

* A few methods that were inconsistent have been cleaned up (e.g.,
  Db.close now returns void; it previously returned an int that was
  always zero). The synchronized attribute has been toggled on some
  methods - this is an attempt to prevent multi-threaded applications
  from shooting themselves in the foot by calling close() or similar
  methods concurrently from multiple threads.
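
The wrapping conventions described in this document (a generated
method renamed with a trailing zero, an exception rather than a JVM
crash when a handle is used after its native part is gone, and
synchronized close methods) can be sketched in plain Java. This is
only an illustration under stated assumptions, not the generated code:
SketchHandle, ClosedHandleException, the nativePtr field and the get
method are hypothetical names, and close0 here is an ordinary Java
method standing in for what would be a SWIG-generated native call.

```java
// Hypothetical sketch: none of these classes are part of Berkeley DB.
class ClosedHandleException extends RuntimeException {
    ClosedHandleException(String message) {
        super(message);
    }
}

class SketchHandle {
    // Stands in for the pointer to the native struct; 0 means the
    // native part has been destroyed.
    private long nativePtr = 1L;

    // Stand-in for the renamed, generated method (close -> close0).
    // In a real binding this would be a native call; here it only
    // records that the native resources were released.
    private void close0() {
        nativePtr = 0L;
    }

    // Hand-written wrapper that applications see. It is synchronized
    // so two threads cannot both reach close0 for the same handle.
    public synchronized void close() {
        checkOpen();
        close0();
    }

    // Any other operation checks the handle before touching it.
    public synchronized String get(String key) {
        checkOpen();
        return "value-for-" + key;
    }

    public synchronized boolean isOpen() {
        return nativePtr != 0L;
    }

    // Turn "use after close" into a Java exception instead of letting
    // native code dereference a dead pointer.
    private void checkOpen() {
        if (nativePtr == 0L) {
            throw new ClosedHandleException("handle used after close");
        }
    }
}

class ClosedHandleDemo {
    public static void main(String[] args) {
        SketchHandle handle = new SketchHandle();
        System.out.println(handle.get("a"));
        handle.close();
        try {
            handle.get("a");
        } catch (ClosedHandleException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

In this sketch a second call to close() also throws, because it counts
as an access after the native part has been destroyed. Whether the
real wrappers treat a repeated close that way is not stated here.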