Diffstat (limited to 'libdb_java/README')
-rw-r--r-- | libdb_java/README | 190 |
1 files changed, 190 insertions, 0 deletions
diff --git a/libdb_java/README b/libdb_java/README
new file mode 100644
index 0000000..50db690
--- /dev/null
+++ b/libdb_java/README
@@ -0,0 +1,190 @@
+Berkeley DB's Java API
+$Id$
+
+Berkeley DB's Java API is now generated with SWIG
+(http://www.swig.org). This document describes how SWIG is used:
+what we trust it to do, and what we needed to work around.
+
+
+Overview
+========
+
+SWIG is a tool that generates wrappers around native (C/C++) APIs for
+various languages (mainly scripting languages), including Java.
+
+By default, SWIG creates an API in the target language that exactly
+replicates the native API (for example, each pointer type in the API
+is wrapped as a distinct type in the language). Although this
+simplifies the wrapper layer (type translation is trivial), it usually
+doesn't result in a natural API in the target language.
+
+A further constraint for Berkeley DB's Java API was backwards
+compatibility. The original hand-coded Java API is in widespread use,
+and it embodied many design decisions about how native types should be
+represented in Java. For example, callback functions are
+represented by Java interfaces that applications using Berkeley DB
+could implement. The SWIG implementation was required to maintain
+backwards compatibility for those applications.
+
+
+Running SWIG
+============
+
+The simplest way to use SWIG is to run it with a C include file as
+input. SWIG parses the file and generates wrapper code for the target
+language. For Java, this includes a Java class for each C struct and
+a C source file containing the Java Native Interface (JNI) wrapper
+function for each native method.
+
+The s_swig shell script in db/dist runs SWIG, and then post-processes
+each Java source file with the sed commands in
+libdb_java/java-post.sed. The Java sources are placed in
+java/src/com/sleepycat/db, and the native wrapper code is in a single
+file, libdb_java/db_java_wrap.c.
+
+The post-processing step makes changes that are difficult to express in
+SWIG (given my current level of knowledge). This includes changing
+some access modifiers to hide some of the implementation methods,
+selectively adding "throws" clauses to methods, and adding calls to
+"initialize" methods in Db and DbEnv after they are constructed (more
+on what these calls do below).
+
+In addition to the source code generated by SWIG, some of the Java
+classes are written by hand, and constants and code to fill statistics
+structures are generated by the script dist/s_java. The native
+statistics code is in libdb_java/java_stat_auto.c, and is compiled
+into the db_java_wrap object file with a #include directive. This
+allows most functions in that object to be static, which encourages
+compiler inlining and reduces the number of symbols we export.
+
+
+The Implementation
+==================
+
+For the reasons mentioned above, Berkeley DB requires a more
+sophisticated mapping between the native API and Java, so additional
+SWIG directives are added to the input. In particular:
+
+* The general intention is for db.i to contain the full DB API (just
+  like db.h). As much as possible, this file is kept Java-independent
+  so that it can be updated easily when the API changes. SWIG doesn't
+  have any built-in rules for handling function pointers in a
+  struct, so each DB method must be added in a SWIG "%extend" block
+  that includes the method signature and a call to the method.
+
+  * SWIG's automatically generated function names happen to collide
+    with Berkeley DB's naming convention. For example, in a SWIG class
+    called __db, a method called "open" would result in a wrapper
+    function called "__db_open", which already exists in DB. This is
+    another reason why it is important that these functions be static.
+
+* The main Java support starts in db_java.i. This file includes all
+  Java code that is explicitly inserted into the generated classes,
+  and is responsible for defining object lifecycles (handling
+  allocation and cleanup).
+
+  * Methods that need to be wrapped for special handling in Java code
+    are renamed with a trailing zero (e.g., close becomes close0).
+    This is invisible to applications.
+
+  * Most wrapped DB classes have methods whose calls imply the
+    cleanup of any native resources associated with the Java object
+    (for example, Db.close or DbTxn.abort). These methods are wrapped
+    so that if the object is accessed after the native part has been
+    destroyed, an exception is thrown rather than a trap crashing
+    the JVM.
+
+  * Db and DbEnv initialization is more complex: a global reference is
+    stored in the corresponding struct so that native code can
+    efficiently map back to the Java object. In addition, if a Db is
+    created without an environment (i.e., in a private environment),
+    the initialization wraps the internal DbEnv to simplify handling
+    of various Db methods that just call the corresponding DbEnv
+    method (like err, errx, etc.). It is important that the global
+    references are cleaned up before the DB and DB_ENV handles are
+    closed, so that the Java objects can be garbage collected.
+
+  * DbLock and DbLsn have no such cleanup methods. Instead, each
+    has a finalize method that does the appropriate
+    cleanup. No other classes have finalize methods (in particular,
+    the Dbt class is now implemented entirely in Java, so no
+    finalization is necessary).
+
+* Overall initialization code, including the System.loadLibrary call,
+  is in java_util.i. This includes looking up all class, field and
+  method handles once, so that execution is not slowed down by repeated
+  runtime type queries.
+
+* Exception handling is in java_except.i. The main non-obvious design
+  choice was to create a db_ret_t type for methods that return an
+  error code as an int in the C API, but return void in the Java API
+  (and throw exceptions on error).
+
+  * The only other odd case with exceptions is DbMemoryException:
+    it is thrown as usual when a call returns ENOMEM, but there is
+    special handling for the case where a Dbt with DB_DBT_USERMEM is
+    not big enough to hold a result. In this case, the Dbt handling
+    code calls the method update_dbt on the exception that is about to
+    be thrown, to register the failed Dbt in the exception.
+
+* Statistics handling is in java_stat.i; this mainly just hooks into
+  the automatically generated code in java_stat_auto.c.
+
+* Callbacks: the general approach is that Db and DbEnv maintain
+  references to the objects that handle each callback, and have a
+  helper method for each call. This is primarily to simplify the
+  native code, and it also performs better than more complex native code.
+
+  * One difference in the new approach is that the implementation is
+    more careful about calling DeleteLocalRef on objects created for
+    callbacks. This is particularly important for callbacks like
+    bt_compare, which may be called repeatedly from native code.
+    Without the DeleteLocalRef calls, the Java objects that are
+    created cannot be collected until the original call returns.
+
+* Most of the rest of the code is in java_typemaps.i. A typemap is a
+  rule describing how a native type is mapped onto a Java type for
+  parameters and return values. Typemaps handle most of the complexity
+  of creating exactly the Java API we want.
+
+  * One of the main areas of complexity is Dbt handling. The approach
+    taken is to accept whatever data is passed in by the application,
+    pass that to native code, and reflect any changes to the native
+    DBT back into the Java object. In other words, the Dbt typemaps
+    don't replicate DB's rules about whether Dbts will be modified or
+    not; they just pass the data through.
+
+  * As noted above, when a Dbt is "released" (i.e., no longer needed
+    in native code), one of the checks is whether a DbMemoryException
+    is pending, and if so, whether this Dbt might be the cause. In
+    that case, the Dbt is added to the exception via the "update_dbt"
+    method.
+
+* Constant handling has been simplified by making DbConstants an
+  interface. This allows the Db class to inherit the constants, and
+  javac can inline most of them.
+
+  * The danger here is that applications may be compiled against one
+    version of db.jar but run against another. This danger existed
+    previously, but was partly mitigated by separating the
+    constants into "case" and "non-case" constants (the non-case
+    constants were arranged so they could not be inlined). The only
+    complete solution to this problem is for applications to compare the
+    version returned by DbEnv.get_version* with the Db.DB_VERSION*
+    constants.
+
+
+Application-visible changes
+===========================
+
+* The new API is around 5x faster for many operations.
+
+* Some internal methods and constructors that were previously public
+  have been hidden or removed.
+
+* A few inconsistent methods have been cleaned up (e.g., Db.close
+  now returns void; it used to return an int that was always zero). The
+  synchronized attribute has been changed on some methods. This is
+  an attempt to prevent multi-threaded applications from shooting
+  themselves in the foot by calling close() or similar methods
+  concurrently from multiple threads.
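The handle lifecycle described in the README above can be modelled in plain Java. It combines a method renamed with a trailing zero (close becomes close0), a guard that throws instead of letting the JVM crash on a destroyed native object, the db_ret_t idea of turning a C int error code into a void method that throws, and a synchronized close. What follows is a hypothetical, self-contained sketch, not the SWIG-generated Berkeley DB code. The class name NativeHandle, the long cPtr field standing in for the native pointer, the simulated closeResult error code, and the use of IllegalStateException and RuntimeException are all assumptions made for illustration.

```java
// Hypothetical model of a SWIG-style proxy object; NOT Berkeley DB code.
final class NativeHandle {
    // Stands in for the native pointer a proxy object keeps;
    // 0 means the native part has been destroyed.
    private long cPtr;
    // Simulated C return code that close0 will report (demo assumption).
    private final int closeResult;

    NativeHandle(long cPtr, int closeResult) {
        this.cPtr = cPtr;
        this.closeResult = closeResult;
    }

    // Plays the role of the renamed method (close -> close0): it "frees"
    // the native part and returns a C-style int error code.
    // Assumption: the native part is gone even when the code is non-zero.
    private int close0() {
        cPtr = 0;
        return closeResult;
    }

    // Guard run by every public method: throw a Java exception instead of
    // touching a native object that has already been destroyed.
    private long checkOpen() {
        if (cPtr == 0) {
            throw new IllegalStateException("handle used after close");
        }
        return cPtr;
    }

    // An ordinary operation; here it just returns the simulated pointer.
    public synchronized long pointer() {
        return checkOpen();
    }

    // Public wrapper in the spirit of db_ret_t: void in Java, with a
    // non-zero C error code turned into an exception. It is synchronized so
    // two threads cannot both pass checkOpen() and run close0() twice.
    public synchronized void close() {
        checkOpen();
        int err = close0();
        if (err != 0) {
            throw new RuntimeException("close failed with error " + err);
        }
    }
}
```

In a real binding the error would surface as one of Berkeley DB's own exception classes. RuntimeException is only a placeholder here. Making both methods synchronized also ensures that a thread calling pointer() sees the cleared cPtr written by another thread's close().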