1 files changed, 190 insertions, 0 deletions
diff --git a/libdb_java/README b/libdb_java/README
new file mode 100644
index 0000000..50db690
--- /dev/null
+++ b/libdb_java/README
@@ -0,0 +1,190 @@
+Berkeley DB's Java API
+$Id$
+
+Berkeley DB's Java API is now generated with SWIG
+(http://www.swig.org).  This document describes how SWIG is used:
+what we trust it to do, and what we needed to work around.
+
+
+Overview
+========
+
+SWIG is a tool that generates wrappers around native (C/C++) APIs for
+various languages (mainly scripting languages) including Java.
+
+By default, SWIG creates an API in the target language that exactly
+replicates the native API (for example, each pointer type in the API
+is wrapped as a distinct type in the language).  Although this
+simplifies the wrapper layer (type translation is trivial), it usually
+doesn't result in a natural API in the target language.
+
+A further constraint for Berkeley DB's Java API was backwards
+compatibility.  The original hand-coded Java API is in widespread use,
+and included many design decisions about how native types should be
+represented in Java.  As an example, callback functions are
+represented by Java interfaces that applications using Berkeley DB
+could implement.  The SWIG implementation was required to maintain
+backwards compatibility for those applications.
+
+
+Running SWIG
+============
+
+The simplest use of SWIG is to run it with a C include file as
+input.  SWIG parses the file and generates wrapper code for the target
+language.  For Java, this includes a Java class for each C struct and
+a C source file containing the Java Native Interface (JNI) function
+calls for each native method.
+
+The s_swig shell script in db/dist runs SWIG, and then post-processes
+each Java source file with the sed commands in
+libdb_java/java-post.sed.  The Java sources are placed in
+java/src/com/sleepycat/db, and the native wrapper code is in a single
+file in libdb_java/db_java_wrap.c.
+
+The post-processing step modifies code in ways that are difficult with
+SWIG (given my current level of knowledge).  This includes changing
+some access modifiers to hide some of the implementation methods,
+selectively adding "throws" clauses to methods, and adding calls to
+"initialize" methods in Db and DbEnv after they are constructed (more
+below on what these calls do).
+
+In addition to the source code generated by SWIG, some of the Java
+classes are written by hand, and constants and code to fill statistics
+structures are generated by the script dist/s_java.  The native
+statistics code is in libdb_java/java_stat_auto.c, and is compiled
+into the db_java_wrap object file with a #include directive.  This
+allows most functions in that object to be static, which encourages
+compiler inlining and reduces the number of symbols we export.
+
+
+The Implementation
+==================
+
+For the reasons mentioned above, Berkeley DB requires a more
+sophisticated mapping between the native API and Java, so additional
+SWIG directives are added to the input.  In particular:
+
+* The general intention is for db.i to contain the full DB API (just
+  like db.h).  As much as possible, this file is kept Java independent
+  so that it can be updated easily when the API changes.  SWIG doesn't
+  have any builtin rules for how to handle function pointers in a
+  struct, so each DB method must be added in a SWIG "%extend" block
+  which includes the method signature and a call to the method.
+
+  * SWIG's automatically generated function names happen to collide
+    with Berkeley DB's naming convention.  For example, in a SWIG class
+    called __db, a method called "open" would result in a wrapper
+    function called "__db_open", which already exists in DB.  This is
+    another reason why making these static functions is important.
+
+* The main Java support starts in db_java.i - this file includes all
+  Java code that is explicitly inserted into the generated classes,
+  and is responsible for defining object lifecycles (handling
+  allocation and cleanup).
+
+  * Methods that need to be wrapped for special handling in Java code
+    are renamed with a trailing zero (e.g., close becomes close0).
+    This is invisible to applications.
+
+  * Most DB classes that are wrapped have method calls that imply the
+    cleanup of any native resources associated with the Java object
+    (for example, Db.close or DbTxn.abort).  These methods are wrapped
+    so that if the object is accessed after the native part has been
+    destroyed, an exception is thrown rather than a trap that crashes
+    the JVM.
+
+  * Db and DbEnv initialization is more complex: a global reference is
+    stored in the corresponding struct so that native code can
+    efficiently map back to Java code.  In addition, if a Db is
+    created without an environment (i.e., in a private environment),
+    the initialization wraps the internal DbEnv to simplify handling
+    of various Db methods that just call the corresponding DbEnv
+    method (like err, errx, etc.).  It is important that the global
+    references are cleaned up before the DB and DB_ENV handles are
+    closed, so the Java objects can be garbage collected.
+
+  * DbLock and DbLsn have no method that implies cleanup of native
+    resources, so each has a finalize method that does the appropriate
+    cleanup.  No other classes have finalize methods (in particular,
+    the Dbt class is now implemented entirely in Java, so no
+    finalization is necessary).
+
+* Overall initialization code, including the System.loadLibrary call,
+  is in java_util.i.  This includes looking up all class, field and
+  method handles once so that execution is not slowed down by repeated
+  runtime type queries.
+
+* Exception handling is in java_except.i.  The main non-obvious design
+  choice was to create a db_ret_t type for methods that return an
+  error code as an int in the C API, but return void in the Java API
+  (and throw exceptions on error).
+
+  * The only other odd case with exceptions is DbMemoryException -
+    this is thrown as normal when a call returns ENOMEM, but there is
+    special handling for the case where a Dbt with DB_DBT_USERMEM is
+    not big enough to handle a result: in this case, the Dbt handling
+    code calls the method update_dbt on the exception that is about to
+    be thrown to register the failed Dbt in the exception.
+
+* Statistics handling is in java_stat.i - this mainly just hooks into
+  the automatically-generated code in java_stat_auto.c.
+
+* Callbacks: the general approach is that Db and DbEnv maintain
+  references to the objects that handle each callback, and have a
+  helper method for each call.  This is primarily to simplify the
+  native code, and performs better than more complex native code.
+
+  * One difference with the new approach is that the implementation is
+    more careful about calling DeleteLocalRef on objects created for
+    callbacks.  This is particularly important for callbacks like
+    bt_compare, which may be called repeatedly from native code.
+    Without the DeleteLocalRef calls, the Java objects that are
+    created cannot be collected until the original call returns.
+
+* Most of the rest of the code is in java_typemaps.i.  A typemap is a
+  rule describing how a native type is mapped onto a Java type for
+  parameters and return values.  These handle most of the complexity
+  of creating exactly the Java API we want.
+
+  * One of the main areas of complexity is Dbt handling.  The approach
+    taken is to accept whatever data is passed in by the application,
+    pass that to native code, and reflect any changes to the native
+    DBT back into the Java object.  In other words, the Dbt typemaps
+    don't replicate DB's rules about whether Dbts will be modified or
+    not - they just pass the data through.
+
+  * As noted above, when a Dbt is "released" (i.e., no longer needed
+    in native code), one of the checks is whether a DbMemoryException
+    is pending, and if so, whether this Dbt might be the cause.  In
+    that case, the Dbt is added to the exception via the "update_dbt"
+    method.
+
+* Constant handling has been simplified by making DbConstants an
+  interface.  This allows the Db class to inherit the constants, and
+  most can be inlined by javac.
+
+  * The danger here is if applications are compiled against one
+    version of db.jar, but run against another.  This danger existed
+    previously, but was partly ameliorated by a separation of
+    constants into "case" and "non-case" constants (the non-case
+    constants were arranged so they could not be inlined).  The only
+    complete solution to this problem is for applications to check the
+    version returned by DbEnv.get_version* versus the Db.DB_VERSION*
+    constants.
+
+
+Application-visible changes
+===========================
+
+* The new API is around 5x faster for many operations.
+
+* Some internal methods and constructors that were previously public
+  have been hidden or removed.
+
+* A few methods that were inconsistent have been cleaned up (e.g.,
+  Db.close now returns void; it used to return an int that was always
+  zero).  The synchronized attribute has been toggled on some methods
+  in an attempt to prevent multi-threaded applications from shooting
+  themselves in the foot by calling close() or similar methods
+  concurrently from multiple threads.
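
Note on the README above: the wrapped-method pattern it describes
(native methods renamed with a trailing zero, plus a Java wrapper that
throws once the native part has been destroyed) might look roughly like
the following.  This is a minimal sketch under assumptions, not the
generated code: the class SketchHandle, its methods and the nativePtr
field are hypothetical, and the native methods are replaced by plain
Java stubs so that the example runs without JNI.

```java
// Hypothetical stand-in for a generated proxy class.  The names
// SketchHandle, put0, close0 and nativePtr are illustrative only and
// are not taken from the real Db API.
final class SketchHandle {
    // Stand-in for the native pointer held by the Java object
    // (similar in role to the swigCPtr field of SWIG proxy classes).
    private long nativePtr = 1L;

    // In a real wrapper these would be native (JNI) methods.
    private int put0(int value) { return value; }
    private void close0() { /* would release the native handle */ }

    // Throw a Java exception rather than let native code dereference
    // a handle that has already been destroyed.
    private void checkOpen() {
        if (nativePtr == 0L)
            throw new IllegalStateException("handle is closed");
    }

    public synchronized int put(int value) {
        checkOpen();
        return put0(value);
    }

    // synchronized, so concurrent close() calls cannot both reach
    // close0(); the second caller fails in checkOpen() instead.
    public synchronized void close() {
        checkOpen();
        try {
            close0();
        } finally {
            nativePtr = 0L;   // later calls fail fast in checkOpen()
        }
    }
}
```

With this shape, an application that calls put() after close() gets an
IllegalStateException from Java code instead of a crash inside the
JVM.  The exception type is chosen only for illustration.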