Diffstat (limited to 'libdb_java/README')
-rw-r--r-- libdb_java/README 190
1 files changed, 190 insertions, 0 deletions
diff --git a/libdb_java/README b/libdb_java/README
new file mode 100644
index 0000000..50db690
--- /dev/null
+++ b/libdb_java/README
@@ -0,0 +1,190 @@
+Berkeley DB's Java API
+$Id$
+
+Berkeley DB's Java API is now generated with SWIG
+(http://www.swig.org). This document describes how SWIG is used:
+what we trust it to do, and what we needed to work around.
+
+
+Overview
+========
+
+SWIG is a tool that generates wrappers around native (C/C++) APIs for
+various languages (mainly scripting languages) including Java.
+
+By default, SWIG creates an API in the target language that exactly
+replicates the native API (for example, each pointer type in the API
+is wrapped as a distinct type in the language). Although this
+simplifies the wrapper layer (type translation is trivial), it usually
+doesn't result in a natural API in the target language.
+
+A further constraint for Berkeley DB's Java API was backwards
+compatibility. The original hand-coded Java API is in widespread use,
+and included many design decisions about how native types should be
+represented in Java. As an example, callback functions are
+represented by Java interfaces that applications using Berkeley DB
+could implement. The SWIG implementation was required to maintain
+backwards compatibility for those applications.
+
+
+Running SWIG
+============
+
+The simplest use of SWIG is to run it with a C include file as
+input. SWIG parses the file and generates wrapper code for the target
+language. For Java, this includes a Java class for each C struct and
+a C source file containing the Java Native Interface (JNI) function
+calls for each native method.
+
+The s_swig shell script in db/dist runs SWIG, and then post-processes
+each Java source file with the sed commands in
+libdb_java/java-post.sed. The Java sources are placed in
+java/src/com/sleepycat/db, and the native wrapper code is in a single
+file in libdb_java/db_java_wrap.c.
+
+The post-processing step modifies code in ways that are difficult with
+SWIG (given my current level of knowledge). This includes changing
+some access modifiers to hide some of the implementation methods,
+selectively adding "throws" clauses to methods, and adding calls to
+"initialize" methods in Db and DbEnv after they are constructed (more
+below on what these calls do).
+
+In addition to the source code generated by SWIG, some of the Java
+classes are written by hand, and constants and code to fill statistics
+structures are generated by the script dist/s_java. The native
+statistics code is in libdb_java/java_stat_auto.c, and is compiled
+into the db_java_wrap object file with a #include directive. This
+allows most functions in that object to be static, which encourages
+compiler inlining and reduces the number of symbols we export.
+
+
+The Implementation
+==================
+
+For the reasons mentioned above, Berkeley DB requires a more
+sophisticated mapping between the native API and Java, so additional
+SWIG directives are added to the input. In particular:
+
+* The general intention is for db.i to contain the full DB API (just
+ like db.h). As much as possible, this file is kept Java independent
+ so that it can be updated easily when the API changes. SWIG doesn't
+ have any builtin rules for how to handle function pointers in a
+ struct, so each DB method must be added in a SWIG "%extend" block
+ which includes the method signature and a call to the method.
+
+ * SWIG's automatically generated function names happen to collide
+ with Berkeley DB's naming convention. For example, in a SWIG class
+ called __db, a method called "open" would result in a wrapper
+ function called "__db_open", which already exists in DB. This is
+ another reason why making these static functions is important.
+
+* The main Java support starts in db_java.i - this file includes all
+ Java code that is explicitly inserted into the generated classes,
+ and is responsible for defining object lifecycles (handling
+ allocation and cleanup).
+
+ * Methods that need to be wrapped for special handling in Java code
+ are renamed with a trailing zero (e.g., close becomes close0).
+ This is invisible to applications.
+
+ * Most DB classes that are wrapped have method calls that imply the
+ cleanup of any native resources associated with the Java object
+ (for example, Db.close or DbTxn.abort). These methods are wrapped
+ so that if the object is accessed after the native part has been
+ destroyed, an exception is thrown rather than a trap that crashes
+ the JVM.
+
+ * Db and DbEnv initialization is more complex: a global reference is
+ stored in the corresponding struct so that native code can
+ efficiently map back to Java code. In addition, if a Db is
+ created without an environment (i.e., in a private environment),
+ the initialization wraps the internal DbEnv to simplify handling
+ of various Db methods that just call the corresponding DbEnv
+ method (like err, errx, etc.). It is important that the global
+ references are cleaned up before the DB and DB_ENV handles are
+ closed, so the Java objects can be garbage collected.
+
+ * In the case of DbLock and DbLsn, there are no such methods. In
+ these cases, there is a finalize method that does the appropriate
+ cleanup. No other classes have finalize methods (in particular,
+ the Dbt class is now implemented entirely in Java, so no
+ finalization is necessary).
+
+* Overall initialization code, including the System.loadLibrary call,
+ is in java_util.i. This includes looking up all class, field and
+ method handles once so that execution is not slowed down by repeated
+ runtime type queries.
+
+* Exception handling is in java_except.i. The main non-obvious design
+ choice was to create a db_ret_t type for methods that return an
+ error code as an int in the C API, but return void in the Java API
+ (and throw exceptions on error).
+
+ * The only other odd case with exceptions is DbMemoryException -
+ this is thrown as normal when a call returns ENOMEM, but there is
+ special handling for the case where a Dbt with DB_DBT_USERMEM is
+ not big enough to handle a result: in this case, the Dbt handling
+ code calls the method update_dbt on the exception that is about to
+ be thrown to register the failed Dbt in the exception.
+
+* Statistics handling is in java_stat.i - this mainly just hooks into
+ the automatically-generated code in java_stat_auto.c.
+
+* Callbacks: the general approach is that Db and DbEnv maintain
+ references to the objects that handle each callback, and have a
+ helper method for each call. This is primarily to simplify the
+ native code, and performs better than more complex native code.
+
+ * One difference with the new approach is that the implementation is
+ more careful about calling DeleteLocalRef on objects created for
+ callbacks. This is particularly important for callbacks like
+ bt_compare, which may be called repeatedly from native code.
+ Without the DeleteLocalRef calls, the Java objects that are
+ created cannot be collected until the original call returns.
+
+* Most of the rest of the code is in java_typemaps.i. A typemap is a
+ rule describing how a native type is mapped onto a Java type for
+ parameters and return values. These handle most of the complexity
+ of creating exactly the Java API we want.
+
+ * One of the main areas of complexity is Dbt handling. The approach
+ taken is to accept whatever data is passed in by the application,
+ pass that to native code, and reflect any changes to the native
+ DBT back into the Java object. In other words, the Dbt typemaps
+ don't replicate DB's rules about whether Dbts will be modified or
+ not - they just pass the data through.
+
+ * As noted above, when a Dbt is "released" (i.e., no longer needed
+ in native code), one of the checks is whether a DbMemoryException
+ is pending, and if so, whether this Dbt might be the cause. In
+ that case, the Dbt is added to the exception via the "update_dbt"
+ method.
+
+* Constant handling has been simplified by making DbConstants an
+ interface. This allows the Db class to inherit the constants, and
+ most can be inlined by javac.
+
+ * The danger here is if applications are compiled against one
+ version of db.jar, but run against another. This danger existed
+ previously, but was partly ameliorated by a separation of
+ constants into "case" and "non-case" constants (the non-case
+ constants were arranged so they could not be inlined). The only
+ complete solution to this problem is for applications to check the
+ version returned by DbEnv.get_version* versus the Db.DB_VERSION*
+ constants.
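A minimal sketch of the version check suggested above. DbVersionCheck and
its methods are hypothetical names introduced for illustration. The integer
arguments stand in for the Db.DB_VERSION* constants (inlined by javac when
the application is compiled) and for the values returned by
DbEnv.get_version* (read from the library at run time). Comparing only the
major and minor numbers is an assumption about what counts as compatible.

```java
// Hypothetical helper for illustration; not part of the Berkeley DB API.
final class DbVersionCheck {
    private DbVersionCheck() {}

    // True when the release the application was compiled against matches
    // the release of the library that is actually loaded.
    static boolean sameRelease(int compiledMajor, int compiledMinor,
                               int runtimeMajor, int runtimeMinor) {
        return compiledMajor == runtimeMajor && compiledMinor == runtimeMinor;
    }

    // Fails fast instead of running with stale inlined constants.
    static void require(int compiledMajor, int compiledMinor,
                        int runtimeMajor, int runtimeMinor) {
        if (!sameRelease(compiledMajor, compiledMinor, runtimeMajor, runtimeMinor)) {
            throw new IllegalStateException(
                "compiled against " + compiledMajor + "." + compiledMinor
                + " but running with " + runtimeMajor + "." + runtimeMinor);
        }
    }

    public static void main(String[] args) {
        // Placeholder values; an application would pass its compiled-in
        // constants and the numbers reported by the loaded library.
        require(4, 2, 4, 2);
        System.out.println("versions match");
    }
}
```

An application would typically run such a check once at startup, before
opening any environment or database handle.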
+
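The trailing-zero renaming and the use-after-close check described in the
db_java.i notes above can be sketched as follows. HandleSketch, nativePtr,
requireOpen and describe are hypothetical names, close0 is written in plain
Java here so the sketch runs without a native library, and the choice of
IllegalStateException is an assumption (the text above only says that an
exception is thrown).

```java
// Hypothetical stand-in for a generated wrapper class; names are illustrative.
final class HandleSketch {
    // Stand-in for the pointer to the native struct; 0 means destroyed.
    private long nativePtr;

    HandleSketch(long nativePtr) {
        this.nativePtr = nativePtr;
    }

    // In the real wrappers this would be a native method generated by SWIG.
    // Here it is plain Java so that the sketch runs without native code.
    private void close0(long ptr) {
        // Native cleanup would happen here.
    }

    // Public method seen by applications: validate, invalidate, delegate.
    public synchronized void close() {
        long ptr = requireOpen();
        nativePtr = 0; // later calls now fail in Java instead of crashing in C
        close0(ptr);
    }

    // Any other method performs the same check before using native state.
    public synchronized String describe() {
        return "handle@" + requireOpen();
    }

    private long requireOpen() {
        if (nativePtr == 0) {
            throw new IllegalStateException("handle used after close");
        }
        return nativePtr;
    }

    public static void main(String[] args) {
        HandleSketch h = new HandleSketch(42L);
        System.out.println(h.describe());
        h.close();
        try {
            h.describe();
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Clearing the pointer before calling close0 means that even if the native
call fails, the Java object can no longer reach the destroyed handle.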
+
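The callback approach described above (the handle keeps a reference to the
callback object and exposes one helper method per callback) might look
roughly like this on the Java side. CompareCallback, CallbackHost,
setCompare and handleCompare are hypothetical names, and the helper is
invoked directly here, whereas in the real implementation native code would
invoke it through JNI.

```java
// Hypothetical callback interface; the real API defines its own interfaces.
interface CompareCallback {
    int compare(byte[] a, byte[] b);
}

// Hypothetical stand-in for a handle class such as Db.
final class CallbackHost {
    // Holding the reference keeps the callback object reachable for as
    // long as the handle may call it.
    private CompareCallback compareHandler;

    void setCompare(CompareCallback handler) {
        this.compareHandler = handler;
    }

    // Helper that native code would call. The native side only has to
    // invoke this one method on the handle object.
    int handleCompare(byte[] a, byte[] b) {
        if (compareHandler == null) {
            throw new IllegalStateException("no compare callback registered");
        }
        return compareHandler.compare(a, b);
    }

    public static void main(String[] args) {
        CallbackHost host = new CallbackHost();
        // Example callback: order keys by length only.
        host.setCompare((a, b) -> Integer.compare(a.length, b.length));
        System.out.println(host.handleCompare(new byte[1], new byte[3]));
    }
}
```

The DeleteLocalRef handling mentioned above lives in the native code and
has no counterpart in this Java-only sketch.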
+Application-visible changes
+===========================
+
+* The new API is around 5x faster for many operations.
+
+* Some internal methods and constructors that were previously public
+ have been hidden or removed.
+
+* A few methods that were inconsistent have been cleaned up (e.g.,
+ Db.close now returns void; it previously returned an int that was
+ always zero). The synchronized attribute has been toggled on some
+ methods - this is an attempt to prevent multi-threaded applications
+ from shooting themselves in the foot by calling close() or similar
+ methods concurrently from multiple threads.