Libxslt is the XSLT C library developed for the GNOME project. XSLT itself is an XML language to define transformations for XML. Libxslt is based on libxml2, the XML C library developed for the GNOME project. It also implements most of the EXSLT set of processor-portable extension functions and some of Saxon's evaluate and expression extensions.
People can either embed the library in their application or use xsltproc, the command line processing tool. This library is free software and can be reused in commercial applications (see the intro).
External documents:
Logo designed by Marc Liyanage.
This document describes libxslt, the XSLT C library developed for the GNOME project.
Here are some key points about libxslt:
There are some on-line resources about using libxslt:
If you need help with the XSLT language itself, here are a number of useful resources:
Well, bugs or missing features are always possible, and I will make a point of fixing them in a timely fashion. The best way to report a bug is to use the GNOME bug tracking database (make sure to use the "libxslt" module name). Before filing a bug, check the list of existing libxslt bugs to make sure it hasn't already been filed. I look at reports there regularly and it's good to have a reminder when a bug is still open. Be sure to specify that the bug is for the package libxslt.
For small problems you can try to get help on IRC: the #xml channel on irc.gnome.org (port 6667) usually has a few people subscribed who may help (but there is no guarantee, and if a real issue is raised it should go to the mailing-list for archival).
There is also a mailing-list xslt@gnome.org for libxslt, with an on-line archive. To subscribe to this list, please visit the associated Web page and follow the instructions.
Alternatively, you can just send the bug to the xslt@gnome.org list; if it's really libxslt related I will approve it. Please do not send me mail directly, especially for portability problems: it makes things much harder to track, and in some cases I'm not the best person to answer a given question, so ask the list instead. Do not send code, I won't debug it (but patches are really appreciated!).
Please note that with the current amount of viruses and spam, sending mail to the list without being subscribed won't work. There are *far too many bounces* (on the order of a thousand a day!) and I cannot approve them manually anymore. If your mail to the list bounced waiting for administrator approval, it is LOST! Repost it after fixing the problem that triggered the error. Also please note that emails with a legal warning asking not to copy or freely redistribute the information they contain are NOT acceptable for the mailing-list. Such mail will be discarded automatically as far as possible, and is less likely to be answered if it makes it to the list. DO NOT post to the list from an email address where such legal notices are added automatically; get private paying support if you can't share information.
Check the following too before posting:
Then send the bug with associated information to reproduce it to the xslt@gnome.org list; if it's really libxslt related I will approve it. Please do not send mail to me directly, it makes things really hard to track and in some cases I am not the best person to answer a given question, ask on the list.
To be really clear about support:
Of course, bug reports with a suggested patch for fixing them will probably be processed faster.
If you're looking for help, a quick look at the list archive may actually provide the answer; I usually send source samples when answering libxslt usage questions. The auto-generated documentation is not as polished as I would like (I need to learn more about DocBook), but it's a good starting point.
You can help the project in various ways. The best thing to do first is to subscribe to the mailing-list as explained before, then check the archives and the GNOME bug database:
The latest versions of libxslt can be found on the xmlsoft.org server. (NOTE that you need the libxml2, libxml2-devel, libxslt and libxslt-devel packages installed to compile applications using libxslt.) Igor Zlatkovic is now the maintainer of the Windows port, he provides binaries. CSW provides Solaris binaries, and Steve Ball provides Mac OS X binaries.
I do accept external contributions, especially if compiling on another platform; get in touch with me to upload the package. I will keep them in the contrib directory.
Libxslt is also available from Git:
See libxslt Git web. To check out a local tree use:
git clone git://git.gnome.org/libxslt
Usually the problem comes from the fact that the compiler doesn't get the right compilation or linking flags. There is a small shell script xslt-config which is installed as part of the usual libxslt install process and which provides those flags. Use xslt-config --cflags to get the compilation flags and xslt-config --libs to get the linker flags. Usually this is done directly from the Makefile as:
CFLAGS=`xslt-config --cflags`
LIBS=`xslt-config --libs`
Note also that if you use the EXSLT extensions from the program then you should prepend -lexslt to the LIBS options.
xsltproc --param test alpha foo.xsl foo.xml
the param does not get passed and ends up as ""
In a nutshell do a double escaping at the shell prompt:
xsltproc --param test "'alpha'" foo.xsl foo.xml
i.e. the string value is surrounded by " and ' then terminated by ' and ". Libxslt interprets the parameter values as XPath expressions, so the string ->alpha<- is interpreted as the node set matching this string. You really want ->'alpha'<- to be passed to the processor. And to allow this you need to escape the quotes at the shell level using ->"'alpha'"<- .
or use
xsltproc --stringparam test alpha foo.xsl foo.xml
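To see why the double quoting is needed, it can help to look at what the shell actually hands to xsltproc. The following Python sketch uses the standard shlex module, which follows POSIX-style splitting rules and is only an approximation of a real shell, to split the command lines above. The helper name value_seen_by_xsltproc is made up for this illustration and nothing here calls libxslt.

```python
import shlex


def value_seen_by_xsltproc(command_line):
    """Return the parameter value token after POSIX-style shell splitting."""
    argv = shlex.split(command_line)
    # The value is the second token after --param or --stringparam.
    for flag in ("--param", "--stringparam"):
        if flag in argv:
            return argv[argv.index(flag) + 2]
    raise ValueError("no parameter option found")


# Unquoted: the processor receives alpha, which it evaluates as an XPath
# expression (a node set), not as a string.
print(value_seen_by_xsltproc("xsltproc --param test alpha foo.xsl foo.xml"))

# Double escaping: the shell strips only the outer double quotes, so the
# XPath string literal 'alpha' reaches the processor.
print(value_seen_by_xsltproc("xsltproc --param test \"'alpha'\" foo.xsl foo.xml"))
```

With --stringparam the value is taken as a plain string, so no XPath quoting has to survive the shell.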
Yes, for example xmlwrapp; see the related pages about bindings.
The change log describes the recent commits to the SVN code base.
Those are the public releases made:
This is a bugfix only release
speed of large text output, xsl:copy with attributes, strip-space and namespaces prefix, fix for --path xsltproc option, EXSLT:tokenize (Shaun McCance), EXSLT:seconds (William Brack), sort with multiple keys (William Brack), checking of { and } for attribute value templates (William Brack)
stylesheet compilation (Igor Zlatkovic), NaN and sort (William Brack), RVT bug introduced in 1.0.30
Mostly a bug fix release.
This program is the simplest way to use libxslt: from the command line. It is also used for doing the regression tests of the library.
It takes as first argument the path or URL to an XSLT stylesheet, the next arguments are filenames or URIs of the inputs to be processed. The output of the processing is sent to the standard output. There are actually a few more options available:
orchis:~ -> xsltproc
Usage: xsltproc [options] stylesheet file [file ...]
   Options:
      --version or -V: show the version of libxml and libxslt used
      --verbose or -v: show logs of what's happening
      --output file or -o file: save to a given file
      --timing: display the time used
      --repeat: run the transformation 20 times
      --debug: dump the tree of the result instead
      --novalid: skip the DTD loading phase
      --noout: do not dump the result
      --maxdepth val : increase the maximum depth
      --html: the input document is(are) an HTML file(s)
      --docbook: the input document is SGML docbook
      --param name value : pass a (parameter,value) pair
      --nonet refuse to fetch DTDs or entities over network
      --warnnet warn against fetching over the network
      --catalogs : use the catalogs from $SGML_CATALOG_FILES
      --xinclude : do XInclude processing on document input
      --profile or --norman : dump profiling information
orchis:~ ->
DocBook is an XML/SGML vocabulary particularly well suited to books and papers about computer hardware and software.
xsltproc and libxslt are not specifically dependent on DocBook, but since a lot of people use xsltproc and libxml2 for DocBook formatting, here are a few pointers and information which may be helpful:
export XMLCATALOG=$HOME/xmlcatalog
should allow processing DocBook documents without requiring network access for the DTD or stylesheets.
Do not use the --docbook option of xsltproc to process XML DocBook documents, this option is only intended to provide some (limited) support of the SGML version of DocBook.
Points which are not DocBook specific but still worth mentioning again:
xmllint --valid --noout path_to_document
to make sure that your input is valid DocBook, and fix the errors before processing further. Note that XSLT processing may work correctly with some forms of validity errors left, but in general they can cause trouble in the output.
Okay this section is clearly incomplete. But integrating libxslt into your application should be relatively easy. First check the few steps described below, then for more detailed information, look at the generated pages for the API and the source of libxslt/xsltproc.c and the tutorial.
Basically doing an XSLT transformation can be done in a few steps:
xmlSubstituteEntitiesDefault(1);
xmlLoadExtDtdDefaultValue = 1;
Steps 2, 3, and 5 will probably need to be changed depending on your processing needs and environment, for example if reading/saving from/to memory, or if you want to apply XInclude processing to the stylesheet or input documents.
There are a number of language bindings and wrappers available for libxml2, the list below is not exhaustive. Please contact the xml-bindings@gnome.org (archives) in order to get updates to this list or to discuss the specific topic of libxml2 or libxslt wrappers or bindings:
The libxslt Python module depends on the libxml2 Python module.
The distribution includes a set of Python bindings, which are guaranteed to be maintained as part of the library in the future, though the Python interface has not yet reached the completeness of the C API.
Stéphane Bidoul maintains a Windows port of the Python bindings.
Note to people interested in building bindings: the API is formalized as an XML API description file which allows automating a large part of the Python bindings; this includes function descriptions, enums, structures, typedefs, etc. The Python script used to build the bindings is python/generator.py in the source distribution.
To install the Python bindings there are 2 options:
The distribution includes a set of examples and regression tests for the Python bindings in the python/tests directory. Here are some excerpts from those tests:
This is a basic test of XSLT interfaces: loading a stylesheet and a document, transforming the document and saving the result.
import libxml2
import libxslt

styledoc = libxml2.parseFile("test.xsl")
style = libxslt.parseStylesheetDoc(styledoc)
doc = libxml2.parseFile("test.xml")
result = style.applyStylesheet(doc, None)
style.saveResultToFilename("foo", result, 0)
style.freeStylesheet()
doc.freeDoc()
result.freeDoc()
The Python module is called libxslt, you will also need the libxml2 module for the operations on XML trees. Let's have a look at the objects manipulated in that example and how the processing is done:
styledoc: is a libxml2 document tree. It is obtained by parsing the XML file "test.xsl" containing the stylesheet.
style: this is a precompiled stylesheet ready to be used by the following transformations (note the plural form, multiple transformations can reuse the same stylesheet).
doc: this is the document to apply the transformation to. In this case it is simply generated by parsing it from a file but any other processing is possible as long as one gets a libxml2 Doc. Note that HTML trees are suitable for XSLT processing in libxslt. This is actually how this page is generated!
result: this is a document generated by applying the stylesheet to the document. Note that some of the stylesheet information may be related to the serialization of that document and as in this example a specific saveResultToFilename() method of the stylesheet should be used to save it to a file (in that case to "foo").
Also note the need to explicitly deallocate documents with freeDoc() except for the stylesheet document which is freed when its compiled form is garbage collected.
This one is a far more complex test. It shows how to modify the behaviour of an XSLT transformation by passing parameters and how to extend the XSLT engine with functions defined in python:
import libxml2
import libxslt

nodeName = None

def f(ctx, s):
    global nodeName

    #
    # Small check to verify the context is correctly accessed
    #
    try:
        pctxt = libxslt.xpathParserContext(_obj=ctx)
        ctxt = pctxt.context()
        tctxt = ctxt.transformContext()
        nodeName = tctxt.insertNode().name
    except:
        pass

    # Return an upper case version of the string argument
    return s.upper()

libxslt.registerExtModuleFunction("foo", "http://example.com/foo", f)
This code defines and registers an extension function. Note that the function can be bound to any name (foo) and how the binding is also associated to a namespace name "http://example.com/foo". From an XSLT point of view the function just returns an upper case version of the string passed as a parameter. But the first part of the function also reads some contextual information from the current XSLT processing environment, in that case it looks for the current insertion node in the resulting output (either the resulting document or the Result Value Tree being generated), and saves it to a global variable for checking that the access actually worked.
For more information on the xpathParserContext and transformContext objects check the library internals description. The pctxt is actually an object from a class derived from the libxml2.xpathParserContext() with just a couple more properties including the possibility to look up the XSLT transformation context from the XPath context.
styledoc = libxml2.parseDoc("""
<xsl:stylesheet version='1.0'
  xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
  xmlns:foo='http://example.com/foo'
  xsl:exclude-result-prefixes='foo'>

  <xsl:param name='bar'>failure</xsl:param>
  <xsl:template match='/'>
    <article><xsl:value-of select='foo:foo($bar)'/></article>
  </xsl:template>
</xsl:stylesheet>
""")
Here is a simple example of how to read an XML document from a python string with libxml2. Note how this stylesheet:
bar
style = libxslt.parseStylesheetDoc(styledoc)
doc = libxml2.parseDoc("<doc/>")
result = style.applyStylesheet(doc, { "bar": "'success'" })
style.freeStylesheet()
doc.freeDoc()
That part is identical to the basic example, except that the transformation is passed a dictionary of parameters. Note that the string passed, "success", had to be quoted, otherwise it is interpreted as an XPath query for the children of the root named "success".
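Since every value in the parameter dictionary is read as an XPath expression, plain strings have to be turned into XPath string literals first. Here is a minimal sketch of one way to do that; xpath_string_literal and as_xslt_params are hypothetical helpers written for this illustration, not part of the libxslt bindings. XPath 1.0 has no escape character inside literals, so a value containing both kinds of quotes is rebuilt with concat().

```python
def xpath_string_literal(value):
    """Quote a Python string so that XPath 1.0 reads it as a string literal."""
    if "'" not in value:
        return "'" + value + "'"
    if '"' not in value:
        return '"' + value + '"'
    # Both quote characters are present: split on ' and glue with concat().
    quoted = []
    for i, part in enumerate(value.split("'")):
        if i > 0:
            quoted.append('"\'"')          # a lone apostrophe, double-quoted
        if part:
            quoted.append("'" + part + "'")
    return "concat(" + ", ".join(quoted) + ")"


def as_xslt_params(values):
    """Turn {name: plain string} into {name: XPath string literal}."""
    return {name: xpath_string_literal(text) for name, text in values.items()}


print(as_xslt_params({"bar": "success"}))
```

The resulting dictionary has the same shape as the one passed to applyStylesheet() in the example above.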
import sys

root = result.children
if root.name != "article":
    print("Unexpected root node name")
    sys.exit(1)
if root.content != "SUCCESS":
    print("Unexpected root node content, extension function failed")
    sys.exit(1)
if nodeName != 'article':
    print("The function callback failed to access its context")
    sys.exit(1)

result.freeDoc()
That part just verifies that the transformation worked, that the parameter got properly passed to the engine, that the function f() got called and that it properly accessed the context to find the name of the insertion node.
This module is a bit too long to be described here, but it is basically a rewrite of the xsltproc command line interface of libxslt in Python. It provides nearly all the functionality of xsltproc and can be used as a base module to write customized XSLT processors in Python. One thing to notice:
libxml2.lineNumbersDefault(1)
libxml2.substituteEntitiesDefault(1)
Those two calls in the main() function are needed to force the libxml2 processor to generate DOM trees compliant with the XPath data model.
This document describes the processing of libxslt, the XSLT C library developed for the GNOME project.
Note: this documentation is by definition incomplete and I am not good at spelling, grammar, so patches and suggestions are really welcome.
XSLT is a transformation language. It takes an input document and a stylesheet document and generates an output document:
Libxslt is written in C. It relies on libxml, the XML C library for GNOME, for the following operations:
Libxslt is not very specialized. It is built under the assumption that all nodes from the source and output document can fit in the virtual memory of the system. There is a big trade-off there. It is fine for reasonably sized documents but may not be suitable for large sets of data. The gain is that it can be used in a relatively versatile way. The input or output may never be serialized, but the size of the documents it can handle is limited by the size of the memory available.
More specialized memory handling approaches are possible, like building the input tree from a serialization progressively as it is consumed, factoring repetitive patterns, or even on-the-fly generation of the output as the input is parsed, but this is possible only for a limited subset of stylesheets. In general the implementation of libxslt follows the following pattern:
The result is not that bad. Clearly one can do a better job, but only by being more specialized too. Most optimizations, like building the tree on demand, would need serious changes to the libxml XPath framework. An easy step would be to serialize the output directly (or call a set of SAX-like output handlers to keep this a flexible interface) and hence avoid the memory consumption of the result.
DOM-like trees, as used and generated by libxml and libxslt, are relatively complex. Most node types follow the given structure except a few variations depending on the node type:
Nodes carry a name and the node type indicates the kind of node it represents, the most common ones are:
For the XSLT processing, entity nodes should not be generated (i.e. they should be replaced by their content). Most nodes also contain the following "navigation" information:
Element nodes carry the list of attributes in the properties, an attribute itself holds the navigation pointers and the children list (the attribute value is not represented as a simple string to allow usage of entity references).
The ns points to the namespace declaration for the namespace associated to the node, nsDef is the linked list of namespace declarations present on element nodes.
Most nodes also carry an _private pointer which can be used by the application to hold specific data on this node.
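As a rough picture of the structure described above, here is a toy Python model of such a node. The attribute names follow the fields of libxml2's xmlNode structure (children, last, parent, next, prev, doc, properties, ns, nsDef, _private); the Node class itself and the add_child and child_names helpers are assumptions made for this sketch and are not the API of the libxml2 Python bindings.

```python
class Node:
    """Toy model of a DOM-like node; field names follow libxml2's xmlNode."""

    def __init__(self, type, name, content=None):
        self.type = type          # kind of node: "document", "element", "text"
        self.name = name
        self.content = content    # text content, for text-like nodes
        self.children = None      # first child
        self.last = None          # last child
        self.parent = None
        self.next = None          # next sibling
        self.prev = None          # previous sibling
        self.doc = None           # owning document
        self.properties = None    # first attribute (element nodes only)
        self.ns = None            # namespace associated to this node
        self.nsDef = None         # namespace declarations carried by the node
        self._private = None      # free slot for application data


def add_child(parent, child):
    """Append child to parent's child list, updating every navigation link."""
    child.parent = parent
    child.doc = parent if parent.type == "document" else parent.doc
    child.prev = parent.last
    child.next = None
    if parent.last is None:
        parent.children = child
    else:
        parent.last.next = child
    parent.last = child
    return child


def child_names(node):
    """Walk the sibling chain starting at node.children."""
    names = []
    cur = node.children
    while cur is not None:
        names.append(cur.name)
        cur = cur.next
    return names


doc = Node("document", None)
root = add_child(doc, Node("element", "doc"))
title = add_child(root, Node("element", "title"))
para = add_child(root, Node("element", "para"))
print(child_names(root))
```

The children of a node form a doubly linked list, which is why a stylesheet compiler can hang its own data on _private without touching the navigation pointers.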
There are a few steps which are clearly decoupled at the interface level:
A few things should be noted here:
This is the second step described. It takes a stylesheet tree, and "compiles" it. This associates to each node a structure stored in the _private field and containing information computed in the stylesheet:
One xsltStylesheet structure is generated per document parsed for the stylesheet. XSLT documents allow includes and imports of other documents, imports are stored in the imports list (hence keeping the tree hierarchy of includes which is very important for a proper XSLT processing model) and includes are stored in the doclist list. An imported stylesheet has a parent link to allow browsing of the tree.
The DOM tree associated to the document is stored in doc. It is preprocessed to remove ignorable empty nodes and all the nodes in the XSLT namespace are subject to precomputing. This usually consists of extracting all the context information from the context tree (attributes, namespaces, XPath expressions), and storing them in an xsltStylePreComp structure associated to the _private field of the node.
A couple of notable exceptions to this are XSLT template nodes (more on this later) and attribute value templates. If they are actually templates, the value cannot be computed at compilation time. (Some preprocessing could be done like isolation and preparsing of the XPath subexpressions but it's not done, yet.)
The xsltStylePreComp structure also allows storing of the precompiled form of an XPath expression that can be associated to an XSLT element (more on this later).
A proper handling of templates lookup is one of the keys of fast XSLT processing. (Given a node in the source document this is the process of finding which templates should be applied to this node.) Libxslt follows the hint suggested in the 5.2 Patterns section of the XSLT Recommendation, i.e. it doesn't evaluate it as an XPath expression but tokenizes it and compiles it as a set of rules to be evaluated on a candidate node. There usually is an indication of the node name in the last step of this evaluation and this is used as a key check for the match. As a result libxslt builds a relatively more complex set of structures for the templates:
Let's describe a bit more closely what is built. First the xsltStylesheet structure holds a pointer to the template hash table. All the XSLT patterns compiled in this stylesheet are indexed by the value of the target element (or attribute, pi ...) name, so when an element or an attribute "foo" needs to be processed the lookup is done using the name as a key.
Each of the patterns is compiled into an xsltCompMatch (i.e. an "XSLT compiled match") structure. It holds the set of rules based on the tokenization of the pattern stored in reverse order (matching is easier this way).
The xsltCompMatch are then stored in the hash table, the clash list is itself sorted by priority of the template to implement "naturally" the XSLT priority rules.
Associated to the compiled pattern is the xsltTemplate itself containing the information required for the processing of the pattern including, of course, a pointer to the list of elements used for building the pattern result.
Last but not least a number of patterns do not fit in the hash table because they are not associated to a name, this is the case for patterns applying to the root, any element, any attributes, text nodes, pi nodes, keys etc. Those are stored independently in the stylesheet structure as separate linked lists of xsltCompMatch.
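The lookup structure described above can be sketched in a few lines of Python. This is a toy model, not libxslt's actual code: TemplateTable, CompMatch and Node are hypothetical names, patterns are limited to slash-separated element names and "*", priorities are given explicitly instead of being computed, and ties between equal priorities are not resolved the way XSLT requires.

```python
class Node:
    """Tiny stand-in for a source-tree element: a name and a parent link."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


class CompMatch:
    """Toy compiled pattern: the steps are kept in reverse order."""

    def __init__(self, pattern, priority, template):
        self.steps = list(reversed(pattern.split("/")))  # "a/b" -> ["b", "a"]
        self.priority = priority
        self.template = template

    def matches(self, node):
        # Start at the candidate node and walk up, one step per ancestor.
        for step in self.steps:
            if node is None or (step != "*" and step != node.name):
                return False
            node = node.parent
        return True


class TemplateTable:
    def __init__(self):
        self.by_name = {}       # hash table keyed by the name in the last step
        self.any_element = []   # patterns without a usable name key ("*")

    def add(self, pattern, template, priority=0.0):
        comp = CompMatch(pattern, priority, template)
        key = comp.steps[0]
        bucket = self.any_element if key == "*" else self.by_name.setdefault(key, [])
        bucket.append(comp)
        # Keep each clash list sorted by decreasing priority.
        bucket.sort(key=lambda c: c.priority, reverse=True)

    def lookup(self, node):
        candidates = self.by_name.get(node.name, []) + self.any_element
        matching = [c for c in candidates if c.matches(node)]
        if not matching:
            return None
        return max(matching, key=lambda c: c.priority).template


table = TemplateTable()
table.add("title", "plain title template")
table.add("chapter/title", "chapter title template", priority=0.5)
table.add("*", "catch-all template", priority=-0.5)

doc = Node("doc")
chapter = Node("chapter", doc)
print(table.lookup(Node("title", chapter)))
```

Keeping the name of the last step as the hash key means most templates are never even considered for a given node, which is the point of the design.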
The processing is defined by the XSLT specification (the basis of the algorithm is explained in the Introduction section). Basically it works by taking the root of the input document as the current node and applying the following algorithm:
The closure is usually done through the XSLT apply-templates construct, which invokes this process recursively starting at step 1, to find the appropriate template for the nodes selected by the 'select' attribute of the apply-templates instruction (default: the children of the node currently being processed).
Note that large parts of the input tree may not be processed by a given stylesheet and that conversely some may be processed multiple times. (This often is the case when a Table of Contents is built).
The module transform.c is the one implementing most of this logic. xsltApplyStylesheet() is the entry point, it allocates an xsltTransformContext containing the following:
Then a new document gets allocated (HTML or XML depending on the type of output), the user parameters and global variables and parameters are evaluated. Then xsltProcessOneNode() which implements the 1-2-3 algorithm is called on the document node of the input. Step 1/ is implemented by calling xsltGetTemplate(), step 2/ is implemented by xsltDefaultProcessOneNode() and step 3/ is implemented by xsltApplyOneTemplate().
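The three steps can be illustrated with a small Python sketch built on the standard xml.etree.ElementTree module. It is a toy model under strong assumptions, not how transform.c is written: templates are plain Python functions looked up by element name only, processing starts at the root element rather than the document node, and the output is collected as text. All function names here are made up for the illustration.

```python
import xml.etree.ElementTree as ET


def apply_templates(node, templates, out):
    """Process the children of node (text and elements) in document order."""
    if node.text:
        out.append(node.text)              # built-in rule for text: copy it
    for child in node:
        process_node(child, templates, out)
        if child.tail:
            out.append(child.tail)         # text following the child element


def process_node(node, templates, out):
    template = templates.get(node.tag)     # step 1: find a matching template
    if template is None:
        # Step 2: no template found, apply the default processing,
        # which simply recurses into the children.
        apply_templates(node, templates, out)
    else:
        # Step 3: apply the template; it decides whether to recurse.
        template(node, templates, out)


def title_template(node, templates, out):
    out.append("[")
    apply_templates(node, templates, out)  # the recursive closure
    out.append("]")


def transform(xml_text, templates):
    out = []
    process_node(ET.fromstring(xml_text), templates, out)
    return "".join(out)


SOURCE = "<doc><title>Intro</title><p>text</p></doc>"
print(transform(SOURCE, {"title": title_template}))
```

A template that never calls apply_templates() leaves its subtree unprocessed, which matches the remark above that large parts of the input tree may not be processed by a given stylesheet.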
The XPath support is actually implemented in the libxml module (where it is reused by the XPointer implementation). XPath is a relatively classic expression language. The only uncommon feature is that it is working on XML trees and hence has specific syntax and types to handle them.
XPath expressions are compiled using xmlXPathCompile(). It takes an expression string as input and generates a structure containing the parsed expression tree, for example the expression:
/doc/chapter[title='Introduction']
will be compiled as
Compiled Expression : 10 elements
  SORT
    COLLECT  'child' 'name' 'node' chapter
      COLLECT  'child' 'name' 'node' doc
        ROOT
      PREDICATE
        SORT
          EQUAL =
            COLLECT  'child' 'name' 'node' title
              NODE
            ELEM Object is a string : Introduction
              COLLECT  'child' 'name' 'node' title
                NODE
This can be tested using the testXPath command (in the libxml codebase) using the --tree option.
Again, the KISS approach is used. No optimization is done. This could be an interesting thing to add. Michael Kay describes a lot of possible and interesting optimizations done in Saxon which would be possible at this level. I'm unsure they would provide much gain since the expressions tend to be relatively simple in general and stylesheets are still hand generated. Optimizations at the interpretation level sound likely to be more efficient.
The interpreter is implemented by xmlXPathCompiledEval() which is the front-end to xmlXPathCompOpEval() the function implementing the evaluation of the expression tree. This evaluation follows the KISS approach again. It's recursive and calls xmlXPathNodeCollectAndTest() to collect a set of nodes when evaluating a COLLECT node.
An evaluation is done within the framework of an XPath context stored in an xmlXPathContext structure, in the framework of a transformation the context is maintained within the XSLT context. Its content follows the requirements from the XPath specification:
For the purpose of XSLT an extra pointer has been added allowing to retrieve the XSLT transformation context. When an XPath evaluation is about to be performed, an XPath parser context is allocated containing an XPath object stack (this is actually an XPath evaluation context, this is a relic of the time where there was no separate parsing and evaluation phase in the XPath implementation). Here is an overview of the set of contexts associated to an XPath evaluation within an XSLT transformation:
Clearly this is a bit too complex and confusing and should be refactored at the next set of binary incompatible releases of libxml. For example the xmlXPathCtxt has a lot of unused parts and should probably be merged with xmlXPathParserCtxt.
An XPath expression manipulates XPath objects. XPath defines the default types boolean, numbers, strings and node sets. XSLT adds the result tree fragment type which is basically an unmodifiable node set.
Implementation-wise, libxml follows again a KISS approach, the xmlXPathObject is a structure containing a type description and the various possibilities. (Using an enum could have gained some bytes.) In the case of node sets (or result tree fragments), it points to a separate xmlNodeSet object which contains the list of pointers to the document nodes:
The XPath API (and its 'internal' part) includes a number of functions to create, copy, compare, convert or free XPath objects.
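To make the idea of a typed XPath object concrete, here is a toy Python version with a type tag and a payload, plus two of the operations mentioned above (convert and copy). The class and function names are hypothetical and the real xmlXPathObject is a C structure; only the conversion rules come from the XPath 1.0 boolean() function: a number is true unless it is zero or NaN, and a string or node set is true unless it is empty.

```python
import math


class XPathObject:
    """Toy XPath value: a type tag plus the matching payload."""

    def __init__(self, type, value):
        self.type = type      # "boolean", "number", "string" or "nodeset"
        self.value = value    # bool, float, str, or a list of node references


def cast_to_boolean(obj):
    """Convert any toy XPath object to a Python bool, XPath 1.0 style."""
    if obj.type == "boolean":
        return obj.value
    if obj.type == "number":
        return not (obj.value == 0 or math.isnan(obj.value))
    # Strings and node sets are true when they are not empty.
    return len(obj.value) > 0


def copy_object(obj):
    """Copy an object; a node set copy duplicates the list, not the nodes."""
    if obj.type == "nodeset":
        return XPathObject(obj.type, list(obj.value))
    return XPathObject(obj.type, obj.value)


print(cast_to_boolean(XPathObject("string", "false")))
```

Note that the string "false" converts to true, since only the empty string is false.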
All the XPath functions available to the interpreter are registered in the function hash table linked from the XPath context. They all share the same signature:
void xmlXPathFunc (xmlXPathParserContextPtr ctxt, int nargs);
The first argument is the XPath interpretation context, holding the interpretation stack. The second argument defines the number of objects passed on the stack for the function to consume (last argument is on top of the stack).
Basically an XPath function does the following:
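check nargs for proper handling of errors or functions with variable numbers of parameters
pop the parameters from the stack using obj = valuePop(ctxt);
do the function-specific computation
push the result on the stack using valuePush(ctxt, res);
free the input parameters with xmlXPathFreeObject(obj);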
Sometimes the work can be done directly by modifying in-situ the top object on the stack ctxt->value.
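The calling convention for XPath functions can be mimicked with a plain Python list used as the value stack. This is only a sketch of the convention: ParserContext, value_push, value_pop and call_function are stand-ins invented here for the C context, valuePush() and valuePop(), and Python's garbage collector takes the place of the explicit xmlXPathFreeObject() call.

```python
class ParserContext:
    """Toy stand-in for the XPath interpretation context: a value stack."""

    def __init__(self):
        self.stack = []
        self.error = None

    def value_push(self, obj):
        self.stack.append(obj)

    def value_pop(self):
        return self.stack.pop() if self.stack else None


def xpath_concat(ctxt, nargs):
    """Toy concat(): same (ctxt, nargs) shape as a libxml XPath function."""
    # Check nargs first: concat() takes a variable number of arguments,
    # but at least two.
    if nargs < 2:
        ctxt.error = "invalid arity"
        return
    # Pop the arguments; the last argument is on top of the stack.
    args = [ctxt.value_pop() for _ in range(nargs)]
    args.reverse()
    # Compute, then push the single result back on the stack.
    ctxt.value_push("".join(args))


def call_function(ctxt, func, *args):
    """What the interpreter does around a call: push, call, pop the result."""
    for arg in args:
        ctxt.value_push(arg)
    func(ctxt, len(args))
    if ctxt.error is not None:
        raise ValueError(ctxt.error)
    return ctxt.value_pop()


ctxt = ParserContext()
print(call_function(ctxt, xpath_concat, "a", "b", "c"))
```

Because the function receives only the context and a count, the same signature serves functions with fixed and variable numbers of parameters.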
Not to be confused with XPath object stack, this stack holds the XSLT variables and parameters as they are defined through the recursive calls of call-template, apply-templates and default templates. This is used to define the scope of variables being called.
This part seems to be the one needing the most work. First, it is done in a very inefficient way, since the location of the variables and parameters within the stylesheet tree is still done at run time (it really should be done statically at compile time), and I am still unsure that my understanding of the template variables and parameter scope is actually right.
This part of the documentation is still to be written once this part of the code is stable. TODO
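As an illustration of the scoping such a stack has to provide, here is a toy Python model. It assumes the XSLT 1.0 rule that a template instantiation sees its own parameters and local variables plus the global ones, but not the local variables of its caller. VariableStack and its methods are hypothetical names and say nothing about how libxslt stores its frames.

```python
class VariableStack:
    """Toy variable stack: one frame per template instantiation."""

    def __init__(self, global_vars=None):
        self.global_vars = dict(global_vars or {})
        self.frames = []

    def push_frame(self, params=None):
        # Called when a template is entered (call-template, apply-templates).
        self.frames.append(dict(params or {}))

    def pop_frame(self):
        # Called when the template instantiation ends.
        self.frames.pop()

    def define(self, name, value):
        self.frames[-1][name] = value

    def is_visible(self, name):
        in_frame = bool(self.frames) and name in self.frames[-1]
        return in_frame or name in self.global_vars

    def lookup(self, name):
        # Locals and parameters of the current frame shadow the globals.
        if self.frames and name in self.frames[-1]:
            return self.frames[-1][name]
        return self.global_vars[name]


stack = VariableStack({"lang": "en"})
stack.push_frame({"depth": 1})      # outer template with one parameter
stack.define("title", "Intro")      # local variable of the outer template
stack.push_frame()                  # a called template gets a fresh frame
print(stack.is_visible("title"), stack.lookup("lang"))
```

Popping the inner frame makes the outer template's variables visible again, which is the scoping effect of the recursive calls described above.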
There is a separate document explaining how the extension support works.
Michael Kay wrote a really interesting article on Saxon internals and the work he did on performance issues. I wish I had read it before starting libxslt design (I would probably have avoided a few mistakes and progressed faster). A lot of the ideas in his papers should be implemented or at least tried in libxslt.
The libxml documentation, especially the I/O interfaces and the memory management.
Redesign the XSLT stack frame handling. Far too much work is done at execution time. Similarly for the attribute value templates handling, at least the embedded subexpressions ought to be precompiled.
Allow output to be saved to a SAX like output (this notion of SAX like API for output should be added directly to libxml).
Implement and test some of the optimization explained by Michael Kay especially:
Error reporting: there are a lot of cases where the XSLT specification specifies that a given construct is an error but where this is not checked adequately by libxslt. Basically one should do a complete pass on the XSLT spec again and add all tests to the stylesheet compilation. Using the DTD provided in the appendix and making direct checks using the libxml validation API sounds like a good idea too (though one should take care not to raise errors for elements/attributes in different namespaces).
Double check all the places where the stylesheet compiled form might be modified at run time (extra removal of blanks nodes, hint on the xsltCompMatch).
Thanks to Michael Sperberg-McQueen for various fixes and clarifications on this document!
This document describes the work needed to write extensions to the standard XSLT library for use with libxslt, the XSLT C library developed for the GNOME project.
Before starting reading this document it is highly recommended to get familiar with the libxslt internals.
Note: this documentation is by definition incomplete and I am not good at spelling, grammar, so patches and suggestions are really welcome.
The XSLT specification provides two ways to extend an XSLT engine:
In both cases the extensions need to be associated to a new namespace, i.e. a URI used as the name for the extension's namespace (there is no need to have a resource there for this to work).
libxslt provides a few extensions itself, either in the libxslt namespace "http://xmlsoft.org/XSLT/namespace" or in namespaces for other well known extensions provided by other XSLT processors like Saxon, Xalan or XT.
Since extensions are bound to a namespace name, usually sets of extensions coming from a given source are using the same namespace name defining in practice a group of extensions providing elements, functions or both. From the libxslt point of view those are considered as an "extension module", and most of the APIs work at a module point of view.
Registration of new functions or elements is bound to the activation of the module. This is currently done by declaring the namespace as an extension by using the attribute extension-element-prefixes on the xsl:stylesheet element.
An extension module is defined by 3 objects:
Currently a libxslt module has to be compiled within the application using libxslt. There is no code to load dynamically shared libraries associated to a namespace (this may be added but is likely to become a portability nightmare).
The current way to register a module is to link the code implementing it with the application and to call a registration function:
int xsltRegisterExtModule(const xmlChar *URI, xsltExtInitFunction initFunc, xsltExtShutdownFunction shutdownFunc);
The associated header is read by:
#include <libxslt/extensions.h>
which also defines the type for the initialization and shutdown functions.
Once the module URI has been registered and if the XSLT processor detects that a given stylesheet needs the functionalities of an extended module, this one is initialized.
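The life cycle of an extension module can be pictured with a small Python registry. This is a conceptual sketch only: ExtensionRegistry, run_transformation and the example module are invented for the illustration, the transformation context is reduced to a dictionary, and the real work is done in C by xsltRegisterExtModule() and the functions it is given. What the sketch keeps is the order of events: register once per application, initialize per transformation when the stylesheet needs the namespace, then shut down with the module data.

```python
class ExtensionRegistry:
    """Toy registry mapping a namespace URI to (init, shutdown) functions."""

    def __init__(self):
        self.modules = {}

    def register_ext_module(self, uri, init_func, shutdown_func):
        self.modules[uri] = (init_func, shutdown_func)

    def run_transformation(self, extension_namespaces, body):
        """Initialize the needed modules, run body, then shut them down."""
        ctxt = {"functions": {}}           # stand-in transformation context
        data = {}
        for uri in extension_namespaces:
            if uri in self.modules:
                init_func = self.modules[uri][0]
                data[uri] = init_func(ctxt, uri)   # module specific data
        try:
            return body(ctxt, data)
        finally:
            for uri, module_data in data.items():
                self.modules[uri][1](ctxt, uri, module_data)


URI = "http://example.com/foo"
events = []


def foo_init(ctxt, uri):
    # Register the module's functions for this transformation only.
    ctxt["functions"][(uri, "upper")] = lambda text: text.upper()
    events.append("init")
    return {"calls": 0}


def foo_shutdown(ctxt, uri, data):
    events.append("shutdown")


registry = ExtensionRegistry()
registry.register_ext_module(URI, foo_init, foo_shutdown)
result = registry.run_transformation(
    [URI], lambda ctxt, data: ctxt["functions"][(URI, "upper")]("ok"))
print(result, events)
```

A stylesheet that does not declare the namespace never triggers the initialization function, so unused modules cost nothing at run time.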
The xsltExtInitFunction type defines the interface for an initialization function:
/**
 * xsltExtInitFunction:
 * @ctxt: an XSLT transformation context
 * @URI: the namespace URI for the extension
 *
 * A function called at initialization time of an XSLT
 * extension module
 *
 * Returns a pointer to the module specific data for this
 * transformation
 */
typedef void *(*xsltExtInitFunction)(xsltTransformContextPtr ctxt,
                                     const xmlChar *URI);
There are 3 things to notice:
What this function is expected to do is:
There is a single call to register an extension function:
int xsltRegisterExtFunction(xsltTransformContextPtr ctxt, const xmlChar *name, const xmlChar *URI, xmlXPathEvalFunc function);
The registration is bound to a single transformation instance referred to by ctxt, name is the UTF-8 encoded NCName of the function, and URI is the namespace name for the extension (no checking is done: a module could register functions or elements from a different namespace, but this is not recommended).
The implementation of the function must have the signature of a libxml XPath function:
/**
 * xmlXPathEvalFunc:
 * @ctxt: an XPath parser context
 * @nargs: the number of arguments passed to the function
 *
 * an XPath evaluation function, the parameters are on the
 * XPath context stack
 */
typedef void (*xmlXPathEvalFunc)(xmlXPathParserContextPtr ctxt,
                                 int nargs);
The context passed to an XPath function is not an XSLT context but an XPath context. However it is possible to find one from the other:
xsltTransformContextPtr xsltXPathGetTransformContext (xmlXPathParserContextPtr ctxt);
The xmlXPathContextPtr associated to an xsltTransformContext is stored in the xpathCtxt field.
The first thing an extension function may want to do is to check the arguments passed on the stack; the nargs parameter tells how many of them were provided in the XPath expression. The valuePop function extracts them from the XPath stack:
#include <libxml/xpath.h>
#include <libxml/xpathInternals.h>

xmlXPathObjectPtr obj = valuePop(ctxt);
Note that ctxt is the XPath context, not the XSLT one. It is then possible to examine the content of the value. Check the description of XPath objects if necessary. The following is a common sequence that checks whether the argument passed is a string and, if not, converts it using the built-in XPath string() function:
if (obj->type != XPATH_STRING) {
    valuePush(ctxt, obj);
    xmlXPathStringFunction(ctxt, 1);
    obj = valuePop(ctxt);
}
Most common XPath functions are available directly at the C level and are exported either in <libxml/xpath.h> or in <libxml/xpathInternals.h>.
The extension function may also need to retrieve the data associated with this module instance (the database connection in the previous example). This can be done using xsltGetExtData:
void * xsltGetExtData(xsltTransformContextPtr ctxt, const xmlChar *URI);
Again the URI to be provided is the one which was used when registering the module.
Once the function finishes, don't forget to:
push the return value on the stack using valuePush(ctxt, obj)
free the arguments passed to the function using xmlXPathFreeObject(obj)
The module libxslt/functions.c contains the sources of the XSLT built-in functions, including document(), key(), generate-id(), etc. as well as a full example module at the end. Here is the test function implementation for the libxslt:test function:
/**
 * xsltExtFunctionTest:
 * @ctxt: the XPath Parser context
 * @nargs: the number of arguments
 *
 * function libxslt:test() for testing the extensions support.
 */
static void
xsltExtFunctionTest(xmlXPathParserContextPtr ctxt, int nargs)
{
    xsltTransformContextPtr tctxt;
    void *data;

    tctxt = xsltXPathGetTransformContext(ctxt);
    if (tctxt == NULL) {
        xsltGenericError(xsltGenericErrorContext,
            "xsltExtFunctionTest: failed to get the transformation context\n");
        return;
    }
    data = xsltGetExtData(tctxt, (const xmlChar *) XSLT_DEFAULT_URL);
    if (data == NULL) {
        xsltGenericError(xsltGenericErrorContext,
            "xsltExtFunctionTest: failed to get module data\n");
        return;
    }
#ifdef WITH_XSLT_DEBUG_FUNCTION
    xsltGenericDebug(xsltGenericDebugContext,
                     "libxslt:test() called with %d args\n", nargs);
#endif
}
There is a single call to register an extension element:
int xsltRegisterExtElement(xsltTransformContextPtr ctxt, const xmlChar *name, const xmlChar *URI, xsltTransformFunction function);
It is similar to the mechanism used to register an extension function, except that the signature of an extension element implementation is different.
The registration is bound to a single transformation instance referred to by ctxt, name is the UTF-8 encoded NCName of the element, and URI is the namespace name for the extension (no checking is done: a module could register elements for a different namespace, but this is not recommended).
The implementation of the element must have the signature of an XSLT transformation function:
/**
 * xsltTransformFunction:
 * @ctxt: the XSLT transformation context
 * @node: the input node
 * @inst: the stylesheet node
 * @comp: the compiled information from the stylesheet
 *
 * signature of the function associated to elements part of the
 * stylesheet language like xsl:if or xsl:apply-templates.
 */
typedef void (*xsltTransformFunction)(xsltTransformContextPtr ctxt,
                                      xmlNodePtr node,
                                      xmlNodePtr inst,
                                      xsltStylePreCompPtr comp);
The first argument is the XSLT transformation context. The second and third arguments are xmlNodePtr, i.e. internal memory representations of XML nodes. They are respectively node, from the input document being transformed by the stylesheet, and inst, the extension element in the stylesheet. The last argument is comp, a pointer to a precompiled representation of inst, but for an extension element this value is usually NULL by default (it could be added and associated to the instruction in inst->_private).
The same functions are available from a function implementing an extension element as in an extension function, including xsltGetExtData().
Since the goal of an extension element is usually to enrich the generated output, it is expected to grow the output tree currently being generated. This can be done by grabbing ctxt->insert, which is the current libxml node being generated (note that this can also be an intermediate value tree, built for example to initialize a variable; the processing should be similar). The libxml tree manipulation functions from <libxml/tree.h> can be used to extend or modify the tree, but the insertion node and its ancestors must be preserved, since pointers to those elements are still in use in the XSLT template execution stack.
The module libxslt/transform.c contains the sources of the XSLT built-in elements, including xsl:element, xsl:attribute, xsl:if, etc. There is a small but complete example in functions.c providing the implementation of the libxslt:test element; it outputs a comment in the result tree:
/**
 * xsltExtElementTest:
 * @ctxt: an XSLT processing context
 * @node: The current node
 * @inst: the instruction in the stylesheet
 * @comp: precomputed information
 *
 * Process a libxslt:test node
 */
static void
xsltExtElementTest(xsltTransformContextPtr ctxt, xmlNodePtr node,
                   xmlNodePtr inst, xsltStylePreCompPtr comp)
{
    xmlNodePtr comment;

    if (ctxt == NULL) {
        xsltGenericError(xsltGenericErrorContext,
                         "xsltExtElementTest: no transformation context\n");
        return;
    }
    if (node == NULL) {
        xsltGenericError(xsltGenericErrorContext,
                         "xsltExtElementTest: no current node\n");
        return;
    }
    if (inst == NULL) {
        xsltGenericError(xsltGenericErrorContext,
                         "xsltExtElementTest: no instruction\n");
        return;
    }
    if (ctxt->insert == NULL) {
        xsltGenericError(xsltGenericErrorContext,
                         "xsltExtElementTest: no insertion point\n");
        return;
    }
    comment = xmlNewComment((const xmlChar *)
                            "libxslt:test element test worked");
    xmlAddChild(ctxt->insert, comment);
}
When the XSLT processor ends a transformation, the shutdown function (if it exists) of each initialized module is called. The xsltExtShutdownFunction type defines the interface for a shutdown function:
/**
 * xsltExtShutdownFunction:
 * @ctxt: an XSLT transformation context
 * @URI: the namespace URI for the extension
 * @data: the data associated to this module
 *
 * A function called at shutdown time of an XSLT extension module
 */
typedef void (*xsltExtShutdownFunction)(xsltTransformContextPtr ctxt,
                                        const xmlChar *URI,
                                        void *data);
This is very similar to a module initialization function, except that a third argument is passed: the value that was returned by the initialization function. This allows the routine to release the module's resources, for example (to keep the same example) closing the connection to the database.
Well, some of the pieces are still missing: