diff options
Diffstat (limited to 'doc/internals.html')
-rw-r--r-- | doc/internals.html | 455 |
1 files changed, 228 insertions, 227 deletions
diff --git a/doc/internals.html b/doc/internals.html index 525dbc7a..f96eef3c 100644 --- a/doc/internals.html +++ b/doc/internals.html @@ -18,152 +18,152 @@ A:link, A:visited, A:active { text-decoration: underline } <li><a href="internals.html#processing">The processing itself</a></li> <li><a href="internals.html#XPath">XPath expressions compilation</a></li> <li><a href="internals.html#XPath1">XPath interpretation</a></li> - <li><a href="internals.html#Descriptio">Description of XPathObjects</a></li> + <li><a href="internals.html#Descriptio">Description of XPath + Objects</a></li> <li><a href="internals.html#XPath3">XPath functions</a></li> <li><a href="internals.html#stack">The variables stack frame</a></li> <li><a href="internals.html#Extension">Extension support</a></li> <li><a href="internals.html#Futher">Further reading</a></li> <li><a href="internals.html#TODOs">TODOs</a></li> -</ul><h3><a name="Introducti2" id="Introducti2">Introduction</a></h3><p>This document describes the processing of <a href="http://xmlsoft.org/XSLT/">libxslt</a>, the <a href="http://www.w3.org/TR/xslt">XSLT</a>C library developed for the <a href="http://www.gnome.org/">Gnome</a>project.</p><p>Note: this documentation is by definition incomplete and I am not good -atspelling, grammar, so patches and suggestions are <a href="mailto:veillard@redhat.com">really welcome</a>.</p><h3><a name="Basics1" id="Basics1">Basics</a></h3><p>XSLT is a transformation language. It takes an input document and -astylesheet document and generates an output document:</p><p align="center"><img src="processing.gif" alt="the XSLT processing model" /></p><p>Libxslt is written in C. It relies on <a href="http://www.xmlsoft.org/">libxml</a>, the XML C library for Gnome, -forthe following operations:</p><ul><li>parsing files</li> - <li>building the in-memory DOM structure associated with the - documentshandled</li> +</ul><h3><a name="Introducti2" id="Introducti2">Introduction</a></h3><p>This document describes the processing of <a href="http://xmlsoft.org/XSLT/">libxslt</a>, the <a href="http://www.w3.org/TR/xslt">XSLT</a> C library developed for the <a href="http://www.gnome.org/">Gnome</a> project.</p><p>Note: this documentation is by definition incomplete and I am not good at +spelling, grammar, so patches and suggestions are <a href="mailto:veillard@redhat.com">really welcome</a>.</p><h3><a name="Basics1" id="Basics1">Basics</a></h3><p>XSLT is a transformation language. It takes an input document and a +stylesheet document and generates an output document:</p><p align="center"><img src="processing.gif" alt="the XSLT processing model" /></p><p>Libxslt is written in C. It relies on <a href="http://www.xmlsoft.org/">libxml</a>, the XML C library for Gnome, for +the following operations:</p><ul><li>parsing files</li> + <li>building the in-memory DOM structure associated with the documents + handled</li> <li>the XPath implementation</li> - <li>serializing back the result document to XML and HTML. (Text is - handleddirectly.)</li> -</ul><h3><a name="Keep1" id="Keep1">Keep it simple stupid</a></h3><p>Libxslt is not very specialized. It is built under the assumption that -allnodes from the source and output document can fit in the virtual memory -ofthe system. There is a big trade-off there. It is fine for reasonably -sizeddocuments but may not be suitable for large sets of data. The gain is -that itcan be used in a relatively versatile way. The input or output may -never beserialized, but the size of documents it can handle are limited by -the sizeof the memory available.</p><p>More specialized memory handling approaches are possible, like buildingthe -input tree from a serialization progressively as it is consumed,factoring -repetitive patterns, or even on-the-fly generation of the output asthe input -is parsed but it is possible only for a limited subset of thestylesheets. In -general the implementation of libxslt follows the followingpattern:</p><ul><li>KISS (keep it simple stupid)</li> - <li>when there is a clear bottleneck optimize on top of this - simpleframework and refine only as much as is needed to reach the - expectedresult</li> -</ul><p>The result is not that bad, clearly one can do a better job but -morespecialized too. Most optimization like building the tree on-demand -wouldneed serious changes to the libxml XPath framework. An easy step would -be toserialize the output directly (or call a set of SAX-like output handler -tokeep this a flexible interface) and hence avoid the memory consumption of -theresult.</p><h3><a name="libxml" id="libxml">The libxml nodes</a></h3><p>DOM-like trees, as used and generated by libxml and libxslt, arerelatively -complex. Most node types follow the given structure except a fewvariations -depending on the node type:</p><p align="center"><img src="node.gif" alt="description of a libxml node" /></p><p>Nodes carry a <strong>name</strong>and the node -<strong>type</strong>indicates the kind of node it represents, the most -common ones are:</p><ul><li>document nodes</li> + <li>serializing back the result document to XML and HTML. (Text is handled + directly.)</li> +</ul><h3><a name="Keep1" id="Keep1">Keep it simple stupid</a></h3><p>Libxslt is not very specialized. It is built under the assumption that all +nodes from the source and output document can fit in the virtual memory of +the system. There is a big trade-off there. It is fine for reasonably sized +documents but may not be suitable for large sets of data. The gain is that it +can be used in a relatively versatile way. The input or output may never be +serialized, but the size of documents it can handle are limited by the size +of the memory available.</p><p>More specialized memory handling approaches are possible, like building +the input tree from a serialization progressively as it is consumed, +factoring repetitive patterns, or even on-the-fly generation of the output as +the input is parsed but it is possible only for a limited subset of the +stylesheets. In general the implementation of libxslt follows the following +pattern:</p><ul><li>KISS (keep it simple stupid)</li> + <li>when there is a clear bottleneck optimize on top of this simple + framework and refine only as much as is needed to reach the expected + result</li> +</ul><p>The result is not that bad, clearly one can do a better job but more +specialized too. Most optimization like building the tree on-demand would +need serious changes to the libxml XPath framework. An easy step would be to +serialize the output directly (or call a set of SAX-like output handler to +keep this a flexible interface) and hence avoid the memory consumption of the +result.</p><h3><a name="libxml" id="libxml">The libxml nodes</a></h3><p>DOM-like trees, as used and generated by libxml and libxslt, are +relatively complex. Most node types follow the given structure except a few +variations depending on the node type:</p><p align="center"><img src="node.gif" alt="description of a libxml node" /></p><p>Nodes carry a <strong>name</strong> and the node <strong>type</strong> +indicates the kind of node it represents, the most common ones are:</p><ul><li>document nodes</li> <li>element nodes</li> <li>text nodes</li> -</ul><p>For the XSLT processing, entity nodes should not be generated (i.e. -theyshould be replaced by their content). Most nodes also contains the -following"navigation" informations:</p><ul><li>the containing <strong>doc</strong>ument</li> - <li>the <strong>parent</strong>node</li> - <li>the first <strong>children</strong>node</li> - <li>the <strong>last</strong>children node</li> +</ul><p>For the XSLT processing, entity nodes should not be generated (i.e. they +should be replaced by their content). Most nodes also contains the following +"navigation" informations:</p><ul><li>the containing <strong>doc</strong>ument</li> + <li>the <strong>parent</strong> node</li> + <li>the first <strong>children</strong> node</li> + <li>the <strong>last</strong> children node</li> <li>the <strong>prev</strong>ious sibling</li> <li>the following sibling (<strong>next</strong>)</li> -</ul><p>Elements nodes carries the list of attributes in the properties, -anattribute itself holds the navigation pointers and the children list -(theattribute value is not represented as a simple string to allow usage -ofentities references).</p><p>The <strong>ns</strong>points to the namespace declaration for -thenamespace associated to the node, <strong>nsDef</strong>is the linked -listof namespace declaration present on element nodes.</p><p>Most nodes also carry an <strong>_private</strong>pointer which can beused -by the application to hold specific data on this node.</p><h3><a name="XSLT" id="XSLT">The XSLT processing steps</a></h3><p>There are a few steps which are clearly decoupled at the -interfacelevel:</p><ol><li>parse the stylesheet and generate a DOM tree</li> - <li>take the stylesheet tree and build a compiled version of it - (thecompilation phase)</li> +</ul><p>Elements nodes carries the list of attributes in the properties, an +attribute itself holds the navigation pointers and the children list (the +attribute value is not represented as a simple string to allow usage of +entities references).</p><p>The <strong>ns</strong> points to the namespace declaration for the +namespace associated to the node, <strong>nsDef</strong> is the linked list +of namespace declaration present on element nodes.</p><p>Most nodes also carry an <strong>_private</strong> pointer which can be +used by the application to hold specific data on this node.</p><h3><a name="XSLT" id="XSLT">The XSLT processing steps</a></h3><p>There are a few steps which are clearly decoupled at the interface +level:</p><ol><li>parse the stylesheet and generate a DOM tree</li> + <li>take the stylesheet tree and build a compiled version of it (the + compilation phase)</li> <li>take the input and generate a DOM tree</li> - <li>process the stylesheet against the input tree and generate an - outputtree</li> + <li>process the stylesheet against the input tree and generate an output + tree</li> <li>serialize the output tree</li> </ol><p>A few things should be noted here:</p><ul><li>the steps 1/ 3/ and 5/ are optional</li> - <li>the stylesheet obtained at 2/ can be reused by multiple processing - 4/(and this should also work in threaded programs)</li> - <li>the tree provided in 2/ should never be freed using xmlFreeDoc, but - byfreeing the stylesheet.</li> - <li>the input tree 4/ is not modified except the _private field which maybe - used for labelling keys if used by the stylesheet</li> -</ul><h3><a name="XSLT1" id="XSLT1">The XSLT stylesheet compilation</a></h3><p>This is the second step described. It takes a stylesheet tree, -and"compiles" it. This associates to each node a structure stored in -the_private field and containing information computed in the stylesheet:</p><p align="center"><img src="stylesheet.gif" alt="a compiled XSLT stylesheet" /></p><p>One xsltStylesheet structure is generated per document parsed for -thestylesheet. XSLT documents allow includes and imports of other -documents,imports are stored in the <strong>imports</strong>list (hence -keeping thetree hierarchy of includes which is very important for a proper -XSLTprocessing model) and includes are stored in the -<strong>doclist</strong>list. An imported stylesheet has a parent link to -allow browsing of thetree.</p><p>The DOM tree associated to the document is stored in -<strong>doc</strong>.It is preprocessed to remove ignorable empty nodes and -all the nodes in theXSLT namespace are subject to precomputing. This usually -consist ofextracting all the context information from the context tree -(attributes,namespaces, XPath expressions), and storing them in an -xsltStylePreCompstructure associated to the <strong>_private</strong>field of -the node.</p><p>A couple of notable exceptions to this are XSLT template nodes (more -onthis later) and attribute value templates. If they are actually -templates,the value cannot be computed at compilation time. (Some -preprocessing couldbe done like isolation and preparsing of the XPath -subexpressions but it'snot done, yet.)</p><p>The xsltStylePreComp structure also allows storing of the precompiled -formof an XPath expression that can be associated to an XSLT element (more -onthis later).</p><h3><a name="XSLT2" id="XSLT2">The XSLT template compilation</a></h3><p>A proper handling of templates lookup is one of the keys of fast -XSLTprocessing. (Given a node in the source document this is the process -offinding which templates should be applied to this node.) Libxslt follows -thehint suggested in the <a href="http://www.w3.org/TR/xslt#patterns">5.2Patterns</a>section of the XSLT -Recommendation, i.e. it doesn't evaluate itas an XPath expression but -tokenizes it and compiles it as a set of rules tobe evaluated on a candidate -node. There usually is an indication of the nodename in the last step of this -evaluation and this is used as a key check forthe match. As a result libxslt -builds a relatively more complex set ofstructures for the templates:</p><p align="center"><img src="templates.gif" alt="The templates related structure" /></p><p>Let's describe a bit more closely what is built. First the -xsltStylesheetstructure holds a pointer to the template hash table. All the -XSLT patternscompiled in this stylesheet are indexed by the value of the the -targetelement (or attribute, pi ...) name, so when a element or an attribute -"foo"needs to be processed the lookup is done using the name as a key.</p><p>Each of the patterns is compiled into an xsltCompMatch structure. It -holdsthe set of rules based on the tokenization of the pattern stored in -reverseorder (matching is easier this way). It also holds some information -about theprevious matches used to speed up the process when one iterates over -a set ofsiblings. (This optimization may be defeated by trashing when -runningthreaded computation, it's unclear that this is a big deal in -practice.)Predicate expressions are not compiled at this stage, they may be -at run-timeif needed, but in this case they are compiled as full XPath -expressions (theuse of some fixed predicate can probably be optimized, they -are not yet).</p><p>The xsltCompMatch are then stored in the hash table, the clash list -isitself sorted by priority of the template to implement "naturally" the -XSLTpriority rules.</p><p>Associated to the compiled pattern is the xsltTemplate itself -containingthe information required for the processing of the pattern -including, ofcourse, a pointer to the list of elements used for building the -patternresult.</p><p>Last but not least a number of patterns do not fit in the hash -tablebecause they are not associated to a name, this is the case for -patternsapplying to the root, any element, any attributes, text nodes, pi -nodes, keysetc. Those are stored independently in the stylesheet structure as -separatelinked lists of xsltCompMatch.</p><h3><a name="processing" id="processing">The processing itself</a></h3><p>The processing is defined by the XSLT specification (the basis of -thealgorithm is explained in <a href="http://www.w3.org/TR/xslt#section-Introduction">the -Introduction</a>section). Basically it works by taking the root of the input -document andapplying the following algorithm:</p><ol><li>Finding the template applying to it. This is a lookup in the - templatehash table, walking the hash list until the node satisfies all - the stepsof the pattern, then checking the appropriate(s) global - templates to seeif there isn't a higher priority rule to apply</li> - <li>If there is no template, apply the default rule (recurse on - thechildren)</li> + <li>the stylesheet obtained at 2/ can be reused by multiple processing 4/ + (and this should also work in threaded programs)</li> + <li>the tree provided in 2/ should never be freed using xmlFreeDoc, but by + freeing the stylesheet.</li> + <li>the input tree 4/ is not modified except the _private field which may + be used for labelling keys if used by the stylesheet</li> +</ul><h3><a name="XSLT1" id="XSLT1">The XSLT stylesheet compilation</a></h3><p>This is the second step described. It takes a stylesheet tree, and +"compiles" it. This associates to each node a structure stored in the +_private field and containing information computed in the stylesheet:</p><p align="center"><img src="stylesheet.gif" alt="a compiled XSLT stylesheet" /></p><p>One xsltStylesheet structure is generated per document parsed for the +stylesheet. XSLT documents allow includes and imports of other documents, +imports are stored in the <strong>imports</strong> list (hence keeping the +tree hierarchy of includes which is very important for a proper XSLT +processing model) and includes are stored in the <strong>doclist</strong> +list. An imported stylesheet has a parent link to allow browsing of the +tree.</p><p>The DOM tree associated to the document is stored in <strong>doc</strong>. +It is preprocessed to remove ignorable empty nodes and all the nodes in the +XSLT namespace are subject to precomputing. This usually consist of +extracting all the context information from the context tree (attributes, +namespaces, XPath expressions), and storing them in an xsltStylePreComp +structure associated to the <strong>_private</strong> field of the node.</p><p>A couple of notable exceptions to this are XSLT template nodes (more on +this later) and attribute value templates. If they are actually templates, +the value cannot be computed at compilation time. (Some preprocessing could +be done like isolation and preparsing of the XPath subexpressions but it's +not done, yet.)</p><p>The xsltStylePreComp structure also allows storing of the precompiled form +of an XPath expression that can be associated to an XSLT element (more on +this later).</p><h3><a name="XSLT2" id="XSLT2">The XSLT template compilation</a></h3><p>A proper handling of templates lookup is one of the keys of fast XSLT +processing. (Given a node in the source document this is the process of +finding which templates should be applied to this node.) Libxslt follows the +hint suggested in the <a href="http://www.w3.org/TR/xslt#patterns">5.2 +Patterns</a> section of the XSLT Recommendation, i.e. it doesn't evaluate it +as an XPath expression but tokenizes it and compiles it as a set of rules to +be evaluated on a candidate node. There usually is an indication of the node +name in the last step of this evaluation and this is used as a key check for +the match. As a result libxslt builds a relatively more complex set of +structures for the templates:</p><p align="center"><img src="templates.gif" alt="The templates related structure" /></p><p>Let's describe a bit more closely what is built. First the xsltStylesheet +structure holds a pointer to the template hash table. All the XSLT patterns +compiled in this stylesheet are indexed by the value of the the target +element (or attribute, pi ...) name, so when a element or an attribute "foo" +needs to be processed the lookup is done using the name as a key.</p><p>Each of the patterns is compiled into an xsltCompMatch structure. It holds +the set of rules based on the tokenization of the pattern stored in reverse +order (matching is easier this way). It also holds some information about the +previous matches used to speed up the process when one iterates over a set of +siblings. (This optimization may be defeated by trashing when running +threaded computation, it's unclear that this is a big deal in practice.) +Predicate expressions are not compiled at this stage, they may be at run-time +if needed, but in this case they are compiled as full XPath expressions (the +use of some fixed predicate can probably be optimized, they are not yet).</p><p>The xsltCompMatch are then stored in the hash table, the clash list is +itself sorted by priority of the template to implement "naturally" the XSLT +priority rules.</p><p>Associated to the compiled pattern is the xsltTemplate itself containing +the information required for the processing of the pattern including, of +course, a pointer to the list of elements used for building the pattern +result.</p><p>Last but not least a number of patterns do not fit in the hash table +because they are not associated to a name, this is the case for patterns +applying to the root, any element, any attributes, text nodes, pi nodes, keys +etc. Those are stored independently in the stylesheet structure as separate +linked lists of xsltCompMatch.</p><h3><a name="processing" id="processing">The processing itself</a></h3><p>The processing is defined by the XSLT specification (the basis of the +algorithm is explained in <a href="http://www.w3.org/TR/xslt#section-Introduction">the Introduction</a> +section). Basically it works by taking the root of the input document and +applying the following algorithm:</p><ol><li>Finding the template applying to it. This is a lookup in the template + hash table, walking the hash list until the node satisfies all the steps + of the pattern, then checking the appropriate(s) global templates to see + if there isn't a higher priority rule to apply</li> + <li>If there is no template, apply the default rule (recurse on the + children)</li> <li>else walk the content list of the selected templates, for each of them: - <ul><li>if the node is in the XSLT namespace then the node has a - _privatefield pointing to the preprocessed values, jump to the - specificcode</li> - <li>if the node is in an extension namespace, look up the - associatedbehavior</li> + <ul><li>if the node is in the XSLT namespace then the node has a _private + field pointing to the preprocessed values, jump to the specific + code</li> + <li>if the node is in an extension namespace, look up the associated + behavior</li> <li>otherwise copy the node.</li> - </ul><p>The closure is usually done through the - XSLT<strong>apply-templates</strong>construct recursing by applying - theadequate template on the input node children or on the result of - anassociated XPath selection lookup.</p> + </ul><p>The closure is usually done through the XSLT + <strong>apply-templates</strong> construct recursing by applying the + adequate template on the input node children or on the result of an + associated XPath selection lookup.</p> </li> -</ol><p>Note that large parts of the input tree may not be processed by a -givenstylesheet and that on the opposite some may be processed multiple -times.(This often is the case when a Table of Contents is built).</p><p>The module <code>transform.c</code>is the one implementing most of -thislogic. <strong>xsltApplyStylesheet()</strong>is the entry point, -itallocates an xsltTransformContext containing the following:</p><ul><li>a pointer to the stylesheet being processed</li> +</ol><p>Note that large parts of the input tree may not be processed by a given +stylesheet and that on the opposite some may be processed multiple times. +(This often is the case when a Table of Contents is built).</p><p>The module <code>transform.c</code> is the one implementing most of this +logic. <strong>xsltApplyStylesheet()</strong> is the entry point, it +allocates an xsltTransformContext containing the following:</p><ul><li>a pointer to the stylesheet being processed</li> <li>a stack of templates</li> <li>a stack of variables and parameters</li> <li>an XPath context</li> @@ -173,18 +173,18 @@ itallocates an xsltTransformContext containing the following:</p><ul><li>a point <li>current selected node list</li> <li>the current insertion points in the output document</li> <li>a couple of hash tables for extension elements and functions</li> -</ul><p>Then a new document gets allocated (HTML or XML depending on the type -ofoutput), the user parameters and global variables and parameters -areevaluated. Then <strong>xsltProcessOneNode()</strong>which implements -the1-2-3 algorithm is called on the root element of the input. Step 1/ -isimplemented by calling <strong>xsltGetTemplate()</strong>, step 2/ -isimplemented by <strong>xsltDefaultProcessOneNode()</strong>and step 3/ -isimplemented by <strong>xsltApplyOneTemplate()</strong>.</p><h3><a name="XPath" id="XPath">XPath expression compilation</a></h3><p>The XPath support is actually implemented in the libxml module (where itis -reused by the XPointer implementation). XPath is a relatively -classicexpression language. The only uncommon feature is that it is working -on XMLtrees and hence has specific syntax and types to handle them.</p><p>XPath expressions are compiled using <strong>xmlXPathCompile()</strong>.It -will take an expression string in input and generate a structurecontaining -the parsed expression tree, for example the expression:</p><pre>/doc/chapter[title='Introduction']</pre><p>will be compiled as</p><pre>Compiled Expression : 10 elements +</ul><p>Then a new document gets allocated (HTML or XML depending on the type of +output), the user parameters and global variables and parameters are +evaluated. Then <strong>xsltProcessOneNode()</strong> which implements the +1-2-3 algorithm is called on the root element of the input. Step 1/ is +implemented by calling <strong>xsltGetTemplate()</strong>, step 2/ is +implemented by <strong>xsltDefaultProcessOneNode()</strong> and step 3/ is +implemented by <strong>xsltApplyOneTemplate()</strong>.</p><h3><a name="XPath" id="XPath">XPath expression compilation</a></h3><p>The XPath support is actually implemented in the libxml module (where it +is reused by the XPointer implementation). XPath is a relatively classic +expression language. The only uncommon feature is that it is working on XML +trees and hence has specific syntax and types to handle them.</p><p>XPath expressions are compiled using <strong>xmlXPathCompile()</strong>. +It will take an expression string in input and generate a structure +containing the parsed expression tree, for example the expression:</p><pre>/doc/chapter[title='Introduction']</pre><p>will be compiled as</p><pre>Compiled Expression : 10 elements SORT COLLECT 'child' 'name' 'node' chapter COLLECT 'child' 'name' 'node' doc @@ -196,96 +196,97 @@ the parsed expression tree, for example the expression:</p><pre>/doc/chapter[tit NODE ELEM Object is a string : Introduction COLLECT 'child' 'name' 'node' title - NODE</pre><p>This can be tested using the <code>testXPath</code>command (in thelibxml -codebase) using the <code>--tree</code>option.</p><p>Again, the KISS approach is used. No optimization is done. This could bean -interesting thing to add. <a href="http://www-106.ibm.com/developerworks/library/x-xslt2/?dwzone=x?open&l=132%2ct=gr%2c+p=saxon">MichaelKay -describes</a>a lot of possible and interesting optimizations done inSaxon -which would be possible at this level. I'm unsure they would providemuch gain -since the expressions tends to be relatively simple in general andstylesheets -are still hand generated. Optimizations at the interpretationsounds likely to -be more efficient.</p><h3><a name="XPath1" id="XPath1">XPath interpretation</a></h3><p>The interpreter is implemented by -<strong>xmlXPathCompiledEval()</strong>which is the front-end to -<strong>xmlXPathCompOpEval()</strong>the functionimplementing the evaluation -of the expression tree. This evaluation followsthe KISS approach again. It's -recursive and calls<strong>xmlXPathNodeCollectAndTest()</strong>to collect -nodes set whenevaluating a <code>COLLECT</code>node.</p><p>An evaluation is done within the framework of an XPath context stored inan -<strong>xmlXPathContext</strong>structure, in the framework of -atransformation the context is maintained within the XSLT context. Its -contentfollows the requirements from the XPath specification:</p><ul><li>the current document</li> + NODE</pre><p>This can be tested using the <code>testXPath</code> command (in the +libxml codebase) using the <code>--tree</code> option.</p><p>Again, the KISS approach is used. No optimization is done. This could be +an interesting thing to add. <a href="http://www-106.ibm.com/developerworks/library/x-xslt2/?dwzone=x?open&l=132%2ct=gr%2c+p=saxon">Michael +Kay describes</a> a lot of possible and interesting optimizations done in +Saxon which would be possible at this level. I'm unsure they would provide +much gain since the expressions tends to be relatively simple in general and +stylesheets are still hand generated. Optimizations at the interpretation +sounds likely to be more efficient.</p><h3><a name="XPath1" id="XPath1">XPath interpretation</a></h3><p>The interpreter is implemented by <strong>xmlXPathCompiledEval()</strong> +which is the front-end to <strong>xmlXPathCompOpEval()</strong> the function +implementing the evaluation of the expression tree. This evaluation follows +the KISS approach again. It's recursive and calls +<strong>xmlXPathNodeCollectAndTest()</strong> to collect nodes set when +evaluating a <code>COLLECT</code> node.</p><p>An evaluation is done within the framework of an XPath context stored in +an <strong>xmlXPathContext</strong> structure, in the framework of a +transformation the context is maintained within the XSLT context. Its content +follows the requirements from the XPath specification:</p><ul><li>the current document</li> <li>the current node</li> <li>a hash table of defined variables (but not used by XSLT)</li> <li>a hash table of defined functions</li> - <li>the proximity position (the place of the node in the current - nodelist)</li> + <li>the proximity position (the place of the node in the current node + list)</li> <li>the context size (the size of the current node list)</li> - <li>the array of namespace declarations in scope (there also is a - namespacehash table but it is not used in the XSLT transformation).</li> -</ul><p>For the purpose of XSLT an <strong>extra</strong>pointer has been -addedallowing to retrieve the XSLT transformation context. When an -XPathevaluation is about to be performed, an XPath parser context is -allocatedcontaining and XPath object stack (this is actually an XPath -evaluationcontext, this is a remain of the time where there was no separate -parsing andevaluation phase in the XPath implementation). Here is an overview -of the setof contexts associated to an XPath evaluation within an -XSLTtransformation:</p><p align="center"><img src="contexts.gif" alt="The set of contexts associated " /></p><p>Clearly this is a bit too complex and confusing and should be refactoredat -the next set of binary incompatible releases of libxml. For example -thexmlXPathCtxt has a lot of unused parts and should probably be merged -withxmlXPathParserCtxt.</p><h3><a name="Descriptio" id="Descriptio">Description of XPath Objects</a></h3><p>An XPath expression manipulates XPath objects. XPath defines the -defaulttypes boolean, numbers, strings and node sets. XSLT adds the result -treefragment type which is basically an unmodifiable node set.</p><p>Implementation-wise, libxml follows again a KISS approach, -thexmlXPathObject is a structure containing a type description and the -variouspossibilities. (Using an enum could have gained some bytes.) In the -case ofnode sets (or result tree fragments), it points to a separate -xmlNodeSetobject which contains the list of pointers to the document -nodes:</p><p align="center"><img src="object.gif" alt="An Node set object pointing to " /></p><p>The <a href="http://xmlsoft.org/html/libxml-xpath.html">XPath -API</a>(andits <a href="http://xmlsoft.org/html/libxml-xpathinternals.html">'internal'part</a>) -includes a number of functions to create, copy, compare, convert orfree XPath -objects.</p><h3><a name="XPath3" id="XPath3">XPath functions</a></h3><p>All the XPath functions available to the interpreter are registered in -thefunction hash table linked from the XPath context. They all share the -samesignature:</p><pre>void xmlXPathFunc (xmlXPathParserContextPtr ctxt, int nargs);</pre><p>The first argument is the XPath interpretation context, holding -theinterpretation stack. The second argument defines the number of -objectspassed on the stack for the function to consume (last argument is on -top ofthe stack).</p><p>Basically an XPath function does the following:</p><ul><li>check <code>nargs</code>for proper handling of errors or functionswith - variable numbers of parameters</li> - <li>pop the parameters from the stack using <code>obj - =valuePop(ctxt);</code></li> + <li>the array of namespace declarations in scope (there also is a namespace + hash table but it is not used in the XSLT transformation).</li> +</ul><p>For the purpose of XSLT an <strong>extra</strong> pointer has been added +allowing to retrieve the XSLT transformation context. When an XPath +evaluation is about to be performed, an XPath parser context is allocated +containing and XPath object stack (this is actually an XPath evaluation +context, this is a remain of the time where there was no separate parsing and +evaluation phase in the XPath implementation). Here is an overview of the set +of contexts associated to an XPath evaluation within an XSLT +transformation:</p><p align="center"><img src="contexts.gif" alt="The set of contexts associated " /></p><p>Clearly this is a bit too complex and confusing and should be refactored +at the next set of binary incompatible releases of libxml. For example the +xmlXPathCtxt has a lot of unused parts and should probably be merged with +xmlXPathParserCtxt.</p><h3><a name="Descriptio" id="Descriptio">Description of XPath Objects</a></h3><p>An XPath expression manipulates XPath objects. XPath defines the default +types boolean, numbers, strings and node sets. XSLT adds the result tree +fragment type which is basically an unmodifiable node set.</p><p>Implementation-wise, libxml follows again a KISS approach, the +xmlXPathObject is a structure containing a type description and the various +possibilities. (Using an enum could have gained some bytes.) In the case of +node sets (or result tree fragments), it points to a separate xmlNodeSet +object which contains the list of pointers to the document nodes:</p><p align="center"><img src="object.gif" alt="An Node set object pointing to " /></p><p>The <a href="http://xmlsoft.org/html/libxml-xpath.html">XPath API</a> (and +its <a href="http://xmlsoft.org/html/libxml-xpathinternals.html">'internal' +part</a>) includes a number of functions to create, copy, compare, convert or +free XPath objects.</p><h3><a name="XPath3" id="XPath3">XPath functions</a></h3><p>All the XPath functions available to the interpreter are registered in the +function hash table linked from the XPath context. They all share the same +signature:</p><pre>void xmlXPathFunc (xmlXPathParserContextPtr ctxt, int nargs);</pre><p>The first argument is the XPath interpretation context, holding the +interpretation stack. The second argument defines the number of objects +passed on the stack for the function to consume (last argument is on top of +the stack).</p><p>Basically an XPath function does the following:</p><ul><li>check <code>nargs</code> for proper handling of errors or functions + with variable numbers of parameters</li> + <li>pop the parameters from the stack using <code>obj = + valuePop(ctxt);</code></li> <li>do the function specific computation</li> - <li>push the result parameter on the stack using - <code>valuePush(ctxt,res);</code></li> - <li>free up the input parameters - with<code>xmlXPathFreeObject(obj);</code></li> + <li>push the result parameter on the stack using <code>valuePush(ctxt, + res);</code></li> + <li>free up the input parameters with + <code>xmlXPathFreeObject(obj);</code></li> <li>return</li> -</ul><p>Sometime the work can be done directly by modifying in-situ the top -objecton the stack <code>ctxt->value</code>.</p><h3><a name="stack" id="stack">The XSLT variables stack frame</a></h3><p>Not to be confused with XPath object stack, this stack holds the -XSLTvariables and parameters as they are defined through the recursive calls -ofcall-template, apply-templates and default templates. This is used to -definethe scope of variables being called.</p><p>This part seems to be the most urgent attention right now, first it isdone -in a very inefficient way since the location of the variables andparameters -within the stylesheet tree is still done at run time (it reallyshould be done -statically at compile time), and I am still unsure that myunderstanding of -the template variables and parameter scope is actuallyright.</p><p>This part of the documentation is still to be written once this part ofthe -code will be stable. <span style="background-color: #FF0000">TODO</span></p><h3><a name="Extension" id="Extension">Extension support</a></h3><p>There is a separate document explaining <a href="extensions.html">how -theextension support works</a>.</p><h3><a name="Futher" id="Futher">Further reading</a></h3><p>Michael Kay wrote <a href="http://www-106.ibm.com/developerworks/library/x-xslt2/?dwzone=x?open&l=132%2ct=gr%2c+p=saxon">areally -interesting article on Saxon internals</a>and the work he did onperformance -issues. I wishes I had read it before starting libxslt design (Iwould -probably have avoided a few mistakes and progressed faster). A lot ofthe -ideas in his papers should be implemented or at least tried inlibxslt.</p><p>The <a href="http://xmlsoft.org/">libxml documentation</a>, especially <a href="http://xmlsoft.org/xmlio.html">the I/O interfaces</a>and the <a href="http://xmlsoft.org/xmlmem.html">memory management</a>.</p><h3><a name="TODOs" id="TODOs">TODOs</a></h3><p>redesign the XSLT stack frame handling. Far too much work is done -atexecution time. Similarly for the attribute value templates handling, -atleast the embedded subexpressions ought to be precompiled.</p><p>Allow output to be saved to a SAX like output (this notion of SAX like -APIfor output should be added directly to libxml).</p><p>Implement and test some of the optimization explained by Michael -Kayespecially:</p><ul><li>static slot allocation on the stack frame</li> +</ul><p>Sometime the work can be done directly by modifying in-situ the top object +on the stack <code>ctxt->value</code>.</p><h3><a name="stack" id="stack">The XSLT variables stack frame</a></h3><p>Not to be confused with XPath object stack, this stack holds the XSLT +variables and parameters as they are defined through the recursive calls of +call-template, apply-templates and default templates. This is used to define +the scope of variables being called.</p><p>This part seems to be the most urgent attention right now, first it is +done in a very inefficient way since the location of the variables and +parameters within the stylesheet tree is still done at run time (it really +should be done statically at compile time), and I am still unsure that my +understanding of the template variables and parameter scope is actually +right.</p><p>This part of the documentation is still to be written once this part of +the code will be stable. <span style="background-color: #FF0000">TODO</span></p><h3><a name="Extension" id="Extension">Extension support</a></h3><p>There is a separate document explaining <a href="extensions.html">how the +extension support works</a>.</p><h3><a name="Futher" id="Futher">Further reading</a></h3><p>Michael Kay wrote <a href="http://www-106.ibm.com/developerworks/library/x-xslt2/?dwzone=x?open&l=132%2ct=gr%2c+p=saxon">a +really interesting article on Saxon internals</a> and the work he did on +performance issues. I wishes I had read it before starting libxslt design (I +would probably have avoided a few mistakes and progressed faster). A lot of +the ideas in his papers should be implemented or at least tried in +libxslt.</p><p>The <a href="http://xmlsoft.org/">libxml documentation</a>, especially <a href="http://xmlsoft.org/xmlio.html">the I/O interfaces</a> and the <a href="http://xmlsoft.org/xmlmem.html">memory management</a>.</p><h3><a name="TODOs" id="TODOs">TODOs</a></h3><p>redesign the XSLT stack frame handling. Far too much work is done at +execution time. Similarly for the attribute value templates handling, at +least the embedded subexpressions ought to be precompiled.</p><p>Allow output to be saved to a SAX like output (this notion of SAX like API +for output should be added directly to libxml).</p><p>Implement and test some of the optimization explained by Michael Kay +especially:</p><ul><li>static slot allocation on the stack frame</li> <li>specific boolean interpretation of an XPath expression</li> <li>some of the sorting optimization</li> - <li>Lazy evaluation of location path. (this may require more changes - butsounds really interesting. XT does this too.)</li> - <li>Optimization of an expression tree (This could be done as a - completelyindependent module.)</li> -</ul><p></p><p>Error reporting, there is a lot of case where the XSLT -specificationspecify that a given construct is an error are not checked -adequately bylibxslt. Basically one should do a complete pass on the XSLT -spec again andadd all tests to the stylesheet compilation. Using the DTD -provided in theappendix and making direct checks using the libxml validation -API sounds agood idea too (though one should take care of not raising errors -forelements/attributes in different namespaces).</p><p>Double check all the places where the stylesheet compiled form might -bemodified at run time (extra removal of blanks nodes, hint on -thexsltCompMatch).</p><p></p><p><a href="bugs.html">Daniel Veillard</a></p></td></tr></table></td></tr></table></td></tr></table></td></tr></table></td></tr></table></body></html> + <li>Lazy evaluation of location path. (this may require more changes but + sounds really interesting. XT does this too.)</li> + <li>Optimization of an expression tree (This could be done as a completely + independent module.)</li> +</ul><p></p><p>Error reporting, there is a lot of case where the XSLT specification +specify that a given construct is an error are not checked adequately by +libxslt. Basically one should do a complete pass on the XSLT spec again and +add all tests to the stylesheet compilation. Using the DTD provided in the +appendix and making direct checks using the libxml validation API sounds a +good idea too (though one should take care of not raising errors for +elements/attributes in different namespaces).</p><p>Double check all the places where the stylesheet compiled form might be +modified at run time (extra removal of blanks nodes, hint on the +xsltCompMatch).</p><p></p><p><a href="bugs.html">Daniel Veillard</a></p></td></tr></table></td></tr></table></td></tr></table></td></tr></table></td></tr></table></body></html> |