Users and Administrators Manual

Author:  Carlo Contavalli 
Date: 2013/04/25 23:57:16

mod-xslt2 is a web server module that transforms xml documents into any format using xslt stylesheets, performing the transformation on the server side.


    Although mod-xslt2 uses standard libraries to parse xml data and transform it, there are a few things to keep in mind when writing xml/xslt to be processed by mod-xslt2.

    First of all, you can access many mod-xslt2 parameters using the standard ``value-of'' xslt tag or standard XPath expressions.

    As an example, to include the version of mod-xslt2 being used in the output generated by a stylesheet, you could use something like:
    <!-- To output the version of mod-xslt2 -->
    <xsl:value-of select="$modxslt-version" />
    
    <!-- To check the interface version -->
    <xsl:if test="$modxslt-interface &gt; 1">
      mod-xslt2 interface version is greater than 1!
    </xsl:if>
    
    At the time of writing, mod-xslt2 makes the following variables available to the xslt file:

    • modxslt-interface - holds the interface version used by mod-xslt2. It is incremented every time a new variable is added to this list and every time a new element is introduced (see the section on element extensions). Variables are never supposed to be removed, so you can safely assume that any higher version number is backward compatible. The current version is ``2''. Unless otherwise specified, variables were introduced in interface version 1. Version 2 added only one variable, modxslt-conf-xinclude.

    • modxslt-sapi - name of the sapi that is currently parsing the xml document. It is usually set by the application using libmodxslt0.

    • modxslt-name - name of mod-xslt2.

    • modxslt-handler - name of the handler being used by mod-xslt2.

    • modxslt-namespace - URL of the namespace for mod-xslt2 extensions.

    • modxslt-conf-libpcre - has value ``true'' if mod-xslt2 was compiled with libpcre support.

    • modxslt-conf-exslt - has value ``true'' if mod-xslt2 was compiled with exslt support.

    • modxslt-conf-xinclude - introduced in interface version 2 - has value ``true'' if mod-xslt2 was compiled with xinclude support.

    • modxslt-conf-extensions - has value ``true'' if mod-xslt2 was compiled to provide extension elements (see the dedicated section).

    • modxslt-conf-libxmlthreads - has value ``true'' if libxml supports threads.

    • modxslt-conf-libxslthack - has value ``true'' if configure was given the parameter ``--enable-libxslt-hack''.

    • modxslt-conf-fallbackwrap - has value ``true'' if configure was given the parameter ``--enable-fallback-wraparound''.

    • modxslt-version - the version of mod-xslt2 being used (example: "1.2.3")

    • modxslt-version-major - the first number of the version (in the example above, "1")

    • modxslt-version-minor - the second number of the version (in the example above, "2")

    • modxslt-version-patchlevel - the third number of the version (in the example above, "3")

    Variables that have value ``true'' when a feature is enabled have value ``false'' when it is not.
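    The parameter set above can be pictured as a flat table of strings. The following Python sketch is illustrative only and is not mod-xslt2's code: build_modxslt_params and has_interface are hypothetical names, and version strings are assumed to have the form ``X.Y.Z''. It shows how the version is split into its components, how feature flags become the strings ``true'' or ``false'', and how the backward-compatibility rule for modxslt-interface reduces to an ``at least'' check.

```python
def build_modxslt_params(version: str, interface: int,
                         features: dict[str, bool]) -> dict[str, str]:
    """Hypothetical helper: build the modxslt-* parameters described above."""
    # Assumption: the version always has exactly three dot-separated parts.
    major, minor, patchlevel = version.split(".")
    params = {
        "modxslt-interface": str(interface),
        "modxslt-version": version,
        "modxslt-version-major": major,
        "modxslt-version-minor": minor,
        "modxslt-version-patchlevel": patchlevel,
    }
    # Feature flags are exposed as the strings "true" or "false".
    for name, enabled in features.items():
        params[f"modxslt-conf-{name}"] = "true" if enabled else "false"
    return params


def has_interface(params: dict[str, str], required: int) -> bool:
    """Higher interface versions are backward compatible, so ">=" is enough."""
    return int(params.get("modxslt-interface", "0")) >= required


params = build_modxslt_params("1.2.3", 2, {"libpcre": True, "xinclude": False})
print(params["modxslt-version-minor"], params["modxslt-conf-xinclude"])  # 2 false
print(has_interface(params, 2), has_interface(params, 3))  # True False
```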

    mod-xslt2 allows you to access many other variables by providing custom extension tags.

    These extension tags are available only if mod-xslt2 was compiled without ``--disable-extensions'' and you enable them in your xsl with something like:
    <xsl:stylesheet version="1.0" 
        extension-element-prefixes="yaslt"
        xmlns:yaslt="http://www.modxslt.org/ns/1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
    Note the ``extension-element-prefixes'' and ``xmlns:yaslt="http://www.modxslt.org/ns/1.0"'' attributes, which specify that the extensions will live in the ``yaslt:'' namespace.

    Enabling the extensions gives you two additional tags:

    • header-set - to set an output header. The only valid attribute is ``name'', which specifies the name of the output header you want to set.

    • value-of - to fetch a mod-xslt2 specific variable. The only valid attribute is ``select''. The value of the ``select'' attribute is parsed as a ``mod-xslt2'' expression, which follows completely different rules from XPath expressions.

    Since those tags are provided by extensions, you need to specify the namespace every time you use them. In the example above, the namespace to use would be ``yaslt:'', as specified by the ``extension-element-prefixes'' and ``xmlns'' attributes.

    A more complete example may be the following one:
    <xsl:stylesheet version="1.0"
        extension-element-prefixes="yaslt"
        xmlns:yaslt="http://www.modxslt.org/ns/1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
      <xsl:template match="faq">
        <yaslt:value-of select="$HEADER[Host]" />
    
        <yaslt:header-set name="X-Powered-by">
          <xsl:value-of select="$modxslt-version" />
        </yaslt:header-set>
        
      </xsl:template>
    
    </xsl:stylesheet>
    

    4.2.1  header-set

    header-set allows you to set a value in the http headers returned to the client. Any name and any value are accepted. If ``strip-space'' (<xsl:strip-space elements=...) is active for the given element, any sequence of blank characters or newlines is replaced by a single space. This lets you spread a header value over several lines of the stylesheet (a real multi-line header would probably be invalid for the http protocol) while keeping everything working. Note that header-set won't try to prevent you from doing stupid things; that's up to you. The only thing you cannot do is set the ``Content-Type'', which is handled by dedicated code.

    ``header-set'' requires only the ``name'' attribute, which selects the header you want to set. Between the opening ``header-set'' and the closing ``/header-set'', you can use any element you want; just watch out for newlines and strange characters that may invalidate your headers.
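    The whitespace handling and the ``Content-Type'' restriction described above can be sketched in Python as follows. This is an illustration under stated assumptions, not mod-xslt2's implementation: header_value and header_set are hypothetical names, headers are kept in a plain dict, and a refused ``Content-Type'' is assumed to be silently ignored, with the name compared case-insensitively.

```python
import re


def header_value(raw: str, strip_space: bool) -> str:
    """With strip-space active, each run of blanks/newlines becomes one space."""
    return re.sub(r"\s+", " ", raw) if strip_space else raw


def header_set(headers: dict[str, str], name: str, raw: str,
               strip_space: bool) -> bool:
    """Hypothetical model of <yaslt:header-set name="...">raw</yaslt:header-set>."""
    # Assumption: Content-Type is refused (matched case-insensitively) and ignored,
    # because mod-xslt2 takes it from the media-type of xsl:output instead.
    if name.lower() == "content-type":
        return False
    headers[name] = header_value(raw, strip_space)
    return True


headers: dict[str, str] = {}
header_set(headers, "X-Fuffa", "\n      fuffa\n    ", strip_space=True)
print(repr(headers["X-Fuffa"]))  # ' fuffa '
```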

    The following are all valid usage examples of ``header-set'':
    <xsl:stylesheet version="1.0" 
        extension-element-prefixes="yaslt"
        xmlns:yaslt="http://www.modxslt.org/ns/1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
      [...]
        <yaslt:header-set name="X-Powered-by">
          <xsl:value-of select="$modxslt-version" />
        </yaslt:header-set>
    
        <yaslt:header-set name="X-Fuffa">
          fuffa
        </yaslt:header-set>
        <yaslt:header-set name="X-Foo">
          <xsl:apply-templates />
        </yaslt:header-set>
      [...]
    
    </xsl:stylesheet>
    

    value-of allows you to fetch any mod-xslt2 specific variable or expression.

    While writing mod-xslt2 code, I decided to keep most mod-xslt2 variables in a completely independent and isolated namespace, mainly for three reasons:

    • Some of the variable names and values are gathered from the running environment, which can by no means be considered safe or trusted.

    • For the same reason, mod-xslt2 variable names can violate XPath naming rules, and I didn't want to employ weird name-mangling routines.

    • Work is underway to cache xml and xslt metadata to speed things up, which requires intercepting every access to mod-xslt2 variables.

    For the same reasons, I decided to go with my own (simple) language to parse expressions.

    An expression is a sequence of characters that may contain zero or more variables. Each variable starts with a ``$'' symbol followed by the name of the variable, or by the name of the variable enclosed in curly brackets (``{'', ``}''). The following are all valid expressions:
    0: "this is fuffa"
    1: "$fuffa"
    2: "${fuffa}"
    3: "this is ${fuffa}"
    4: "this is much $fuffa more"
    
    Expression 0 is a ``simple'' string: no replacements are done. Expression 1 is replaced by the value of the variable ``fuffa''.

    Expression 2 is exactly like expression 1, except that the ``{'' ``}'' allow the variable to be correctly replaced even in a string like ``ababa${fuffa}ababa''. Expression 3 is replaced by the value of the variable ``fuffa'' preceded by ``this is '', and expression 4 works the same way, with text on both sides of the variable.

    Non-existent variables are replaced by the empty string (that is, they are removed). A ``$'' that should be a literal part of the string must be escaped by preceding it with a ``\''. This also implies that any literal ``\'' must itself be escaped with a double backslash (``\\'').

    The usage of ``${'' and ``}'' allows you to use indirect references: let's say that ``$foo=bar'' and that ``$bar=fuffa'', evaluating the expression ``${$foo}'' would at first be replaced by ``${bar}'' and then replaced by ``fuffa'', much like in bash programming.

    There is more to say: inside a ``{'' and ``}'' you could even put more than one variable or character constants, to ``build'' up the name of the variable you want to be replaced.

    As a further example, let's say ``$fuffa=hello'', ``$foo=fuf'' and ``$bar=fa''. In this case, the expressions on the left produce the following output:
    ${$foo$bar} -> ${fuffa} -> hello
    ${${foo}$bar} -> ${fuffa} -> hello
    ${${foo}fa} -> ${fuffa} -> hello
    ${fuf$bar} -> ${fuffa} -> hello
    
    Using these expressions, you can have as much fun as you want and recurse as deeply as you want (as long as your stack holds).
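    The substitution rules above ($name, ${...}, backslash escapes, the empty string for unknown variables, and names built from other variables) can be captured by a small recursive scanner. The Python sketch below is not mod-xslt2's parser: expand is a hypothetical name, variables live in a plain dict, and an unbraced name is assumed to be a run of letters, digits, ``_'' and ``-'' (the manual does not spell out this set).

```python
import string

# Assumption: an unbraced name is a run of letters, digits, "_" and "-".
_NAME_CHARS = set(string.ascii_letters + string.digits + "_-")


def expand(expr: str, variables: dict[str, str]) -> str:
    """Expand $name and ${...} references; unknown variables become ""."""
    return _scan(expr, 0, variables, inside_braces=False)[0]


def _scan(text, pos, variables, inside_braces):
    out = []
    while pos < len(text):
        ch = text[pos]
        if ch == "\\" and pos + 1 < len(text):
            out.append(text[pos + 1])          # \$ -> $, \\ -> \
            pos += 2
        elif ch == "$":
            value, pos = _reference(text, pos + 1, variables)
            out.append(value)
        elif ch == "}" and inside_braces:
            return "".join(out), pos + 1       # end of ${...}
        else:
            out.append(ch)
            pos += 1
    if inside_braces:
        raise ValueError(f"unterminated ${{...}} in {text!r}")
    return "".join(out), pos


def _reference(text, pos, variables):
    if pos < len(text) and text[pos] == "{":
        # The text between braces is expanded first, so the variable name can
        # be built from other variables: ${$foo$bar}, ${fuf$bar}, ...
        name, pos = _scan(text, pos + 1, variables, inside_braces=True)
    else:
        start = pos
        while pos < len(text) and text[pos] in _NAME_CHARS:
            pos += 1
        name = text[start:pos]
    return variables.get(name, ""), pos


v = {"fuffa": "hello", "foo": "fuf", "bar": "fa"}
print(expand("this is much $fuffa more", v))  # this is much hello more
print(expand("${${foo}fa}", v))               # hello
```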

    Parsing variables is of little use if there are no variables to parse. Fortunately, mod-xslt2 provides a rich set of predefined variables. Variables are grouped in classes that look like ``arrays''. The main arrays available are:

    • MODXSLT - holds the same variables passed as parameters to the xslt parser (described in the previous sections), such as how mod-xslt2 was compiled or which features are enabled.

    • GET - contains variables passed to your xml file as GET parameters (http get)

    • HEADER - contains the headers passed to mod-xslt2 by your web server (headers that, in turn, were given by the client)

    As often happens, it is easier to explain the usage of something by showing some examples than trying to explain how it works:
    $GET[fuffa]
    $HEADER[User-Agent]
    $MODXSLT[version]
    $MODXSLT[namespace]
    
    In the examples above, the first line fetches the variable ``fuffa'' that was passed as a get parameter to the xml file (with something like http://host.fqdn/file.xml?fuffa=value).

    The second line is replaced by the header ``User-Agent'' provided by the client, while the third and fourth lines are replaced by the values of the parameters $modxslt-version and $modxslt-namespace, respectively.

    Using GET, you can access any ``get'' parameter that was passed to your xml file while HEADER gives you access to any header that was sent to you by the client.
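    One simple way to picture these arrays is a single table keyed by flattened names such as ``GET[fuffa]''. The Python sketch below builds such a table from a request URL, the request headers and the modxslt-* parameters. It is an illustrative model only: build_variables is a hypothetical name, the flattened-key layout is an assumption rather than mod-xslt2's internal storage, and when a GET parameter is repeated the last value wins here.

```python
from urllib.parse import parse_qsl, urlsplit


def build_variables(url: str, headers: dict[str, str],
                    modxslt_params: dict[str, str]) -> dict[str, str]:
    """Hypothetical model of the GET, HEADER and MODXSLT arrays."""
    variables: dict[str, str] = {}
    # ?fuffa=value -> GET[fuffa]; keep_blank_values keeps "?ignorebrowser" too.
    for key, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        variables[f"GET[{key}]"] = value
    for key, value in headers.items():
        variables[f"HEADER[{key}]"] = value
    # $modxslt-version is reachable as $MODXSLT[version], and so on.
    for key, value in modxslt_params.items():
        variables[f"MODXSLT[{key.removeprefix('modxslt-')}]"] = value
    return variables


table = build_variables(
    "http://host.fqdn/file.xml?fuffa=value",
    {"User-Agent": "Mozilla/5.0"},
    {"modxslt-version": "1.2.3", "modxslt-namespace": "http://www.modxslt.org/ns/1.0"},
)
print(table["GET[fuffa]"], table["MODXSLT[version]"])  # value 1.2.3
```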

    Also keep in mind that, as explained in the previous sections, you can build the name of a variable from other variables. So, the following is also valid:
    $HEADER[$GET[header-to-check]]
    ${$source[User-Agent]}
    
    Do not insert any space between the square brackets (``['', ``]'') and the array index: spaces are not allowed there, and the variable won't be substituted.
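    Array lookups, including an index built from another variable as in $HEADER[$GET[header-to-check]], fit naturally into a recursive scanner: the index between ``['' and ``]'' is expanded first, and the result is used to look up a flattened name such as ``HEADER[User-Agent]''. This Python sketch is a hypothetical model, not mod-xslt2's parser: expand and the flattened-key table are assumptions, unbraced names are assumed to use letters, digits, ``_'' and ``-'', and the rule about spaces inside brackets is not reproduced.

```python
import string

# Assumption: an unbraced name is a run of letters, digits, "_" and "-".
_NAME_CHARS = set(string.ascii_letters + string.digits + "_-")


def expand(expr: str, variables: dict[str, str]) -> str:
    """Expand $name, ${...} and $ARRAY[index] references; unknown ones become ""."""
    return _scan(expr, 0, variables, stop=None)[0]


def _scan(text, pos, variables, stop):
    """Copy text from pos, expanding references, until `stop` (if any) is met."""
    out = []
    while pos < len(text):
        ch = text[pos]
        if ch == "\\" and pos + 1 < len(text):
            out.append(text[pos + 1])              # \$ -> $, \\ -> \
            pos += 2
        elif ch == "$":
            value, pos = _reference(text, pos + 1, variables)
            out.append(value)
        elif ch == stop:
            return "".join(out), pos + 1           # consume the closing } or ]
        else:
            out.append(ch)
            pos += 1
    if stop is not None:
        raise ValueError(f"missing {stop!r} in {text!r}")
    return "".join(out), pos


def _reference(text, pos, variables):
    if pos < len(text) and text[pos] == "{":
        name, pos = _scan(text, pos + 1, variables, stop="}")
    else:
        start = pos
        while pos < len(text) and text[pos] in _NAME_CHARS:
            pos += 1
        name = text[start:pos]
        if pos < len(text) and text[pos] == "[":
            # The index is expanded first, so $HEADER[$GET[x]] works.
            index, pos = _scan(text, pos + 1, variables, stop="]")
            name = f"{name}[{index}]"
    return variables.get(name, ""), pos


table = {"GET[header-to-check]": "User-Agent", "HEADER[User-Agent]": "Mozilla/5.0"}
print(expand("$HEADER[$GET[header-to-check]]", table))  # Mozilla/5.0
```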

    It is also possible to create custom variables from .xml files to be passed over to the xslt. To do so, you need to use the Processing Instruction ``modxslt-param'' with the attributes ``name'' and ``value'', where ``name'' specifies the name of the variable to be set while ``value'' specifies its value.

    Using this processing instruction, you can also override the value of predefined variables, like $GET[fuffa], but not of mod-xslt2 constants, like $MODXSLT[version].

    Here is a complete example of using modxslt-param:
    <?xml version="1.0" encoding="ISO-8859-1" 
    	standalone="yes"?>
    
    <!DOCTYPE fuffa SYSTEM "dtd/fuffa.dtd">
    
    <?modxslt-param name="variable" value="its value" ?>
    
    [...]
    
    The variable ``$variable'' is thus accessible from your xslt through mod-xslt2's own ``value-of'', with something like ``...:value-of select="$variable"''.
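    Reading modxslt-param instructions amounts to collecting the ``name'' and ``value'' pseudo-attributes of each matching processing instruction. The Python sketch below uses xml.dom.minidom from the standard library to do this for top-level instructions. It is only a model: apply_modxslt_params is a hypothetical name, variables use flattened keys such as ``GET[fuffa]'' (so overriding $GET[fuffa] is assumed to be written as name="GET[fuffa]"), and refusing any name that starts with ``MODXSLT['' is how this sketch models the protection of mod-xslt2 constants.

```python
import re
from xml.dom import minidom

# name="..." or name='...' pseudo-attributes inside a processing instruction.
_PSEUDO_ATTR = re.compile(r"""([A-Za-z_][\w.-]*)\s*=\s*(["'])(.*?)\2""", re.S)


def apply_modxslt_params(xml_text: str, variables: dict[str, str]) -> dict[str, str]:
    """Hypothetical model: copy <?modxslt-param?> settings into variables."""
    doc = minidom.parseString(xml_text)
    for node in doc.childNodes:          # top-level nodes only (prolog and epilog)
        if node.nodeType != node.PROCESSING_INSTRUCTION_NODE:
            continue
        if node.target != "modxslt-param":
            continue
        attrs = {m.group(1): m.group(3) for m in _PSEUDO_ATTR.finditer(node.data)}
        name, value = attrs.get("name"), attrs.get("value")
        if name is None or value is None:
            continue
        if name.startswith("MODXSLT["):
            continue                     # constants such as $MODXSLT[version] are kept
        variables[name] = value          # may override e.g. GET[fuffa]
    return variables


doc_text = """<?xml version="1.0"?>
<?modxslt-param name="variable" value="its value" ?>
<?modxslt-param name="GET[fuffa]" value="overridden" ?>
<?modxslt-param name="MODXSLT[version]" value="9.9.9" ?>
<fuffa/>"""
result = apply_modxslt_params(doc_text, {"GET[fuffa]": "from-url", "MODXSLT[version]": "1.2.3"})
print(result["variable"], result["GET[fuffa]"], result["MODXSLT[version]"])
```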

    Note also that some SAPIs allow you to pass parameters to your xml or xsl files from configuration files. Refer to the SAPI-specific section of this manual.

    Before using any of the mod-xslt2 xslt extensions described in the previous sections, you should make sure they are available on your system and version of mod-xslt2.

    As a first test, you can verify that your xslt is being processed by mod-xslt2 by checking the XPath parameter ``modxslt-interface''. If it is available, you can then verify that ``modxslt-conf-extensions'' has value ``true'' and assume extensions are available.

    One alternative to using mod-xslt2 params is to use the standard functions defined in http://www.w3.org/TR/1999/REC-xslt-19991116#extensions:
    boolean element-available(string)
    boolean function-available(string)
    
    to verify mod-xslt2 extension tags are available, with something like:
    <xsl:if test="element-available('yaslt:value-of')">
      <yaslt:value-of select="$HEADER[User-Agent]" />
    </xsl:if>
    
    assuming that the opening ``xsl:stylesheet'' tag declares ``yaslt'' as the extension prefix.

    There's one more way to handle the availability of mod-xslt2 extensions: using ``xsl:fallback'', as in this example:
      <yaslt:value-of select="$HEADER[User-Agent]">
        <xsl:fallback>
          Sorry, cannot access headers 
           (no mod-xslt2 extensions available)
        </xsl:fallback>
      </yaslt:value-of>
    
    In this case, if ``yaslt:value-of'' is not available the xslt processor will parse the content of the ``xsl:fallback'' node. In any other case, the ``xsl:fallback'' node will be completely ignored.

    However, at the time of writing, libxslt-1.0.32 has a few bugs handling fallback nodes:

    • when the extension is available, the fallback node is not ignored as it ought to be, and a warning is printed in your web server logs (read the FAQ for more information about this issue)

    • when the warning is printed, libxslt-1.0.32 calls the error handler with the wrong parameters, possibly causing a segmentation fault (this happens if libxslt debugging was enabled, in which case one of the pointers passed to mod-xslt2 is not null and is assumed to be valid)

    There are a few ways to avoid this problem:
    • patch the library so that the warning is not printed

    • patch the library so that the correct arguments are always passed to error functions

    • compile mod-xslt2 specifying ``--enable-fallback-wraparound'', to ask mod-xslt2 to remove ``fallback'' nodes when the extensions are available

    • compile mod-xslt2 specifying ``--enable-libxslt-hack'', to ask mod-xslt2 to always enable debugging using some wrappers, in order to avoid the error highlighted above

    These are really two libxslt bugs, where the first one triggers the second one. Fixing either of them should be enough for mod-xslt2 to work. However, the best way to solve the problem is to apply both patches. Please read ``README.Patches'' to learn more about these issues.

    As indicated in the previous sections, you are not allowed to use ``header-set'' to try to change the ``Content-Type'' of the document.

    However, you can choose the mime type of the document by specifying the attribute ``media-type'' in the ``xsl:output'' tag, as shown in the example below:
    <xsl:output media-type="text/html" 
    	encoding="ISO-8859-1" />
    
    If no media-type is specified, mod-xslt2 will try to guess it (relying on libxml2 parsing), probably returning a media-type of ``text/plain'', ``text/xml'' or ``text/html'' to the client.

    Keep in mind that you can specify any ``media-type'' you want.
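    The choice between an explicit media-type and a guess can be pictured with the following Python sketch. The manual only says that the guess relies on libxml2 parsing; keying it on the xsl:output ``method'' (html, text, xml) is an assumption made here for illustration, and response_media_type is a hypothetical name.

```python
from typing import Optional

# Assumed mapping from the xsl:output method to a guessed media type.
_GUESSES = {"html": "text/html", "text": "text/plain", "xml": "text/xml"}


def response_media_type(media_type: Optional[str], method: Optional[str] = None) -> str:
    """Use media-type verbatim when given; otherwise guess from the output method."""
    if media_type:
        return media_type  # any media-type is accepted as-is
    return _GUESSES.get(method or "xml", "text/xml")


print(response_media_type("text/html"))   # text/html
print(response_media_type(None, "text"))  # text/plain
```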

    mod-xslt2 decides which stylesheet to use with a three-step procedure:

    • if any suitable <?xml-stylesheet or <?modxslt-stylesheet processing instruction is found, the specified stylesheet is used, regardless of any XSLTSetStylesheet parameter. The first suitable xml-stylesheet or modxslt-stylesheet is chosen.

    • if an XSLTSetStylesheet is provided for the given document type, the specified stylesheet is used.

    • in any other case, a 500 server error is returned (the rationale is that once mod-xslt2 is told to parse the document, mod-xslt2 will either output the parsed document or send an error back to the client).
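    The three steps can be sketched as a small Python function. It models the decision only and is not mod-xslt2's code: choose_stylesheet is a hypothetical name, processing instructions are given as (target, attributes) pairs in document order, the suitability test is passed in as a function, and the result is an (http status, href) pair.

```python
from typing import Callable, Optional

PI = tuple[str, dict[str, str]]


def choose_stylesheet(pis: list[PI], configured: Optional[str],
                      is_suitable: Callable[[dict[str, str]], bool]
                      ) -> tuple[int, Optional[str]]:
    """Pick a stylesheet following the three-step procedure described above."""
    # Step 1: the first suitable xml-stylesheet or modxslt-stylesheet PI wins,
    # regardless of any XSLTSetStylesheet setting.
    for target, attrs in pis:
        if target in ("xml-stylesheet", "modxslt-stylesheet") and is_suitable(attrs):
            return 200, attrs["href"]
    # Step 2: otherwise use the XSLTSetStylesheet configured for this document type.
    if configured is not None:
        return 200, configured
    # Step 3: nothing applies, so the request fails with a server error.
    return 500, None


def media_ok(attrs: dict[str, str]) -> bool:
    # Stand-in suitability test for the example.
    return attrs.get("media") in ("all", "screen")


pis = [("xml-stylesheet", {"href": "print.xsl", "media": "printer"}),
       ("modxslt-stylesheet", {"href": "screen.xsl", "media": "screen"})]
print(choose_stylesheet(pis, "default.xsl", media_ok))  # (200, 'screen.xsl')
```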

    We already talked about XSLTSetStylesheet in the SAPI specific sections, and there's not much more to say here.

    However, as you can see in the first step, mod-xslt2 can be told which stylesheet to use using two different processing instructions: xml-stylesheet and modxslt-stylesheet.

    xml-stylesheet is a standard xml directive, and its description and usage can be found on http://www.w3.org/TR/xml-stylesheet. As you probably already know, xml-stylesheet can be used to associate a given xml file with the xslt to be used and generally looks something like:
    <?xml-stylesheet 
    	type="text/xsl" 
    	href="./xslt/faq-http.xsl" 
    	media="screen" 
    	alternate="no" 
    	title="For any web browser" 
    	charset="ISO-8859-1"?>
    
    For a given stylesheet to be considered ``suitable'' by mod-xslt2 to parse an xml file, the following conditions must be met:

    • type - must be either ``text/xml'' or ``text/xsl''.

    • href - should contain the url of the stylesheet. See the next section on ``Accepted urls''.

    • media - either ``all'' or ``screen''. In the case of ``screen'', it can be followed by one or more ``media expressions''. The stylesheet will be considered ``suitable'' only if one of the expressions evaluates to ``true''. The ``empty'' expression is considered to always match (be true). See the section on ``Media expressions''.

    At the time of writing, the other fields are of no interest to mod-xslt2.
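    The conditions above can be expressed as a Python predicate. This is a sketch under several assumptions, not mod-xslt2's code: is_suitable is a hypothetical name, media expressions are handed to a caller-supplied evaluate function, the leading ``and'' of an expression (as in ``screen and ...'') is stripped before evaluation, a missing media attribute is treated as unsuitable, and the attribute is split naively on commas, so commas inside quoted expressions are not handled.

```python
import re
from typing import Callable


def is_suitable(attrs: dict[str, str], evaluate: Callable[[str], bool]) -> bool:
    """Check the type, href and media conditions listed above."""
    if attrs.get("type") not in ("text/xml", "text/xsl"):
        return False
    if not attrs.get("href"):
        return False
    for entry in attrs.get("media", "").split(","):
        parts = entry.split(None, 1)     # media name, then the optional expression
        if not parts:
            continue
        medium = parts[0]
        expression = parts[1] if len(parts) > 1 else ""
        if medium == "all":
            return True
        if medium == "screen":
            expression = re.sub(r"^and\s+", "", expression.strip())
            # The empty expression always matches.
            if not expression or evaluate(expression):
                return True
    return False


def only_ignorebrowser(expression: str) -> bool:
    # Stand-in evaluator for the example; a real one would parse the expression.
    return expression == "$GET[ignorebrowser]"


attrs = {"type": "text/xsl", "href": "faq.xsl",
         "media": "fuffa, screen and $GET[ignorebrowser]"}
print(is_suitable(attrs, only_ignorebrowser))  # True
```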

    The main difference between ``xml-stylesheet'' and ``modxslt-stylesheet'' is that the first one should be used in a way compliant with the highlighted standards, while the second one may use any mod-xslt2 extension.

    Keep in mind, however, that mod-xslt2 will not complain if you use its extensions in the standard ``xml-stylesheet'' tag.

    modxslt-stylesheet is a processing instruction that takes exactly the same arguments as xml-stylesheet. It was introduced mainly for three reasons:

    • Since it is not standardized, it is interpreted only (and exclusively) by mod-xslt2. If a browser or another xslt processor finds a modxslt-stylesheet directive, it will completely ignore it. This allows a few useful tricks that will be discussed in the next few sections.

    • currently, the standard says that the ``media'' attribute ``may'' contain expressions, which are not yet defined by any standard and may be defined in the future (AFAIK). Browsers and xslt processors are thus supposed to ignore them, skipping everything from the first space after a media name up to the next comma. mod-xslt2, however, supports its own expression language, which may be used in ``modxslt-stylesheet'' without fear of conflicting with future standards or incompatible browsers.

    • as you probably know, if a stylesheet is not found or is not valid, mod-xslt2 returns a ``server error'' to the user (the rationale is that if a document was meant to be parsed, the inability to do so is an error). However, the ``modxslt-stylesheet'' pi provides a simple mechanism to return the plain xml to the browser and let the browser parse it. Combined with ``media expressions'', this feature allows you to configure mod-xslt2 to parse only those xml files that would cause problems to the various browsers on the market.

    A couple of examples will be given in the following sections.

    As you probably know, the ``media'' attribute in an xml-stylesheet (or modxslt-stylesheet) processing instruction may contain a comma-separated list of ``media types'' for which the stylesheet should be used to parse the xml data.

    A media attribute usually looks something like:
    <?xml-stylesheet ... media="screen, printer"...
    
    The media ``all'' is a sort of wildcard, specifying that a stylesheet can be used to parse xml data to be output on any medium.

    Any xslt processor should also ignore anything from the first space following the name of a media type to the first comma (or the end of the attribute), since ``in the future'' some kind of ``expressions'' may be introduced by the standards.

    Any xslt processor looking for a ``screen'' media stylesheet should thus accept anything like
    <?xml-stylesheet media="screen fuffa fuffa, printer"
    or
    <?xml-stylesheet media="screen foo bar"
    
    where ``fuffa fuffa'' or ``foo bar'' are ``Media expressions'' that should be ignored.

    mod-xslt2 introduces its own language to specify media expressions.

    A mod-xslt2 media expression is usually an ``and'' followed by a boolean expression, made of a list of tests that must be passed for the stylesheet to be used. Tests may make use of mod-xslt2 variables and of many operators that will be described in the following sections.

    The grammar mod-xslt2 uses to parse expressions can be described by a BNF similar to the following:
    bool_expr: 
    	| cmp_expr 
    	| cmp_expr ','
    ;
    
    cmp_expr: '(' cmp_expr ')' 
    	| '!' cmp_expr 
    	| cmp_expr BooleanOperator cmp_expr 
    	| String StringOperator String
    	| String
    ;
    
    Where capitalized strings are terminals and lowercase strings are non-terminals.

    Associativity of operators and their precedence will be shown in the next sections.

    4.4.1.1.1  Strings

    A string can be specified by using the name of a variable, a sequence of characters that match the regular expression:
    [^[:blank:]*+%/\"\',!><=~()-]*
    
    or a sequence of any characters enclosed in single or double quotes (', "), where any ``''' or ``"'' that is part of the sequence itself must be escaped by prepending a ``\'', while a ``\'' with no special meaning should be escaped with another backslash.

    Strings enclosed in single or double quotes may also contain ``mod-xslt2 expressions'', as described in the section ``value-of - mod-xslt2 expressions'' and contain any of the specified variables.

    4.4.1.1.2  String Evaluation

    As shown in the BNF grammar, a String can be used either in a ``boolean'' or in a ``cmp'' context (that is, combined with a BooleanOperator or with a StringOperator). In boolean context, a String is considered true if it is not the empty string ("") or if it is a variable that is defined (has a value associated with it, regardless of what the value is).
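    The boolean value of a String reduces to one rule per case. The Python sketch below is illustrative: string_truth is a hypothetical name, it only distinguishes a bare variable reference written as ``$NAME'' from a plain literal (quoted strings are assumed to be expanded first and then tested like literals), and variables are kept in a dict with flattened names such as ``GET[ignorebrowser]''.

```python
def string_truth(token: str, variables: dict[str, str]) -> bool:
    """Boolean value of a String operand in a boolean context."""
    if token.startswith("$"):
        # A bare variable is true whenever it is defined, even if its value is "".
        return token[1:] in variables
    # Any other String is true unless it is the empty string.
    return token != ""


request = {"GET[ignorebrowser]": ""}   # e.g. document.xml?ignorebrowser
print(string_truth("$GET[ignorebrowser]", request))  # True
print(string_truth("$GET[missing]", request))        # False
```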

    4.4.1.1.3  Boolean Operators

    mod-xslt2 recognizes the following left associative boolean operators:

    • 1 - ``!'' - logical negation (left associative)

    • 2 - ``and'' - logical and (left associative)

    • 2 - ``or'' - logical or (left associative)

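    The number indicates the precedence of each operator: the lower the number, the higher the precedence. Since ``and'' and ``or'' share the same precedence and are left associative, a chain mixing them is evaluated from left to right: ``a or b and c'' means ``(a or b) and c''.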
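    The effect of these precedence rules is easiest to see in a small evaluator. The following Python sketch is a hypothetical model, not mod-xslt2's parser: it works on a list of tokens in which every String test has already been reduced to ``true'' or ``false'', and it follows the rules above literally, so ``!'' binds tightest while ``and'' and ``or'' are folded strictly from left to right. Unlike C or Python, ``true or false and false'' therefore evaluates to false here.

```python
def evaluate(tokens: list[str]) -> bool:
    """Evaluate tokens such as ["!", "false", "and", "true"]."""
    value, pos = _expr(tokens, 0)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return value


def _expr(tokens, pos):
    # "and" and "or" share one precedence level and are left associative,
    # so they are applied strictly from left to right.
    value, pos = _unary(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("and", "or"):
        op = tokens[pos]
        rhs, pos = _unary(tokens, pos + 1)
        value = (value and rhs) if op == "and" else (value or rhs)
    return value, pos


def _unary(tokens, pos):
    if pos >= len(tokens):
        raise ValueError("unexpected end of expression")
    tok = tokens[pos]
    if tok == "!":                        # "!" binds tighter than "and"/"or"
        value, pos = _unary(tokens, pos + 1)
        return not value, pos
    if tok == "(":
        value, pos = _expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise ValueError("missing ')'")
        return value, pos + 1
    if tok in ("true", "false"):
        return tok == "true", pos + 1
    raise ValueError(f"unexpected token {tok!r}")


print(evaluate("true or false and false".split()))      # False
print(evaluate("true or ( false and false )".split()))  # True
```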

    4.4.1.1.4  String Operators

    mod-xslt2 recognizes the following string operators:

    • ``=='' or ``='' - equal, true if the left side of the operator is equal to the right side. At the time of writing, ``='' and ``=='' have the same meaning and return a true value if the memory representation of the string on the left is the same as that on the right. In the future, ``=='' will maintain this meaning, while ``='' will be used to compare the value of the number on the left with that of the number on the right (taking care of rounding and of processor precision limits).

    • ``!='' - unequal, true if the left side of the operator is not equal to the right side. By equal we mean that the memory representation of the string on the left is the same as that on the right.

    • ``=~'' - perl regular expression matches, true if the regular expression on the right of the operator matches the string on the left. See next section on regular expressions for more details.

    • ``!~'' - perl regular expression does not match, true if the regular expression on the right of the operator does not match the string on the left. See next section on regular expressions for more details.

    • ``>'', ``>='', ``<'', ``<='' - true respectively when the string on the left of the operator, converted to a ``real'', is greater, greater or equal, less, less or equal, to the value of the string on the right converted to a ``real''.
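    The string operators can be sketched in Python as below. This is an approximation under stated assumptions: compare and to_real are hypothetical names; ``=~'' and ``!~'' use Python's re.search on a plain pattern instead of libpcre with the separator syntax described next; and the conversion to a ``real'' is assumed to behave like C's strtod, reading a leading number and falling back to 0.0.

```python
import operator
import re

# Leading number, roughly as C's strtod would read it (an assumption).
_NUMBER = re.compile(r"\s*[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")

_NUMERIC = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def to_real(text: str) -> float:
    match = _NUMBER.match(text)
    return float(match.group(0)) if match else 0.0


def compare(left: str, op: str, right: str) -> bool:
    """Apply a StringOperator to two already-expanded Strings."""
    if op in ("==", "="):
        return left == right          # today "=" behaves exactly like "=="
    if op == "!=":
        return left != right
    if op == "=~":
        return re.search(right, left) is not None
    if op == "!~":
        return re.search(right, left) is None
    if op in _NUMERIC:
        return _NUMERIC[op](to_real(left), to_real(right))
    raise ValueError(f"unknown operator {op!r}")


print(compare("10", ">", "9"))             # True: compared as numbers, not as text
print(compare("apache1", "=", "apache1"))  # True
```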

    The operators ``=~'' and ``!~'' allow you to match a String against a Perl-compatible regular expression, also known as PCRE.

    mod-xslt2 uses ``libpcre'' to parse and apply these expressions. The complete reference for these regular expressions can thus be found in perlre(1) on any unix system with perl installed, while a quick tutorial and introduction can be found in perlretut(1) or perlrequick(1) and in any book about perl programming.

    In mod-xslt2, PCREs are specified by enclosing them in a ``separator'', and by indicating one or more options after the second occurrence of the separator. A separator may be any character besides ``\'', which can be used to escape the separator itself if it is needed in the regular expression. Additionally, regular expressions may be enclosed in single or double quotes, to overcome the limits on the characters a String can contain, as specified in the previous sections.

    Also keep in mind that, since regular expressions are specified in an xml attribute, any character with a special meaning in xml (such as ``&'' or ``<'') must be escaped using standard xml entities.

    In any case, the following options may be specified:

    • i - using this option, case insensitive matching is performed

    • e - using this option, the ``$'' matches only the end of the string and does not match any newline the string may contain

    • a - using this option, the match must be ``anchored'', which means it must match from the beginning to the end of the string

    • s - using this option, the ``.'' matches also newlines

    • x - using this option, spaces inside a regular expression are ignored unless they are escaped

    • X - using this option, you will enable libpcre features not compatible with perl. At time of writing, enabling this option will cause an error every time an unknown character is escaped (prepended with a ``\'')

    • m - using this option, you will enable multiline matching

    • y - this option ``inverts the greediness of the quantifiers, so that they are not greedy by default, but become greedy if followed by ?'' (from pcreapi(3))

    • u - this option causes PCRE to consider both the pattern and the string as made of UTF-8 encoded characters

    The following are all examples of valid regular expressions:
    '/fuffa/i' - match ``fuffa'', ``Fuffa'', ``FUFFA'',
    	``abfuffabc''...
    '$bap$ia' - match ``bap'', ``Bap'', ...
    '$a\\\$$i' - match any string containing ``a$''. In this
    	case, ``$'' is escaped twice to avoid it being
    	considered ``the terminator'' and to avoid any
    	meaning as regular expression special character.
    '&amp;fuffa&amp;i' - exactly like the first example,
    	but using ``&'' as separator
    
    Note that I have always enclosed them in quotes, to avoid causing problems to the parser.
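    Splitting such a String into pattern and options can be sketched as follows. The Python code below is a hypothetical model, not mod-xslt2's parser: compile_delimited is an invented name, Python's re module stands in for libpcre, and only the options with a direct re equivalent (i, m, s, x) are handled, with u ignored because Python str patterns are already Unicode-aware; any other option raises NotImplementedError. The input is assumed to be the String after xml parsing and quote removal, so ``&amp;'' has already become ``&''.

```python
import re

# Options with a direct equivalent in Python's re module.
_FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL, "x": re.VERBOSE}


def compile_delimited(spec: str) -> re.Pattern:
    """Compile '<sep>pattern<sep>options', e.g. '/fuffa/i', with Python's re."""
    if len(spec) < 2 or spec[0] == "\\":
        raise ValueError("expected <sep>pattern<sep>options")
    sep = spec[0]
    body, pos = [], 1
    while pos < len(spec):
        ch = spec[pos]
        if ch == "\\" and pos + 1 < len(spec):
            nxt = spec[pos + 1]
            # "\<sep>" only keeps sep from ending the pattern; other escapes
            # are passed through untouched to the regex engine.
            body.append(nxt if nxt == sep else ch + nxt)
            pos += 2
        elif ch == sep:
            break
        else:
            body.append(ch)
            pos += 1
    else:
        raise ValueError("missing closing separator")
    flags = 0
    for opt in spec[pos + 1:]:
        if opt == "u":
            continue  # str patterns in Python are already Unicode-aware
        if opt not in _FLAGS:
            raise NotImplementedError(f"option {opt!r} is not modelled here")
        flags |= _FLAGS[opt]
    return re.compile("".join(body), flags)


print(bool(compile_delimited("/fuffa/i").search("abFUFFAbc")))  # True
print(compile_delimited("/a\\/b/").pattern)                     # a/b
```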

    A few examples of media attributes and how they are evaluated:
    <?modxslt-stylesheet ... media="fuffa" ... ?>
    
    evaluates to false (``fuffa'' is neither ``all'' nor ``screen''), and the stylesheet is not applied.
    <?modxslt-stylesheet ... media="all" ... ?>
    
    evaluates to true, and the stylesheet is applied.
    <?modxslt-stylesheet ... media="screen" ... ?>
    
    evaluates to true, and the stylesheet is applied.
    <?modxslt-stylesheet ... media="screen and 
      '$HEADER[User-Agent]' =~ '/msie/i'" ... ?>
    
    evaluates to true only if the header ``User-Agent'' contains the string ``msie'', compared using case-insensitive matching.
    <?modxslt-stylesheet ... 
      media="screen and 
      	  '$HEADER[User-Agent]' =~ '/msie/i' or 
              '$HEADER[User-Agent]' =~ '/Moz.*1\.0/'" 
    	  ... ?>
    
    evaluates to true if the header ``User-Agent'' contains the string ``msie'' (case-insensitive) or contains the string ``Moz'' followed, after any number of characters, by the string ``1.0''.
    <?modxslt-stylesheet ... 
      media="fuffa, screen and $GET[ignorebrowser] or 
              '$HEADER[User-Agent]' =~ '/Moz.*1\.0/'" 
    	  ... ?>
    
    evaluates to true when the xml page was requested by a browser with a get parameter ``ignorebrowser'' set to any value (with something like http://url.of.xml.document.org/path/to/xml/document.xml?ignorebrowser=1), or if the header ``User-Agent'' contains the string ``Moz'' followed by ``1.0''.
    <?modxslt-stylesheet ... 
      media="fuffa, screen and 
               $MODXSLT[interface] >= 1 and
      	   $MODXSLT[sapi] = apache1" ... ?>
    
    evaluates to true when the interface version of mod-xslt2 is greater than or equal to 1 and the sapi parsing the xml document is ``apache1''. This is especially useful if your xsl stylesheet relies on some server-specific variable or feature being available.

    Also keep in mind that you can use as many ``modxslt-stylesheet'' or ``xml-stylesheet'' instructions as you want. In any case, mod-xslt2 uses the first matching one. If none applies, a 500 error page is returned. The rationale is that if you want a document to be parsed (and configure mod-xslt2 to do the parsing), the document will either be parsed and returned to the browser, or an error will be returned without disclosing any additional information.

    4.4.1.2  href URLs

    Up to now we have seen how our .xml files can specify which xslt to use. However, we haven't seen where the .xsl files can be stored and how we can specify their path.

    At the time of writing, mod-xslt2 (thanks to libxml2) can fetch xsl, dtd or other .xml files using one of the following methods:

    • remote http URL

    • remote ftp URL

    • local file URL

    • local http URL

    The first three methods are quite standard and made available thanks to libxml2 handlers. However, there are a few things to keep in mind:
    • an http URL must start with ``http://'' and may contain a port number, a username and a password, like in http://carlo:password@www.masobit.net:7568/file.php.

    • a ftp URL must start with ``ftp://'' and follow the same conventions as the http URL shown above.

    • a file URL may start with ``file://'' or just be an absolute or relative path. If it starts with ``file://'', ``file://'' is removed from the URL and the remaining path is opened. Thus, ``file:///file.xsl'' points to ``file.xsl'' in the ``root'' (/) of the file system, while ``file://file.xsl'' indicates ``file.xsl'' in the same directory as the xml file referring to that url. Both could also be given as a path like ``/file.xsl'' or ``file.xsl'', without any preceding ``file://'' (which is stripped anyway).

      In short, it should follow the same syntax and behavior as described in RFCs 1738, 1808 and 2396.
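    The mapping from a file URL to a path can be sketched in Python as below. It mirrors the rules just described (strip ``file://'', treat a leading ``/'' as absolute, resolve anything else against the directory of the referring xml file) and is illustrative only: resolve_file_href is a hypothetical name and POSIX paths are assumed.

```python
import posixpath


def resolve_file_href(href: str, xml_path: str) -> str:
    """Map a file href to a path, following the rules described above."""
    if href.startswith("file://"):
        href = href[len("file://"):]          # "file://" is simply stripped
    if href.startswith("/"):
        return posixpath.normpath(href)       # absolute: from the filesystem root
    # Relative: resolved against the directory of the referring xml file.
    return posixpath.normpath(posixpath.join(posixpath.dirname(xml_path), href))


print(resolve_file_href("file:///file.xsl", "/var/www/doc.xml"))  # /file.xsl
print(resolve_file_href("file://file.xsl", "/var/www/doc.xml"))   # /var/www/file.xsl
```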

    There is a fourth url scheme supported by mod-xslt2: local://.

    local:// behaves exactly like a local file URL, but tells mod-xslt2 that the file should be fetched using the http protocol. This scheme allows you to easily use .xsl or dtd generated by cgi or php scripts, without using a remote connection.

    As you may notice, there are thus at least three good reasons to use ``local://'' instead of ``http://'':

    • First, ``local://'' works like the ``file://'' scheme, allowing you to easily specify urls relative to the path of the requested file. For example, if you fetch http://www.masobit.net/fuffa/doc.xml and doc.xml requires ``local://xslt/doc.xsl'', the file ``http://www.masobit.net/fuffa/xslt/doc.xsl'' will be fetched using the http protocol (thus allowing doc.xsl to be a php script or cgi-bin).

      In contrast, ``local:///xslt/doc.xsl'' would refer to ``http://www.masobit.net/xslt/doc.xsl'', much like ``file://''.

    • Second, if you use virtual domains and want to use locally generated xsl files, you cannot reliably use the ``http://'' scheme. In ``virtual'' environments, ``http://localhost/xslt/doc.xsl'' is not necessarily the same as ``http://www.masobit.net/xslt/doc.xsl'', and there would be no way to specify a relative url if ``local://'' was not available.

    • Third, instead of opening an outgoing connection, mod-xslt2 serves a ``local://'' url through a ``subrequest'', which allows

      • faster retrieval of generated documents

      • easy detection of loops (more about loops will be discussed later)

      • avoidance of a deadlock that may halt mod-xslt2 (and that halts any mod-xslt2 I have seen on the internet) under very high loads (which a malicious user may exploit to cause a denial of service)
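    To make the resolution rule concrete, here is a minimal Python sketch of which http URL a ``local://'' reference stands for. It assumes the reference is resolved against the URL of the requested document, as in the masobit.net examples above; the function name `resolve_local_url` is hypothetical, and the sketch does not model the internal subrequest mod-xslt2 actually uses to fetch the file.

```python
from urllib.parse import urljoin


def resolve_local_url(url: str, doc_url: str) -> str:
    """Map a local:// reference to the http URL it denotes.

    The part after "local://" is resolved against the URL of the requested
    document: a relative path stays relative to the document's directory,
    while a leading "/" refers to the root of the same virtual host.
    """
    prefix = "local://"
    if not url.startswith(prefix):
        raise ValueError(f"not a local:// URL: {url!r}")
    return urljoin(doc_url, url[len(prefix):])


if __name__ == "__main__":
    doc = "http://www.masobit.net/fuffa/doc.xml"
    print(resolve_local_url("local://xslt/doc.xsl", doc))
    print(resolve_local_url("local:///xslt/doc.xsl", doc))
```

    Relying on urljoin() keeps the relative/absolute distinction identical to the one browsers apply to ordinary links, which is what makes the virtual-domain case above work without hard-coding a host name.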

    4.4.1.3  HTTP glinces

    If you use http urls, either for external DTDs or for xsl stylesheets, watch out for a small problem that may arise: by default, when fetching a remote document, libxml2 does not check the http status code returned by the web server. As a result, libxml2 will try to parse whatever the server returns, even a 404 or 500 error page.

    Some believe this behavior is correct, others believe error pages should be treated exactly like a file system error. Regardless of what I think, this behavior leads to strange results when parsing documents: for example, if a DTD is missing on the local file system, the problem is ignored, but if it is missing from a remote url, the 404 page is parsed as the requested DTD, and a fatal error is produced stating that the DTD is invalid. Additionally, in your error log you would see weird errors about invalid lines in DTDs you know nothing about and cannot find anywhere in the file system.

    So, beware, if you find strange errors regarding DTDs or xslt you never wrote, verify they are not the 404 or 500 error pages returned back by a remote server.

    There's one more thing to say: I personally don't like this behavior. I tried to get rid of it in several ways, but the only solutions I found were either to rewrite the http client (or ship my own with mod-xslt2) or to patch the libxml2 library.

    The first solution didn't seem quite realistic to me, so, in the ``/patches'' directory of the mod-xslt2 tarball, you'll find two patches:

    • The first one makes libxml2 verify the error status of the remote web server, and return an error if it is not a 200 or 3xx (in which case the page the client has been redirected to will be fetched)

    • The second one makes libxml2 export some private data allowing anybody making use of it to freely choose what to do in case of errors

    Personally, I believe the first patch may break things: you may have applications on your system that rely on libxml2 parsing error pages (I believe those applications should be considered broken anyway). The second patch, instead, should not break anything.
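    For comparison, the policy of the first patch (accept 200, follow 3xx, reject anything else) can be sketched in Python. This is an illustration only, not the patch itself; `status_action` and `fetch_checked` are hypothetical names. Python's own urllib client already behaves this way: urlopen() follows redirects and raises HTTPError for any other non-2xx answer, so an error page never reaches the parser.

```python
import urllib.error
import urllib.request


def status_action(status: int) -> str:
    """Policy of the first patch: parse 200, follow 3xx, reject the rest."""
    if status == 200:
        return "parse"
    if 300 <= status < 400:
        return "follow"
    return "error"


def fetch_checked(url: str, timeout: float = 10.0) -> bytes:
    """Fetch an http(s) URL, refusing to hand error pages to a parser.

    urlopen() follows redirects itself and raises HTTPError for any other
    non-2xx answer; the remaining 2xx codes other than 200 are rejected here.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if status_action(resp.status) != "parse":
                raise RuntimeError(f"{url}: unexpected HTTP status {resp.status}")
            return resp.read()
    except urllib.error.HTTPError as err:
        raise RuntimeError(f"{url}: HTTP {err.code}, not parsing the error page") from err
```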

    Additionally, any URL specified in an ``href'' attribute, as well as the value of the ``type'' and ``media'' attributes in an <xml-stylesheet or <modxslt-stylesheet processing instruction, may contain any mod-xslt2 expression, which will be replaced the first time the expression itself is used.

    As an example, you may specify something like:
    <?xml-stylesheet type="text/xsl" 
      href="http://www.mbit.net/xslt/faq.php?lang=$GET[lang]&amp;agent=$HEADER[User-Agent]" ?>
    
    This is useful to pass parameters to php or cgi scripts. As you may already have noted, any generated xsl stylesheet can access mod-xslt2 variables, but the cgi or php script that generates it cannot, unless you pass them over as GET parameters.

    Right now, POST requests are not supported, and never will be unless somebody decides it is worthwhile and either works on it or bugs me to do so.

    Note also that URLs should be correctly encoded, by replacing xml special characters with entities (for example, ``&'' with ``&amp;'') and any dangerous character with the corresponding ``%'' escape.
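    Both encoding steps can be done with the Python standard library, as in the sketch below: urlencode() %-escapes the query values, and quoteattr() turns the ``&'' separators into ``&amp;'' so the processing instruction stays well-formed. The helper name `stylesheet_pi` and the ``section'' parameter are hypothetical. The sketch is meant for literal values only: passing a mod-xslt2 expression such as $GET[lang] through it would escape ``$'' and ``['' into ``%24'' and ``%5B''.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr


def stylesheet_pi(base_url: str, params: dict) -> str:
    """Build an xml-stylesheet PI whose href carries literal query parameters.

    urlencode() %-escapes unsafe characters in names and values; quoteattr()
    then escapes &, < and > as XML entities and wraps the value in quotes.
    """
    href = base_url + "?" + urlencode(params)
    return f'<?xml-stylesheet type="text/xsl" href={quoteattr(href)} ?>'


if __name__ == "__main__":
    print(stylesheet_pi("http://www.mbit.net/xslt/faq.php",
                        {"lang": "en", "section": "a b&c"}))
```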

    When using the ``xml-stylesheet'' processing instruction to select a stylesheet, you must specify its ``href''. Failing to do so results in an error being written to your logs. However, if you use ``modxslt-stylesheet'', you can omit the ``href''. In this case, the xml is returned to the browser without further processing.

    This is quite useful if you know a particular browser is able to correctly parse your xml files: in this case, you can use a ``media expression'' to match the browser and omit the href, so that the xml is returned raw to the client, with something like:
    [...]
    <?modxslt-stylesheet 
      type="text/xsl"
      media="screen and 
        '$HEADER[User-Agent]' =~ '@Gecko.*1\.4@'"
      alternate="no" title="For Mozilla web browser" 
      charset="ISO-8859-1" ?>
    
    <?modxslt-stylesheet 
      type="text/xsl"
      href="local://xslt/links.xsl"
      media="screen and 
        '$HEADER[User-Agent]' =~ '@Links@'"
      alternate="no" title="For Links web browser" 
      charset="ISO-8859-1" ?>
    
    <?xml-stylesheet
      type="text/xsl"
      href="local://xslt/any.xsl"
      media="screen"
      alternate="no" title="For any other web browser"
      charset="ISO-8859-1" ?>
    [...]
    
    In the example above, the xml file would be returned raw if the request was made by ``mozilla'', would be parsed using ``xslt/links.xsl'' if the request was made by ``Links'' while it would be parsed using ``any.xsl'' if the request was made by any other web browser.
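    The selection logic described so far (first matching PI wins, a match without href means ``return the xml raw'', no match means a 500 error) can be sketched in Python. This is a deliberately simplified model, not mod-xslt2's expression engine: each PI is reduced to its href and a User-Agent regular expression (written without the @...@ delimiters), the ``screen'' media type is assumed to always match, and the names `StylesheetPI` and `choose_stylesheet` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StylesheetPI:
    """Simplified stylesheet PI: only href and a User-Agent regex."""
    href: Optional[str]      # None when the PI has no href attribute
    ua_regex: Optional[str]  # None: matches any browser (plain "screen")


def choose_stylesheet(pis: List[StylesheetPI],
                      user_agent: str) -> Tuple[str, Optional[str]]:
    """Return ("transform", href), ("raw", None) or ("error", None).

    PIs are tried in document order and the first match wins; a match
    without href means "send the xml raw"; no match at all corresponds to
    the 500 error page.
    """
    for pi in pis:
        if pi.ua_regex is not None and not re.search(pi.ua_regex, user_agent):
            continue  # media expression does not match, try the next PI
        if pi.href is None:
            return ("raw", None)
        return ("transform", pi.href)
    return ("error", None)


if __name__ == "__main__":
    # The three PIs of the example above.
    pis = [
        StylesheetPI(None, r"Gecko.*1\.4"),
        StylesheetPI("local://xslt/links.xsl", r"Links"),
        StylesheetPI("local://xslt/any.xsl", None),
    ]
    print(choose_stylesheet(pis, "Links (2.1pre; Linux 2.6 i686; 80x25)"))
```

    Note how the order of the PIs matters: the catch-all ``xml-stylesheet'' must come last, otherwise it would shadow the browser-specific entries.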

    If ``mozilla'' was used, the raw document would then be parsed by mozilla itself, which would ignore any ``modxslt-stylesheet'' and use ``local://xslt/any.xsl'' as its stylesheet. However, mozilla does not understand ``local://'' urls and would report an error. Thus, in any ``raw'' document returned by mod-xslt2, ``local://'' urls are replaced by standard ``http://'' urls pointing back to the virtual domain that was used to issue the request, allowing mozilla to parse the document without problems.

    Additionally, any mod-xslt2 variable used in an <xml-stylesheet processing instruction is replaced before the raw xml document is returned to the browser.
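    The ``local://'' rewriting applied to raw documents can be illustrated with a small Python sketch. It is a simplification under stated assumptions: it only rewrites URLs inside <?xml-stylesheet ... ?> processing instructions (the ones a browser actually reads), it resolves them against the URL of the requested document, and it uses regular expressions rather than an xml parser. Whether mod-xslt2 also rewrites other occurrences is not covered here; `rewrite_local_urls` is a hypothetical name.

```python
import re
from urllib.parse import urljoin

# One <?xml-stylesheet ... ?> processing instruction.
XML_STYLESHEET_PI = re.compile(r"<\?xml-stylesheet\b.*?\?>", re.DOTALL)
# A local:// URL, up to the closing quote or the next whitespace.
LOCAL_URL = re.compile(r"local://([^\"'\s]*)")


def rewrite_local_urls(xml_text: str, doc_url: str) -> str:
    """Replace local:// URLs in xml-stylesheet PIs with absolute http URLs."""
    def fix_pi(pi_match):
        pi_text = pi_match.group(0)
        return LOCAL_URL.sub(lambda m: urljoin(doc_url, m.group(1)), pi_text)
    return XML_STYLESHEET_PI.sub(fix_pi, xml_text)


if __name__ == "__main__":
    raw = '<?xml-stylesheet type="text/xsl" href="local://xslt/any.xsl" ?>'
    print(rewrite_local_urls(raw, "http://www.masobit.net/fuffa/doc.xml"))
```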

    This feature will become more useful as more browsers correctly support xml and xsl transformations.

    Note, however, that variables used in <modxslt-stylesheet PIs are not replaced. The main reason not to replace them is that expressions may lose their meaning once variables are replaced (... screen and 'Mozilla/5.0 Gecko/20031010 Debian/1.4-6' =~ '@Gecko/.*1\.4@' ??, doesn't seem too smart), and that a browser able to understand those PIs should be given all the information it needs to evaluate the expressions by itself.

    Some believe DTDs are useless for parsing xml documents or generating http output. However, as you probably know, DTDs may be used to provide default attribute values for certain tags or the definition of entities used by your xml document.

    Still, DTDs are not always needed, and parsing one additional document may be an unacceptable overhead.

    The approach used by mod-xslt2 is to parse external DTDs only if the document is declared not to be standalone. Thus, the standalone attribute in your xml declaration is not ignored:
    <?xml version="1.0" encoding="ISO-8859-1" 
    	standalone="yes" ?>
    
    tells mod-xslt2 not to load external DTDs, while:
    <?xml version="1.0" encoding="ISO-8859-1" 
    	standalone="no" ?>
    
    tells mod-xslt2 that your xml file needs them. Even if you tell mod-xslt2 to use DTDs, a missing DTD only causes an error to be printed (unless it is fetched using the http protocol; see the section ``HTTP glinces'').
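    The decision described above can be sketched in Python by reading the xml declaration. This is an illustration, not mod-xslt2's code, and `loads_external_dtd` is a hypothetical name. One assumption is worth stating: when the declaration or its standalone attribute is missing, the sketch follows the XML 1.0 default of standalone="no"; this section does not say what mod-xslt2 itself does in that case.

```python
import re

# The xml declaration must be the very first thing in the document.
XML_DECL = re.compile(r"<\?xml\s([^?]*)\?>")
STANDALONE = re.compile(r"""standalone\s*=\s*["'](yes|no)["']""")


def loads_external_dtd(xml_text: str) -> bool:
    """True if external DTDs should be loaded for this document."""
    decl = XML_DECL.match(xml_text)
    if decl is None:
        return True  # assumption: no declaration, treat as standalone="no"
    standalone = STANDALONE.search(decl.group(1))
    # Assumption: a missing standalone attribute defaults to "no" (XML 1.0).
    return standalone is None or standalone.group(1) == "no"
```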

    Before putting an xml page or a newly created php script on your web server, you may want to test the generated output statically on your local machine.

    Since it may not always be possible to use a web server for this, the following command line utilities may be useful.

    4.6.1  xsltproc

    xsltproc is a tool provided in the libxslt package which can be used to process xml files from the command line. Since it is part of libxslt, however, it does not support mod-xslt2 extensions and uses different error handling routines.

    To use it, you just need to type ``xsltproc file_to_parse.xml''. The output will be printed on stdout. In case of errors, they will be printed on stderr.

    xsltproc may be useful mainly for two purposes: you can see what a standard browser (that does not support mod-xslt2 extensions) would do with your xml documents, and you can profile the parsing times. xsltproc provides the ``--timing'' parameter which allows you to know which xslt instructions required the greatest amount of time to generate the output document, with something like ``xsltproc --timing file_to_parse.xml''.

    4.6.2  modxslt-parse

    modxslt-parse looks quite similar to xsltproc. The main difference is that modxslt-parse supports all mod-xslt2 extensions and does not accept any command line parameter besides the name of the file to parse (output is sent to standard output). It is quite useful to verify a particular .xml file off-line from your command line. You can use it by simply typing something like:
    modxslt-parse file.xml
    
    from your command line. Output will be sent to stdout, while errors go to stderr. Any headers set by the document are discarded. It internally uses exactly the same engine as mod-xslt2.
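    When many files need checking, a small Python wrapper can run modxslt-parse on each of them and collect the failures. This is a sketch, not a tool shipped with mod-xslt2; `check_files` is a hypothetical name, and it assumes (as described above) that problems show up on stderr, also counting a non-zero exit status as a failure.

```python
import subprocess
import sys
from typing import Dict, Iterable, Sequence


def check_files(paths: Iterable[str],
                tool: Sequence[str] = ("modxslt-parse",)) -> Dict[str, str]:
    """Run tool on each file and collect the stderr of failed runs.

    A run counts as failed if it exits with a non-zero status or writes
    anything to stderr; stdout (the generated output) is discarded.
    """
    problems: Dict[str, str] = {}
    for path in paths:
        result = subprocess.run([*tool, path], capture_output=True, text=True)
        if result.returncode != 0 or result.stderr.strip():
            problems[path] = result.stderr.strip() or f"exit status {result.returncode}"
    return problems


if __name__ == "__main__":
    # Usage: python check_xml.py page1.xml page2.xml ...
    for path, error in check_files(sys.argv[1:]).items():
        print(f"{path}: {error}")
```

    Passing tool=("xsltproc",) runs the same check with xsltproc instead, which is a quick way to compare what a processor without mod-xslt2 extensions makes of the same files.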

    4.6.3  rxp

    ``rxp'' is a tool provided in the rxp package on http://www.cogsci.ed.ac.uk/~richard/rxp.html.

    It can be used to verify the validity or well-formedness of xml files, and should be used to check the output of your scripts or the validity of your xml files before putting them online.

    Since strerror is not thread safe on many systems, mod-xslt2 cannot use it to translate ``errno'' error codes into more readable (for human beings) strings.

    If you see strange ``errno: x'' error codes in your logs, just use something like:
    modxslt-perror x
    
    to find out which error occurred during parsing. The value of ``x'' is system dependent, so I cannot tell you beforehand what errno ``x'' means on your system, unless you run modxslt-perror (which asks your operating system what it means).
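    If modxslt-perror is not at hand, the same lookup can be sketched with the Python standard library, whose os.strerror() also asks the C library for the message (in a separate process, so thread safety is not a concern here). Run it on the same machine as the web server, since errno values are system dependent; `describe_errno` is a hypothetical name.

```python
import errno
import os
import sys


def describe_errno(code: int) -> str:
    """Return the symbolic name and message for an errno value on this system."""
    name = errno.errorcode.get(code, "unknown errno")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python describe_errno.py x
    print(describe_errno(int(sys.argv[1])))
```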

    modxslt-config can be used to query mod-xslt2 configure and installation parameters. It is usually useful only if you are encountering problems in using mod-xslt2 (problems like inability to load libraries, linking failures...) or if you are writing code for mod-xslt2 (to know the build parameters to be used).

    It can also be used to verify if mod-xslt2 is installed on the system.
