5. Security considerations

As with any code that runs with the same privileges as your web server, there are some dangers you should be aware of and some precautions you should take.

5.1. Variable substitution

In the previous sections, you have seen that you can use mod-xslt2 variables to build the href that points to the xslt stylesheet to be used.

Keep in mind, however, that GET and HEADER variables were given to you by the browser and that their values cannot be trusted.

As an example, you could specify something like:

<?modxslt-stylesheet ... href="/data/xslt/$GET[lang].xsl" ...

But what happens here if ``$GET[lang]'' contains something like ``../../../etc/passwd''? Using client-specified variables without any sort of validation is a BAD idea. Remember that mod-xslt2 runs with the same privileges as your web server.

Unless you have very good reasons to do otherwise, I suggest you always use ``local://'' or ``http://'' urls when you need to make use of untrusted variables, and pass those variables as arguments to cgi or php scripts, which, in turn, can verify the validity of the untrusted values. The example above could be made more ``security aware'' by using something like:

<?modxslt-stylesheet ... 
    href="local:///getxsllang.php?lang=$GET[lang]" ...
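
To give an idea of the kind of check such a script could perform, here is a minimal sketch in Python. It is not part of mod-xslt2: the function names, the list of allowed languages and the default language are all assumptions made up for illustration, and only the ``/data/xslt'' directory comes from the example above. The idea is simply to never build a path from the raw value, but to pick the stylesheet from a fixed list.

```python
import posixpath

# Hypothetical values, for illustration only.
ALLOWED_LANGS = {"en", "it", "de"}
XSLT_DIR = "/data/xslt"


def naive_path(lang):
    """What an unchecked substitution effectively asks the server to read."""
    return posixpath.normpath(f"{XSLT_DIR}/{lang}.xsl")


def stylesheet_for(lang, default="en"):
    """Return a stylesheet path, accepting only known language codes."""
    if lang not in ALLOWED_LANGS:
        # Anything unexpected (path separators, dots, urls, ...) is discarded.
        lang = default
    return f"{XSLT_DIR}/{lang}.xsl"


if __name__ == "__main__":
    hostile = "../../../etc/passwd"
    print(naive_path(hostile))      # path escapes /data/xslt
    print(stylesheet_for(hostile))  # falls back to the default stylesheet
    print(stylesheet_for("it"))
```

With the hostile value above, the naive substitution ends up outside ``/data/xslt'', while the checked version falls back to the default stylesheet.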

Another solution would be to use a single stylesheet, verify the validity of the variables from there using standard ``xsl:...'' constructs, and act accordingly.

5.2. Avoiding deadlocks under heavy loads

Ok, let's say you have a .xml file whose .xsl stylesheet must be fetched from a remote url in order for the file to be parsed correctly.

Let's say you are using apache and that a browser requests that file. Apache receives the request, mod-xslt2 is called to handle it, and a connection is opened to fetch the remote stylesheet. Now, besides the risks involved in letting your customers upload ``static'' pages able to force your apache to make outgoing connections, if, for any reason, that connection comes back to your own site, your apache is at risk.

Suppose apache is set up to handle at most 100 requests at a time (100 children) and we get 100 requests for that page at the same time. Every child is now busy, so mod-xslt2 will not be able to connect to the remote host to fetch the stylesheet (no more children available) and will wait for an apache child to become available. However, no apache child will become available until at least one mod-xslt2 is done, and no mod-xslt2 will be done until at least one child becomes available or the tcp connection expires (after a long time).

Thus, by having .xml files that use .xsl stylesheets through http urls that, in one way or another, point back to the same apache server, we are at risk. Somebody could easily DoS us by simply running something like: ``while :; do wget http://remote.url/file.xml & done;''
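
The following toy simulation in Python may help to picture the problem. It has nothing to do with the real apache or mod-xslt2 code: the pool of children is modelled with a semaphore, each request holds one child and then needs a second one for the nested stylesheet fetch, and the function name, parameters and timeout are all assumptions of this sketch. To keep the result repeatable, every request keeps its own child until all requests have tried their nested fetch.

```python
import threading


def simulate(max_children, concurrent_requests, timeout=0.5):
    """Return how many requests managed to fetch their stylesheet."""
    if not 0 < concurrent_requests <= max_children:
        raise ValueError("this sketch models requests already holding a child")

    children = threading.Semaphore(max_children)
    all_holding = threading.Barrier(concurrent_requests)
    all_tried = threading.Barrier(concurrent_requests)
    completed = []
    lock = threading.Lock()

    def handle_request():
        children.acquire()   # the child serving file.xml
        all_holding.wait()   # every request now holds one child
        # The stylesheet url points back to this server, so the nested
        # fetch needs one more free child.
        if children.acquire(timeout=timeout):
            children.release()  # stylesheet served, nested child is free again
            with lock:
                completed.append(True)
        all_tried.wait()     # keep the outer child until everybody has tried
        children.release()

    threads = [threading.Thread(target=handle_request)
               for _ in range(concurrent_requests)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return len(completed)


if __name__ == "__main__":
    print(simulate(max_children=4, concurrent_requests=4))
    print(simulate(max_children=4, concurrent_requests=3))
```

When the requests fill the whole pool, none of them can complete before the timeout expires. As soon as one child is left free, the nested fetches take turns on it and all the requests complete.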

Most of the other apache modules (not limited to .xml parsing modules) out there on the internet simply don't worry about this kind of problem.

However, mod-xslt2 tries as hard as it can to avoid this kind of deadlock: any local:// request is handled using a sub-request, which does not use another apache process. The same is done for http:// urls that resolve to any of the addresses you set up apache to listen on.

Beware, however, that if you have NAT boxes or redirectors, mod-xslt2 will not be able to distinguish between local and remote urls and won't be able to avoid this kind of deadlock.

Also keep in mind that if mod-xslt2 does not know the ip addresses your server is listening on, it won't be able to help you out either. So, your OS must support the IOCTL needed to get all the ip addresses the server is listening on, or you must explicitly list them in the apache ``Listen'' directive (instead of using ``*'').
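
To make the limits described above more concrete, here is a rough sketch in Python of the kind of test involved: does the host of a url resolve to one of the addresses the server is known to listen on? This is not the mod-xslt2 implementation. The function names are made up, and the ``resolve'' parameter exists only so that the example can be tried without touching the DNS.

```python
import socket
from urllib.parse import urlsplit


def resolve_host(host):
    """Return the set of ip addresses the host name resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def points_back_to_server(url, listen_addresses, resolve=resolve_host):
    """True if the url's host resolves to a known listening address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        resolved = set(resolve(host))
    except OSError:
        # Resolution failed: nothing can be said about this url.
        return False
    return not resolved.isdisjoint(listen_addresses)


if __name__ == "__main__":
    # Hypothetical names and documentation-only addresses.
    table = {"www.example.com": {"192.0.2.10"},
             "nat.example.com": {"203.0.113.5"}}
    listen = ["192.0.2.10"]
    print(points_back_to_server("http://www.example.com/a.xsl", listen, table.get))
    print(points_back_to_server("http://nat.example.com/a.xsl", listen, table.get))
```

In this sketch, a name that resolves to the public address of a NAT box is not recognized as local even if the request would end up on the same server, and an empty list of known addresses makes every url look remote.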

Experimental code to autodetect this sort of condition is under testing, but it may introduce security risks. If you want to know more about it, please contact one of the authors.

5.3. Avoiding remote URLs in substitutions

If I were you, I'd also avoid, as much as possible, using variables when indicating the path for:

or, at least, I'd try to explicitly specify the url scheme to be used. For example, if I really needed to do something like:

<?modxslt-stylesheet ... href="$GET[theme].xsl" ... ?>

I'd change ``$GET[theme].xsl'' into one of ``file://$GET[theme].xsl'', ``http://hostname/dir/$GET[theme].xsl'' or ``local:///path/$GET[theme].xsl''. The first one, in fact, is quite dangerous:

By always specifying the url scheme and host explicitly, we would significantly limit an attacker's range of action.
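
The difference can be seen with a few lines of Python. The sketch assumes that the variable is substituted textually into the href; the helper name and the hostile value are made up for illustration, and url parsing is done with the standard ``urllib.parse'' module rather than with anything from mod-xslt2.

```python
from urllib.parse import urlsplit


def fetch_target(href_template, theme):
    """Substitute the variable and return the (scheme, host) that results."""
    href = href_template.replace("$GET[theme]", theme)
    parts = urlsplit(href)
    return parts.scheme, parts.netloc


if __name__ == "__main__":
    hostile = "http://attacker.example/evil"  # hypothetical client input
    for template in ("$GET[theme].xsl",
                     "http://hostname/dir/$GET[theme].xsl",
                     "local:///path/$GET[theme].xsl"):
        print(template, "->", fetch_target(template, hostile))
```

With the bare ``$GET[theme].xsl'', the client decides both the scheme and the host. With an explicit prefix, the hostile value can only end up in the path part of the url, which still needs to be validated but no longer lets the client choose where the stylesheet comes from.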

This is just another aspect of the problem of using untrusted variables.