As with any code that runs with the same privileges as your web
server, there are some dangers you should be aware of and
some precautions you should take.
In the previous sections, you have seen that you can use
mod-xslt2 variables to build the href that is used as the
url of the stylesheet.
Keep in mind, however, that GET and HEADER variables
were given to you by the browser and that their values
cannot be trusted.
As an example, you could specify something like:
<?modxslt-stylesheet ... href="/data/xslt/$GET[lang].xsl" ... ?>
But what happens here if ``$GET[lang]'' contains something
like ``../../../etc/passwd''? Using client-specified variables
without any sort of validation is a BAD idea. Remember that
mod-xslt runs with the same privileges as your web server.
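
To make the danger concrete, here is a small Python sketch of what the href
above turns into when ``$GET[lang]'' is hostile. It assumes that variables
are expanded by plain textual substitution, which is a simplification made
for illustration, and ``expand_href'' is a hypothetical helper, not part of
mod-xslt2.

```python
import posixpath


def expand_href(template: str, lang: str) -> str:
    """Hypothetical helper: substitute the client-supplied value verbatim.

    Plain textual substitution is an assumption made for this sketch.
    """
    return template.replace("$GET[lang]", lang)


# The href from the example above, with a hostile value for $GET[lang].
href = expand_href("/data/xslt/$GET[lang].xsl", "../../../etc/passwd")

# Collapse the "../" components the way a filesystem lookup would.
resolved = posixpath.normpath(href)

print(href)
print(resolved)
```

The resolved path no longer lives under ``/data/xslt/'': the ``../''
components walk out of the stylesheet directory before the path is used.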
Unless you have very good reasons to do otherwise, I would suggest
that you always use ``local://'' or ``http://'' urls when you
need to make use of untrusted variables, and pass those
variables as arguments to cgi or php scripts, which, in turn,
can verify that the untrusted values are valid. The
example above could be made more ``security aware'' by
using something like:
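
On the script side, the verification could look roughly like the Python
sketch below. It is only an illustration: the script is assumed to be
reached through a ``local://'' or ``http://'' url and to receive the value
of ``$GET[lang]'' as an argument. The names ``stylesheet_for'',
``DEFAULT_LANG'' and the accepted pattern are hypothetical choices, and
only the ``/data/xslt'' directory comes from the example above.

```python
import re

STYLESHEET_DIR = "/data/xslt"   # directory used in the example above
DEFAULT_LANG = "en"             # hypothetical fallback language

# Hypothetical rule: accept "it", "en", "en-US" and nothing else.
LANG_RE = re.compile(r"[a-z]{2}(?:-[A-Z]{2})?")


def stylesheet_for(lang) -> str:
    """Return the stylesheet path for an untrusted language value.

    Anything that does not match the strict pattern (path separators,
    dots, empty values, non-strings) falls back to the default.
    """
    if not isinstance(lang, str) or not LANG_RE.fullmatch(lang):
        lang = DEFAULT_LANG
    return f"{STYLESHEET_DIR}/{lang}.xsl"


print(stylesheet_for("it"))
print(stylesheet_for("../../../etc/passwd"))
```

Accepting only values that match a strict pattern is usually safer than
trying to strip dangerous characters from whatever the browser sent.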
Another solution would be to use a single stylesheet,
verify the validity of the variables from there by using standard
``xsl:...'' constructs, and act accordingly.
Ok, let's say you have a .xml file that needs a .xsl file
to be fetched from a remote url in order to be correctly parsed.
Let's say you are using apache and that a browser issues
a request for that file. Apache receives the request, mod-xslt2
is called to handle it, and a connection is opened to fetch the
remote stylesheet. Now, besides the risks involved in letting your
customers upload ``static'' pages able to force your apache
to make outgoing connections, if, for any reason, that connection
comes back to your site, your apache is at risk. Suppose we set
up apache to handle at most 100 requests at a time
(100 children) and we get 100 requests for that page
at the same time. Every child is then busy running mod-xslt2, so
mod-xslt2 will not be able to connect to fetch
the stylesheet (no more children available) and will
wait for an apache child to become available. However,
no apache child will become available until at least
one mod-xslt2 is done, and no mod-xslt2 will be done until
at least one child becomes available or the tcp connection
expires (after a long time).
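
The same circular wait can be reproduced in miniature with a fixed-size
worker pool. The Python sketch below is only an analogy, not apache or
mod-xslt2 code: the thread pool stands in for the apache children, a short
timeout stands in for the tcp connection expiring, and all the names
(``stalled_requests'', ``serve_page'', ``serve_stylesheet'') are
hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def stalled_requests(children: int, requests: int, timeout: float = 0.2) -> int:
    """Count page requests that time out waiting for their stylesheet.

    Each page request holds one worker and needs a second worker from the
    same pool to serve the stylesheet. The sketch only models
    1 <= requests <= children, so all requests are in flight at once.
    """
    if not 1 <= requests <= children:
        raise ValueError("this sketch needs 1 <= requests <= children")

    pool = ThreadPoolExecutor(max_workers=children)
    in_flight = threading.Barrier(requests)  # every request holds a worker
    finished = threading.Barrier(requests)   # nobody frees a worker early

    def serve_stylesheet() -> str:
        return "<xsl:stylesheet/>"

    def serve_page() -> bool:
        in_flight.wait()
        # The "connection back to ourselves": ask the same pool for help.
        fetch = pool.submit(serve_stylesheet)
        try:
            fetch.result(timeout=timeout)
            stalled = False
        except FutureTimeout:
            stalled = True
        finished.wait()
        return stalled

    outcomes = [pool.submit(serve_page) for _ in range(requests)]
    total = sum(outcome.result() for outcome in outcomes)
    pool.shutdown()
    return total


print(stalled_requests(children=4, requests=4))
print(stalled_requests(children=4, requests=3))
```

With as many simultaneous requests as workers, every request stalls until
its timeout fires. With one worker to spare, the stylesheet fetches are
served and nothing stalls.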
Thus, by having .xml files that use .xsl stylesheets through
http urls that in one way or another point back to the
same apache server, we are at risk. Somebody could easily
DoS us by simply running something like:
``while :; do wget http://remote.url/file.xml & done;''
Most of the other apache modules (not limited to .xml parsing
modules) out there on the internet simply don't worry about this
kind of problem.
However, mod-xslt tries as hard as it can to avoid
this kind of deadlock: any local:// request is handled
using a sub request, which doesn't use any other apache
process, and so is any http:// connection that resolves to
one of the addresses you set up apache to listen on.
Beware, however, that if you have NAT boxes or redirectors, mod-xslt2
will not be able to distinguish between local and
remote urls and won't be able to avoid this kind of deadlock.
Keep also in mind that if mod-xslt2 does not know
the ip addresses your server is listening on, it won't
be able to help you out either. So, either your OS must
support the IOCTL needed to get all the ip addresses
the server is listening on, or you must explicitly list them
in the apache ``Listen'' directive (instead of using ``*'').
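
To see why NAT boxes defeat this check, it may help to sketch the kind of
comparison involved. The Python function below is not mod-xslt2's actual
implementation. It only illustrates the idea of resolving the host of a
url and comparing the result with the addresses the server listens on.
``resolves_to_self'' and its arguments are hypothetical names.

```python
import socket
from urllib.parse import urlsplit


def resolves_to_self(url: str, listen_addrs: set) -> bool:
    """Return True if the url's host resolves to a known listen address.

    listen_addrs is the set of ip addresses (as strings) that the server
    is known to listen on, for example taken from "Listen" directives.
    """
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    # Each entry ends with a socket address whose first item is the ip.
    resolved = {info[4][0] for info in infos}
    return not resolved.isdisjoint(listen_addrs)


listen = {"127.0.0.1"}
print(resolves_to_self("http://127.0.0.1/style.xsl", listen))
print(resolves_to_self("http://192.0.2.1/style.xsl", listen))
```

If a NAT box translates a public address to the server's private one, the
public address never appears in the set of listen addresses, so a check of
this kind reports the url as remote even though the connection comes back
to the same server.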
Experimental code to autodetect this sort of condition
is under testing, but it may introduce security risks.
If you want to know more about it, please contact
one of the authors.
If I were you, I'd also try to avoid as much as possible
using variables when indicating the path for:
or, at least, I'd try to explicitly specify the url scheme
to be used. For example, if I really needed to do something like
<?modxslt-stylesheet ... href="$GET[theme].xsl" ... ?>
I'd change ``$GET[theme].xsl'' to either ``file://$GET[theme].xsl'',
``http://hostname/dir/$GET[theme].xsl'' or ``local:///path/$GET[theme].xsl''.
The original form, without an explicit scheme, is in fact quite dangerous:
- an attacker could use your server for a DoS against somebody
  else (by specifying the http:// url of somebody else)
- an attacker could specify urls to gather data about remote hosts
- an attacker could specify urls to very big files and make lots of
  requests, eating up all your bandwidth
By always specifying the url scheme and host explicitly, we would
significantly limit the range of action of an attacker.
This is just another issue about using untrusted variables.
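
The difference between the scheme-less href and the explicit ones can be
checked with a few lines of Python. As before, this is a sketch that
assumes variables are expanded by plain textual substitution. ``expand''
and ``target'' are hypothetical helpers, and ``victim.example'' is a
made-up host.

```python
from urllib.parse import urlsplit


def expand(template: str, theme: str) -> str:
    """Hypothetical helper: substitute the client-supplied value verbatim."""
    return template.replace("$GET[theme]", theme)


def target(template: str, theme: str) -> tuple:
    """Return the (scheme, host) that the expanded href points at."""
    parts = urlsplit(expand(template, theme))
    return parts.scheme, parts.netloc


hostile = "http://victim.example/huge"

# No explicit scheme: the attacker chooses both scheme and host.
print(target("$GET[theme].xsl", hostile))

# Explicit scheme and host: the hostile value can only affect the path.
print(target("http://hostname/dir/$GET[theme].xsl", hostile))
print(target("local:///path/$GET[theme].xsl", hostile))
```

Note that fixing the scheme and host does nothing about ``../'' sequences
in the path, so the value still needs to be validated.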