Transforming XML with PHP and XSL

By Michael Stillwell. December 16, 2009.

If you want to transform XML from one format to another, and especially if either the input or output XML is complicated or the transformation itself is difficult or awkward to express, then XSL may be a good choice. XSL is the eXtensible Stylesheet Language, a family of three W3C recommendations to do with the transformation and presentation of XML documents. This article will walk through some examples of how XSL and PHP can be used to achieve these types of XML transformations.

What XSL looks like

Here is how to create 'Hello, World' in PHP and XSL. This is not the simplest version possible, but it illustrates how it works and how it integrates with PHP.

The input XML (hello.xml):

<root name="World"/>

The XSL stylesheet (hello.xsl):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 
<xsl:template match="/">
<greeting>
Hello, <xsl:value-of select="/root/@name"/>
</greeting>
</xsl:template>
 
</xsl:stylesheet>

The PHP script (hello.php):

<?php
 
$xml = new DOMDocument();
$xml->load('hello.xml');
 
$xsl = new DOMDocument;
$xsl->load('hello.xsl');
 
$proc = new XSLTProcessor();
$proc->importStyleSheet($xsl);
 
echo $proc->transformToXML($xml);

To run it:

$ php hello.php
<?xml version="1.0"?>
<greeting>
Hello, World</greeting>

Now, if you're counting, you may have noticed that this code spans three files and more than a few lines of both PHP and XSL. This is not the sort of thing you want to see in a 'Hello, World'! There are parts that seem superfluous but are in fact mandatory (e.g. xmlns:xsl="http://www.w3.org/1999/XSL/Transform") and, if you look carefully, it doesn't even produce exactly the result we want: instead of the closing </greeting> tag appearing on a line by itself, it appears on the same line as the Hello, World. (Generating the exact whitespace you want is annoyingly difficult with XSL.)

A partial explanation for this complexity is that this is the merging of two languages, and attempting this is likely to be messy. (The XSL support in Perl, Python and Ruby is not significantly better.) PHP+XSL is an ugly thing, completely unsuited to one-liners, and I'm not going to try to persuade you otherwise! XSL comes into its own when applied to hard problems and, as I hope this article will demonstrate, XSL and PHP can be a very powerful combination.

The three parts to XSL

There are three W3C specifications related to XSL:

  • XSLT – an XML-based language for transforming and manipulating XML. XSLT is a complete programming language with conditionals, loops, functions, and so on. Since it is XML-based, any XML 'template' code that is intended to become part of the output XML does not need to be escaped or otherwise transformed; literal XML is fine. Though this small saving is welcome ('template' code commonly makes up half the characters in the file, and it's not unusual for it to make up 90%), XSLT is otherwise rather verbose.
  • XPath – a language for selecting (or picking out) nodes from an XML document. XPath is used within XSLT in much the same way that SQL might be used within PHP code. (An XPath expression returns sets of nodes in much the same way that an SQL select statement returns sets of rows.) An XPath expression (which is a string, not XML) is typically a one-liner. XPath is both very powerful and very concise (arguably, too concise).
  • XSL-FO – a souped-up CSS with XML syntax for defining the appearance of an arbitrary XML document, i.e. it can style any XML document, not just (X)HTML. Because PHP's XSL extension doesn't support XSL-FO (and, since you're reading this on a PHP tutorials site, you're probably styling your HTML or XHTML with CSS anyway), we'll ignore XSL-FO for the remainder of this post.

When to use XSL?

So, when should you use a language that seems too verbose in some areas (XSLT), too concise in others (XPath), and an uncomfortable fit with PHP?

As I mentioned above, XSL is not a good fit if you have a relatively simple XML transformation problem. For example, XSL is probably not a good choice if you are dealing with an XML document that is more or less a straightforward XML serialisation of a database result set or similar, and the XML you need to produce is likewise relatively uncomplicated.

If there aren't many nested elements and there is no potential for recursive structures, you will probably be better off with a simpler solution such as SimpleXML, which maps XML elements to PHP objects such that an element's children are accessible as object properties. Although there are still some advantages to using XSL (or at least XPath) in this situation, unless you have these skills already the investment of time needed is probably too large.
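For comparison, here is a minimal sketch of what the SimpleXML approach looks like. The <tracks> document and its element names are made up for illustration; they do not come from any real file.

```php
<?php

// Hypothetical input: a flat, result-set-like document.
$source = <<<'XML'
<tracks>
  <track id="17828">
    <name>The Girl You Lost to Cocaine</name>
    <artist>Sia</artist>
  </track>
</tracks>
XML;

$tracks = simplexml_load_string($source);

// Child elements are read as object properties; attributes use array syntax.
foreach ($tracks->track as $track) {
    printf("%s: %s by %s\n", $track['id'], $track->name, $track->artist);
}
```

For a document this flat there is nothing for a stylesheet to add, which is the point of the advice above.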

Examples

As with most technical examples, the challenge is to find an application short enough to be understandable and complex enough to be useful. To overcome this I have included a series of examples of PHP and XSL usage, laid out in approximately increasing order of difficulty and sophistication.

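Please note that the examples do not attempt to explain XSLT or XPath; to learn more about them I recommend the following resources:

  • Chapter 15 of Elliotte Rusty Harold's XML 1.1 Bible, a good narrative introduction.
  • The XSL FAQ. Even though it hasn't been updated for a few years, this FAQ is incredibly useful. I don't think I've ever come across a problem that's not answered there somewhere. (Though sometimes the answer takes quite a bit of searching to find. One problem is that it's organised by concept, so if you don't know the name of the thing you're looking for, you can get stuck.)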

Extract information from a difficult-to-manipulate XML file

Apple software commonly reads or writes XML files in the 'PLIST' format. For example, part of my iTunes Music Library.xml looks like this:

<dict>
 
  <key>Track ID</key>
  <integer>17828</integer>
 
  <key>Name</key>
  <string>The Girl You Lost to Cocaine</string>
 
  <key>Artist</key>
  <string>Sia</string>
 
  <key>Album</key>
  <string>Some People Have Real Problems</string>
 
</dict>

This format, in which keys and values are siblings, is somewhat difficult to work with: to figure out which album the track 'The Girl You Lost to Cocaine' appears on, for example, you need to look for an element with the value 'The Girl You Lost to Cocaine' that immediately follows a <key> element with the value 'Name', then find a sibling 'Album' key, and then its corresponding value. Nevertheless, this can be done via one long XPath expression that picks out the correct node, and a small amount of XSL:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 
<!-- XPath expression to return album given track from "iTunes Music Library.xml" -->
 
<xsl:output omit-xml-declaration="yes"/>
 
<xsl:template match="/">
<xsl:value-of select="//key[text() = 'Name'][following-sibling::*[1][text() = 'The Girl You Lost to Cocaine']]/../key[text() = 'Album']/following-sibling::*[1]"/>
</xsl:template>
 
</xsl:stylesheet>

As before, the PHP script to utilise this would look something like this:

<?php
 
$xml = new DOMDocument();
$xml->load(sprintf("%s/Music/iTunes/iTunes Music Library.xml", getenv("HOME")));
 
$xsl = new DOMDocument;
$xsl->load('itunes.xsl');
 
$proc = new XSLTProcessor();
$proc->importStyleSheet($xsl);
 
echo $proc->transformToXML($xml);

Almost all of the work here is done by the XPath expression that runs across the page. It's not terribly human readable, but I do think it is compact for what it does. Note that there is no loop used to achieve this!

One useful trick this XSL stylesheet employs is the <xsl:output omit-xml-declaration="yes"/> declaration. This prevents the <?xml version="1.0"?> header line from being output.
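The track name is hard-coded into the XPath expression above. One possible refinement, sketched here as an assumption rather than something the original stylesheet does, is to declare a top-level xsl:param and set it from PHP with XSLTProcessor::setParameter(). The function name albumForTrack() and the inline sample data are hypothetical, and the sketch uses method="text" so that only the album name is returned.

```php
<?php

// Hypothetical helper: look up the album for a track in PLIST-style XML.
function albumForTrack(string $plistXml, string $track): string
{
    // Same XPath as above, except the track name comes from $track.
    $stylesheet = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:param name="track"/>
<xsl:template match="/">
<xsl:value-of select="//key[text() = 'Name'][following-sibling::*[1][text() = $track]]/../key[text() = 'Album']/following-sibling::*[1]"/>
</xsl:template>
</xsl:stylesheet>
XSL;

    $xml = new DOMDocument();
    $xml->loadXML($plistXml);

    $xsl = new DOMDocument();
    $xsl->loadXML($stylesheet);

    $proc = new XSLTProcessor();
    $proc->importStylesheet($xsl);
    // The first argument is the parameter's namespace (none here).
    $proc->setParameter('', 'track', $track);

    return trim((string) $proc->transformToXml($xml));
}

// Made-up sample data in the same shape as the PLIST fragment above.
$plist = '<dict><key>Name</key><string>The Girl You Lost to Cocaine</string>'
       . '<key>Album</key><string>Some People Have Real Problems</string></dict>';

echo albumForTrack($plist, 'The Girl You Lost to Cocaine'), "\n";
```

Parameter values are passed to the stylesheet as strings, so this is only suitable for simple lookups like this one.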

Rewriting URLs

In many situations it's useful to transform URLs embedded within an XHTML document in some way. For example, you might have an XHTML fragment which includes the string <a href="isbn:014101900X">, and you want to transform this to <a href="http://www.amazon.co.uk/s?search-alias=stripbooks&amp;field-isbn=014101900X">.

Achieving the transformation shown above involves a bit more XSL than the previous examples. If the input XML is:

<body>
<p>Hello, I would like to recommend <a href="isbn:014101900X">this book</a> to you.</p>
</body>

And the XSL is:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 
<xsl:output omit-xml-declaration="yes"/>
 
<xsl:template match="/">
  <xsl:apply-templates/>
</xsl:template>
 
<xsl:template match="a">
  <xsl:choose>
    <xsl:when test="starts-with(@href, 'isbn:')">
      <!-- "a" element has "href" attribute, and it starts with "isbn:" -->
      <xsl:copy>
        <!-- Create new "href" attribute -->
        <xsl:attribute name="href">http://www.amazon.co.uk/s?search-alias=stripbooks&amp;field-isbn=<xsl:value-of select="substring-after(@href, 'isbn:')"/></xsl:attribute>
        <!-- Copy attributes other than "href" -->
        <xsl:for-each select="@*">
          <xsl:if test="name(.) != 'href'">
            <xsl:copy-of select="."/>
          </xsl:if>
        </xsl:for-each>
        <!-- Copy children i.e. elements inside <a>...</a> -->
        <xsl:apply-templates select="node()"/>
      </xsl:copy>
    </xsl:when>
    <xsl:otherwise>
      <!-- If "href" attribute does not start with "isbn:", output element as is -->
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
 
<!-- Output all other elements as is -->
 
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>
 
</xsl:stylesheet>

 

When we combine these two we generate output looking like this:

<body>
<p>Hello, I would like to recommend <a href="http://www.amazon.co.uk/s?search-alias=stripbooks&amp;field-isbn=014101900X">this book</a> to you.</p>
</body>

Calling PHP functions from XSLT

So far, we have used PHP simply as a shell to initialise and run the XSLT processor. However, it is possible to make standard PHP functions callable from an XSLT file by 'registering' them with the registerPHPFunctions() method of the XSLT processor itself.

It works like this: if you register the function greet($name) whilst initialising the processor with

$proc->registerPHPFunctions('greet');

then it becomes callable from the XSLT stylesheet via

<xsl:value-of select="php:function('greet', 'Clem')"/>

Accessing PHP functions in this way is useful because whilst XSL is a complete programming language, it lacks many features that are considered 'standard'. For example, there is very little support for anything beyond the most basic string and mathematical operations – it does not even include string comparison functions! I/O is limited to one function for reading XML files. There's not even any way to access the current time. However, if PHP functions are registered many of these limitations disappear, because we can use PHP's versions of the 'missing' functions instead.
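Putting the two fragments above together, a complete round trip might look like the following sketch. The greet() body, the inline input document and the use of method="text" are assumptions made for illustration; the php prefix must be bound to the http://php.net/xsl namespace for php:function() to resolve.

```php
<?php

// Hypothetical function exposed to the stylesheet.
function greet(string $name): string
{
    return "Hello, $name!";
}

$stylesheet = <<<'XSL'
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:php="http://php.net/xsl"
  exclude-result-prefixes="php">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:value-of select="php:function('greet', string(/root/@name))"/>
</xsl:template>
</xsl:stylesheet>
XSL;

$xml = new DOMDocument();
$xml->loadXML('<root name="Clem"/>');

$xsl = new DOMDocument();
$xsl->loadXML($stylesheet);

$proc = new XSLTProcessor();
// Register only greet(); other PHP functions stay unavailable to the stylesheet.
$proc->registerPHPFunctions('greet');
$proc->importStylesheet($xsl);

$greeting = $proc->transformToXml($xml);
echo $greeting, "\n";
```

Wrapping the argument in string() means greet() receives a plain PHP string rather than DOM nodes.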

Note: Some of these limitations are eliminated with XSL 2.0; in 2.0 you get string comparison functions, for example, but it still lacks most of the functions and operations you would think any language would have. However, since XSL 2.0 is not supported by PHP (the library used by PHP's XSL extension, libxslt, does not support it), the remainder of this article deals with XSL 1.0 only.

What can you do by calling PHP functions from XSL code? One simple and useful thing to do is to rewrite the src attribute of <img> elements. For example, you can rewrite img elements like <img src="/img/logo.png"> into <img src="/_version/200911121634/img/logo.png">, where 200911121634 is the date and time the file was last modified.

You can then arrange for your webserver to serve all URLs beginning with /_version/ with headers indicating that the resource can be cached forever. When the image changes, its last modified date will change, hence the URL will change, and so the client will never see the old image.

To illustrate the functionality outlined above, here is an example of exactly this mechanism. We would start with some XML like this:

<body>
Here's our logo: <img src="/logo.png"/>
</body>

The XSL would then look something like this:

<?xml version="1.0"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:php="http://php.net/xsl"
  exclude-result-prefixes="php"
  extension-element-prefixes="php"
  version="1.0">
 
  <xsl:output omit-xml-declaration="yes"/>
 
  <xsl:template match="img">
    <xsl:choose>
      <xsl:when test="starts-with(@src, '/_version/')">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()|processing-instruction()"/>
        </xsl:copy>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <!-- Generate "src" attribute, by calling the PHP function _version(). The
          existing "src" attribute value is passed in. -->
          <xsl:attribute name="src">
            <xsl:value-of select="php:function('_version', string(@src))"/>
          </xsl:attribute>
        </xsl:copy>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
 
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
 
</xsl:stylesheet>

Then we could create a PHP script to make the actual translation using these:

<?php
 
function _version($url) {
 
    if (preg_match("/^https?:/", $url)) {
        return $url;
    }
 
    $path = parse_url($url, PHP_URL_PATH);
 
    // Add the query portion on again--need for e.g. scriptaculous which uses
    // URLs like:
    //
    // /js/scriptaculous.js?load=effects
 
    $query = parse_url($url, PHP_URL_QUERY);
    $query = empty($query) ? "" : "?" . $query;
 
    $file = sprintf("%s/%s", $_SERVER["DOCUMENT_ROOT"], $path);

    if (file_exists($file)) {
        return sprintf(
            "/_version/%s%s%s",
            // date() is used because strftime() is deprecated as of PHP 8.1
            date("YmdHis", filemtime($file)),
            $path,
            $query
        );
    }
    else {
        error_log("warning: can't find [$url]");
        return $url;
    }
}
 
$xml = new DOMDocument();
$xml->load('img.xml');
 
$xsl = new DOMDocument;
$xsl->load('img.xsl');
 
$proc = new XSLTProcessor();
$proc->registerPHPFunctions('_version'); // expose only the function the stylesheet calls
$proc->importStyleSheet($xsl);
 
echo $proc->transformToXML($xml);

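We would then need to make a change to the Apache configuration to set up the caching for these images, using something like this in the server or virtual host configuration (in a .htaccess file, the RewriteRule pattern must omit the leading slash, because Apache strips it in per-directory context):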

RewriteRule ^/_version/[0-9]+/(.*) /$1 [E=CACHE:true,L]
Header set "Cache-Control" "max-age=31556926, private" env=CACHE

You could achieve the same effect using only PHP (<img src="<?= _version("/img/logo.png") ?>">), but this is less elegant since you would need to do it for every <img> element. The PHP-only solution would also not be applicable if the content is already XML.

The technique described above can also be adapted to add height and width attributes where needed (even if the image is externally hosted), since we can simply alter the attributes of the <img> element when we serve it.

Ideas

There are many more applications of using XSL to easily adapt markup (too many to mention in detail) but I have included a few more examples of things that XSL is suited for.

  • Dynamically generate "graphical" headers by rewriting header elements (<h1>, <h2>, and so on) from

<h1>Welcome</h1>

to

<h1><img alt="Welcome" src="/_version/20090803232845/img/text/h1/welcome.png"/></h1>

You could then either generate the images dynamically as needed, or do it manually and cache the result. Dynamically generating sIFR or cufón code to output "graphical" headers would work, too.

  • Use output buffering to capture the entire contents of your PHP script (i.e. from the <html> to the </html>), and transform this with XSL. You'll need to be producing well-formed XHTML for this to work, but this technique might be useful for quick hacks or prototyping: if you need to make changes to the generated XHTML of a badly-written site, for example, this might allow you to quickly change URLs or move sections around without having to change the code itself.
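The output-buffering idea can be sketched roughly as follows. The stylesheet (an identity transform plus one rule that turns <b> into <strong>), the helper name transformPage() and the one-line "page" are all hypothetical stand-ins for a real site.

```php
<?php

// Hypothetical stylesheet: copy everything, but rename <b> to <strong>.
$stylesheet = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes"/>
<xsl:template match="b">
  <strong><xsl:apply-templates select="@*|node()"/></strong>
</xsl:template>
<xsl:template match="@*|node()">
  <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
</xsl:template>
</xsl:stylesheet>
XSL;

// Hypothetical helper: run captured page output through the stylesheet.
function transformPage(string $page, string $stylesheet): string
{
    $xml = new DOMDocument();
    $xml->loadXML($page);          // fails unless the page is well-formed

    $xsl = new DOMDocument();
    $xsl->loadXML($stylesheet);

    $proc = new XSLTProcessor();
    $proc->importStylesheet($xsl);

    return (string) $proc->transformToXml($xml);
}

ob_start();                        // start capturing everything the page echoes
echo '<html><body><p>Some <b>bold</b> text.</p></body></html>';
$page = ob_get_clean();            // stop capturing and fetch the buffer

$result = transformPage($page, $stylesheet);
echo $result;
```

In a real site the echo between ob_start() and ob_get_clean() would be the existing page-rendering code, left untouched.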

Things XSL can't do

I don't think this tutorial has been guilty of advocating the use of XSL in situations where it doesn't belong, but to be sure: XSL is not for everyone, and it's not for every application. There are also a few limitations to be wary of:

  • XSL cannot generate XML that is not well-formed. (There's one exception: if the root node of the output XML is html, then HTML is generated unless this feature is disabled.) This may turn out to be a problem--and probably when you least expect it. For example, you may want to generate output that looks like:

<a href="<!--#echo ... -->">...</a>

for expansion by Apache's SSI mechanism. This is impossible with XSL, because comments can't be part of attribute values.
 

  • The output of one template can't become the input of another. If you have one template that transforms <foo/> to <bar/>, and another that transforms <bar/> to <baz/>, you can't input <foo/> and get <baz/>. This limitation is somewhat unexpected, and is due to the way the result tree (i.e. the XML output) is constructed from the source tree (i.e. the XML input): it's built depth-first, and once a node has been added to the result tree, that node cannot be changed. (Many of these limitations arise from the requirement that XSL be easy and efficient to process.)
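One way to work around the second limitation from PHP, offered as a sketch rather than something this article prescribes, is to run two separate transformations and feed the result of the first into the second. XSLTProcessor::transformToDoc() returns the result as a DOMDocument, which makes the hand-over straightforward. The helper names applyStylesheet() and renameRoot() are hypothetical.

```php
<?php

// Hypothetical helper: apply one stylesheet and return the result as a DOM.
function applyStylesheet(DOMDocument $input, string $stylesheet): DOMDocument
{
    $xsl = new DOMDocument();
    $xsl->loadXML($stylesheet);

    $proc = new XSLTProcessor();
    $proc->importStylesheet($xsl);

    return $proc->transformToDoc($input);
}

// Hypothetical helper: build a stylesheet that turns the root <$from/> into <$to/>.
function renameRoot(string $from, string $to): string
{
    return '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
         . "<xsl:template match=\"/$from\"><$to/></xsl:template>"
         . '</xsl:stylesheet>';
}

$input = new DOMDocument();
$input->loadXML('<foo/>');

$pass1 = applyStylesheet($input, renameRoot('foo', 'bar'));  // <foo/> becomes <bar/>
$pass2 = applyStylesheet($pass1, renameRoot('bar', 'baz'));  // <bar/> becomes <baz/>

echo $pass2->saveXML($pass2->documentElement), "\n";
```

Each pass builds its own result tree, so the restriction on modifying a result tree never comes into play; the cost is a second full transformation.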

Tips

Here are some tips I have picked up while working with XSL which may help you in your projects:

Disable output escaping
The disable-output-escaping attribute is useful if you want to do xsl:value-of on something whose value is escaped XML. For example, if you have:

<root>&lt;foo&gt;</root>

Then

<xsl:template match="/root"> <xsl:value-of select="." disable-output-escaping="yes"/> </xsl:template>

will return

<foo>

Note that disable-output-escaping only has an effect on the output; it does not affect how the input is interpreted.

String replacement
One of the helpful functions XSL 1.0 is missing is string replacement. For some reason, although there's no string replacement function in XSL 1.0, there is a substring-before() and substring-after(), which means you can do:

concat(substring-before($s, ':n'), $n, substring-after($s, ':n'))

to replace the first occurrence of :n with $n in the string $s.
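A quick way to try this expression from PHP is DOMXPath::evaluate(), which can evaluate an XPath expression that returns a string. DOMXPath has no variable binding, so this sketch splices the values in as string literals; the helper name xpathReplaceFirst() is hypothetical and the quoting is deliberately naive (it assumes none of the inputs contain a single quote).

```php
<?php

// Hypothetical helper: evaluate the concat() idiom with literal arguments.
function xpathReplaceFirst(string $s, string $search, string $replacement): string
{
    $doc = new DOMDocument();
    $doc->loadXML('<x/>');         // any document will do; the expression ignores it
    $xpath = new DOMXPath($doc);

    $expr = sprintf(
        "concat(substring-before('%s', '%s'), '%s', substring-after('%s', '%s'))",
        $s, $search, $replacement, $s, $search
    );

    return $xpath->evaluate($expr);
}

echo xpathReplaceFirst('id = :n', ':n', '42'), "\n";     // id = 42
echo xpathReplaceFirst(':n and :n', ':n', '42'), "\n";   // 42 and :n
```

Two caveats follow from how substring-before() and substring-after() behave: only the first occurrence is replaced, and if the search string does not occur at all both functions return an empty string, so the whole expression collapses to just the replacement value.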

Conclusion

A colleague of mine once said that he suspects people like XSL because it is hard. And I suppose it is: as a language it's very different to PHP and has more in common with functional languages. Its integration with PHP is not especially smooth, it is hard to debug (especially if you call PHP from XSL), and it manages to be both too verbose and too terse at the same time.

However, as well as being difficult, it is also very powerful. There are things you can do with XSL that are extremely difficult to do any other way, and I hope this article has shown what some of those things are, and how they can be achieved.