2014-11-07-XSLT_Tips

XSLT usage and performance tips

Eight tips for how to use XSLT efficiently:

  • Keep the source documents small. If necessary split the document first.
  • Keep the XSLT processor (and Java VM) loaded in memory between runs.
  • If you use the same stylesheet repeatedly, compile it first.
  • If you use the same source document repeatedly, keep it in memory.
  • If you perform the same transformation repeatedly, don’t. Store the result instead.
  • Keep the output document small. For example, if you’re generating HTML, use CSS.
  • Never validate the same source document more than once.
  • Split complex transformations into several stages.

Eight tips for how to write efficient XSLT:

  • Avoid repeated use of “//item”.
  • Don’t evaluate the same node-set more than once; save it in a variable.
  • Avoid xsl:number if you can, for example by using position() instead.
  • Use xsl:key, for example to solve grouping problems.
  • Avoid complex patterns in template rules. Instead, use xsl:choose within the rule.
  • Be careful when using the preceding[-sibling] or following[-sibling] axes. This often indicates an algorithm with n-squared performance.
  • Don’t sort the same node-set more than once. If necessary, save it as a result tree fragment and access it using the node-set() extension function.
  • To output the text value of a simple #PCDATA element, use xsl:value-of in preference to xsl:apply-templates.
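Several of these tips (avoid repeated "//item", be careful with preceding-sibling, use xsl:key) come down to how many times the document gets walked. Python's standard library has no XSLT processor, so the following is only a Python analogy using xml.etree.ElementTree and a made-up <items> document. The first function mimics grouping by looking back at preceding items, which is quadratic. The second mimics xsl:key: it builds an index in one pass and then looks groups up in a dictionary.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data; only the "category" attribute matters.
SOURCE = """
<items>
  <item category="books">XSLT Cookbook</item>
  <item category="music">Kind of Blue</item>
  <item category="books">XPath Essentials</item>
  <item category="games">Chess</item>
  <item category="music">Blue Train</item>
</items>
"""

root = ET.fromstring(SOURCE.strip())


def group_by_looking_back(root):
    """Analogy for grouping with preceding-sibling: O(n^2) comparisons."""
    items = list(root.iter("item"))
    groups = {}
    for i, item in enumerate(items):
        category = item.get("category")
        # Skip this item if any earlier item already had the same category.
        if any(prev.get("category") == category for prev in items[:i]):
            continue
        # First occurrence: rescan the whole list to collect the group.
        groups[category] = [o.text for o in items if o.get("category") == category]
    return groups


def group_with_index(root):
    """Analogy for xsl:key: one pass builds the index, lookups are cheap."""
    index = defaultdict(list)
    for item in root.iter("item"):
        index[item.get("category")].append(item.text)
    return dict(index)


print(group_with_index(root)["books"])  # ['XSLT Cookbook', 'XPath Essentials']
```

Both functions return the same groups. Only the second one scales gracefully as the number of items grows.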

XSLT Best Practices

XSLT (Extensible Stylesheet Language Transformations) is a functional language for transforming XML documents into another file structure such as plain text, HTML, XML, etc. XSLT is available in multiple versions, but version 1.0 is the most commonly used version. XSLT is extremely fast at transforming XML and does not require compilation to test out changes. It can be debugged with modern debuggers, and the output is very easy to test simply by using a compare tool on the output. XSLT also makes it easier to keep a clear separation between business and display logic.

Uses

XSLT has numerous uses. XML is easy to generate and can easily be transformed to the desired layout of other systems. Many older EDI systems need to receive data in a fixed, flat file format. One such example of a fixed file format is the ABA file format used in the banking industry of Australia. XSLT can be used to transform your data source to a flat file format for another system to consume, and that same data source can then be used to transform the data into HTML for display in a web browser. In fact, it’s even possible to use XSLT to build an XSLT view engine for use with MVC to render content.

Another use for XSLT is creating dynamic documents in various formats such as Word, Excel, and PDF. Starting with Office 2003, Microsoft began supporting the WordML and ExcelML data formats. These data formats are XML documents that represent a Word document or an Excel spreadsheet. Data from a database can be easily transformed into either of these formats through the use of XSLT. In addition, the same data source can also be transformed into XSL-FO to create PDF documents.

Besides the two uses above, you may want to consider using XSLT whenever you are working with templates, when you are working with XML data, or when you are working with static data that doesn’t need to live in a database. An example of a template would be an email newsletter that gets sent out and is “mail-merged” with data from the database.

Of course, there are times when you could use XSLT to accomplish a programming task, but it might not be the right choice. For instance, it might be easier to use LINQ to query data from an object hierarchy and then use a StringBuilder to build the output, rather than using XSLT to do the same thing. XSLT might also not be appropriate for generating output if you need to do a large amount of string manipulation. String operations such as replace or split are harder in XSLT 1.0 than in languages like C#, because XPath 1.0 has no built-in replace() or tokenize() function (both were added in XPath 2.0).

Basics

Assuming that XSLT is the right solution for the task you are trying to accomplish, there are several basic things that a developer needs to be aware of. The first thing to remember is that XSLT is a functional language: once a variable is set, it cannot be changed. Instead of updating a value in a loop, you set up a template that calls itself recursively with new parameter values. The following is an example of what that code might look like:

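<!-- Pads $value on the left with $paddingChar until it is $totalWidth characters long. -->
<xsl:template name="pad-left">
    <xsl:param name="totalWidth"/>
    <xsl:param name="paddingChar"/>
    <xsl:param name="value"/>
    <xsl:choose>
        <!-- Recurse while the value is too short; the empty-character check prevents endless recursion. -->
        <xsl:when test="string-length($paddingChar) &gt; 0 and string-length($value) &lt; $totalWidth">
            <xsl:call-template name="pad-left">
                <xsl:with-param name="totalWidth" select="$totalWidth"/>
                <xsl:with-param name="paddingChar" select="$paddingChar"/>
                <xsl:with-param name="value" select="concat($paddingChar, $value)"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$value"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

<!-- Example call (from inside another template): outputs 000042 -->
<xsl:call-template name="pad-left">
    <xsl:with-param name="totalWidth" select="6"/>
    <xsl:with-param name="paddingChar" select="'0'"/>
    <xsl:with-param name="value" select="'42'"/>
</xsl:call-template>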

The template above performs the equivalent of the String.PadLeft method in .NET. The pad-left template takes three parameters. It checks whether the length of the value passed in is less than the total width specified. If it is, the template calls itself again. It passes the same width and padding character, along with the value prefixed by one padding character. This repeats until the length of the value is greater than or equal to the requested width, at which point the value is output.
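For comparison, here is the same recursion written in Python. It is an illustration only, not part of any XSLT toolchain. As in the template, nothing is reassigned: each call receives a new, longer value. The check for a non-empty padding character is an addition of this sketch to avoid endless recursion. For a single padding character, Python's built-in str.rjust gives the same result.

```python
def pad_left(total_width: int, padding_char: str, value: str) -> str:
    """Recursive left-padding in the style of the pad-left XSLT template."""
    # Stop when the value is long enough (or there is nothing to pad with).
    if padding_char and len(value) < total_width:
        # No variable is updated; a new value is passed to the next call.
        return pad_left(total_width, padding_char, padding_char + value)
    return value


print(pad_left(6, "0", "42"))  # 000042
print("42".rjust(6, "0"))      # 000042
```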

Another important thing to know when working with XSLT is that namespaces affect how you select data from XML. For instance, let’s say you’re working with XML that starts with the following fragment:

<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">

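To select data from this XML document, you need to declare the namespace(s) used by the source document in your XSLT and bind each one to a prefix. In XPath 1.0, an unprefixed name in a select or match expression always refers to an element in no namespace. As a result, elements in a default namespace can only be selected through a prefix. For the example above you would do something like this: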

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    xmlns:fm="http://www.filemaker.com/fmpxmlresult"
    exclude-result-prefixes="msxsl fm">

<!-- The fm: prefix is bound to the same URI as the source's default namespace. -->
<xsl:template match="fm:FMPXMLRESULT">
    <xsl:apply-templates select="fm:RESULTSET" />
</xsl:template>

</xsl:stylesheet>
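Python's xml.etree.ElementTree follows the same rule, which gives a quick way to see it in action outside an XSLT processor. In this sketch, the document is a hypothetical, heavily abbreviated FileMaker-style export. The unprefixed lookup finds nothing, while a lookup through a prefix bound to the namespace URI succeeds. That prefix binding is the equivalent of xmlns:fm above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated export; only the default namespace matters here.
SOURCE = (
    '<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">'
    '<RESULTSET FOUND="2"/>'
    '</FMPXMLRESULT>'
)

root = ET.fromstring(SOURCE)

# An unqualified name finds nothing: RESULTSET lives in the default namespace.
print(root.find("RESULTSET"))  # None

# Binding a prefix to the namespace URI (like xmlns:fm in the stylesheet) works.
ns = {"fm": "http://www.filemaker.com/fmpxmlresult"}
resultset = root.find("fm:RESULTSET", ns)
print(resultset.get("FOUND"))  # 2
```

The prefix name itself is arbitrary. Only the URI it is bound to has to match the source document.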

The last area I would like to focus on is the use of templates. There are two broad approaches to accessing data in XSLT. The push approach, as the name implies, pushes the source XML through the stylesheet, which has various templates to handle different kinds of nodes. This approach uses several templates and selects the appropriate one for a given node through the xsl:apply-templates instruction. An example of this is as follows:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="Orders">
        <html>
            <body>
                <xsl:apply-templates select="Invoice"/>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="Invoice">
        <xsl:apply-templates select="CustomerName" />
        <p>
            <xsl:apply-templates select="Address" />
            <xsl:apply-templates select="City" />
            <xsl:apply-templates select="State" />
            <xsl:apply-templates select="Zip" />
        </p>
        <table>
            <tr>
                <th>Description</th>
                <th>Cost</th>
            </tr>
            <xsl:apply-templates select="Item" />
        </table>
        <p />
    </xsl:template>
    <xsl:template match="CustomerName">
        <h1><xsl:value-of select="." /></h1>
    </xsl:template>
    <xsl:template match="Address">
        <xsl:value-of select="." /><br />
    </xsl:template>
    <xsl:template match="City">
        <xsl:value-of select="." />
        <xsl:text>, </xsl:text>
    </xsl:template>
    <xsl:template match="State">
        <xsl:value-of select="." />
        <xsl:text> </xsl:text>
    </xsl:template>
    <xsl:template match="Zip">
        <xsl:value-of select="." />
    </xsl:template>
    <xsl:template match="Item">
        <tr>
            <xsl:apply-templates />
        </tr>
    </xsl:template>
    <xsl:template match="Description">
        <td><xsl:value-of select="." /></td>
    </xsl:template>
    <xsl:template match="TotalCost">
        <td><xsl:value-of select="." /></td>
    </xsl:template>
    <xsl:template match="*">
        <xsl:apply-templates />
    </xsl:template>
    <xsl:template match="text()" />
</xsl:stylesheet>

The pull approach, on the other hand, makes minimal use of the xsl:apply-templates instruction. Instead, it pulls the XML through the transform using the xsl:for-each and xsl:value-of instructions. Using the pull technique, the above stylesheet would look something like this:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="Orders">
        <html>
            <body>
                <xsl:for-each select="Invoice">
                    <h1>
                        <xsl:value-of select="CustomerName" />
                    </h1>
                    <p>
                        <xsl:value-of select="Address" /><br />
                        <xsl:value-of select="City" />
                        <xsl:text>, </xsl:text>
                        <xsl:value-of select="State" />
                        <xsl:text> </xsl:text>
                        <xsl:value-of select="Zip" />
                    </p>
                    <table>
                        <tr>
                            <th>Description</th>
                            <th>Cost</th>
                        </tr>
                        <xsl:for-each select="Item">
                            <tr>
                                <td><xsl:value-of select="Description" /></td>
                                <td><xsl:value-of select="TotalCost" /></td>
                            </tr>
                        </xsl:for-each>
                    </table>
                    <p />
                </xsl:for-each>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>

You can read more about these two approaches at http://www.xml.com/pub/a/2005/07/06/tr.html and http://www.ibm.com/developerworks/library/x-xdpshpul.html.
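The difference between the two styles is really about who chooses the code to run for each node. The following is a rough, simplified analogy in Python, not XSLT, and all names in it are hypothetical. Push style registers a handler per element name and lets a dispatcher pick it. When nothing matches, the dispatcher falls back to processing the children, much like XSLT's built-in template rule. Pull style spells out the loops explicitly. Both produce the same markup for a tiny invoice. Text is not escaped, so this is for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal invoice data in the shape used by the stylesheets above.
SOURCE = (
    "<Orders><Invoice><CustomerName>Ann</CustomerName>"
    "<Item><Description>Pen</Description><TotalCost>2.50</TotalCost></Item>"
    "</Invoice></Orders>"
)

# Push style: handlers registered by element name, chosen per node.
HANDLERS = {}


def template(tag):
    """Register a handler for elements named `tag` (like xsl:template match)."""
    def register(fn):
        HANDLERS[tag] = fn
        return fn
    return register


def apply_templates(nodes):
    """Dispatch each node to its handler (like xsl:apply-templates)."""
    out = []
    for node in nodes:
        handler = HANDLERS.get(node.tag)
        # No handler: process the children instead, like the built-in rule.
        out.append(handler(node) if handler else apply_templates(list(node)))
    return "".join(out)


@template("Invoice")
def invoice(node):
    return (apply_templates(node.findall("CustomerName"))
            + "<table>" + apply_templates(node.findall("Item")) + "</table>")


@template("CustomerName")
def customer_name(node):
    return f"<h1>{node.text}</h1>"


@template("Item")
def item(node):
    return "<tr>" + apply_templates(list(node)) + "</tr>"


@template("Description")
@template("TotalCost")
def cell(node):
    return f"<td>{node.text}</td>"


# Pull style: the loops and the order of output are written out explicitly.
def pull(root):
    out = []
    for inv in root.findall("Invoice"):
        out.append(f"<h1>{inv.findtext('CustomerName')}</h1><table>")
        for it in inv.findall("Item"):
            out.append(f"<tr><td>{it.findtext('Description')}</td>"
                       f"<td>{it.findtext('TotalCost')}</td></tr>")
        out.append("</table>")
    return "".join(out)


root = ET.fromstring(SOURCE)
print(apply_templates([root]) == pull(root))  # True
```

In the push version, supporting a new element means registering one more handler. In the pull version, the loop that produces the output has to be edited.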

Best Practices

While XSLT is extremely fast and powerful, there are several rules to keep in mind in order to write quality code. They are as follows:

  • Avoid using // near the root of the document, especially when transforming very large XML documents. An expression that starts with // (such as //item) selects matching nodes anywhere in the document, regardless of the current node, and .//item selects matching descendants of the current node. Either way, the processor has to scan much more of the document, which makes transforms slower and less efficient. It is best to avoid the // operator altogether if possible.
  • Avoid very long XPath expressions (e.g. more than a screen width long). They make the XSLT logic difficult to read.
  • Set indent="no" on the xsl:output element when outputting XML or HTML. Not only will this reduce the size of the file you generate, but it can also reduce processing time.
  • Try to use template matching (the push approach) instead of named templates. Named templates are fine for utility functions like the padding template listed above. However, template matching creates cleaner and more elegant code.
  • Make use of built-in XSLT and XPath functions whenever possible. A good example is concatenating strings. One approach would be to use several xsl:value-of instructions one after another. However, it is much cleaner to use the XPath concat() function instead.
  • If you are transforming a large amount of data through .NET code, you should use the XmlReader and XmlWriter classes, which stream the data. If you use the XmlDocument class to read in your XML and the StringBuilder class to write out your XML, you risk an OutOfMemoryException. The entire document is held in memory, and a large string must be allocated as one contiguous block.
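XmlReader and XmlWriter are .NET classes. The same streaming idea can be sketched with Python's standard library, where xml.etree.ElementTree.iterparse yields elements as they are parsed. This sketch uses hypothetical <item> data generated in memory. After each <item> is processed, the root's children are cleared, so memory use stays bounded instead of growing with the document. That bounded memory use is the point of preferring a streaming reader over loading the whole tree.

```python
import io
import xml.etree.ElementTree as ET


def count_items(stream) -> int:
    """Count <item> elements without keeping the whole tree in memory."""
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    count = 0
    for event, elem in context:
        if event == "end" and elem.tag == "item":
            count += 1
            # Drop children already processed so the tree never grows large.
            root.clear()
    return count


# Hypothetical input: 1000 small <item> elements.
data = b"<items>" + b"".join(b"<item>%d</item>" % i for i in range(1000)) + b"</items>"
print(count_items(io.BytesIO(data)))  # 1000
```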

Additional best practices can be found here:

http://www.xml.org//sites/www.xml.org/files/xslt_efficient_programming_techniques.pdf

XSLT Tips for Cleaner Code and Better Performance

Conclusion

There are many situations in which XSLT is worth considering. The language tends to be verbose, and at times it can feel unnatural if you are more accustomed to a procedural programming style. However, it is a flexible and powerful language that, with a little time, is easy to pick up and learn. There are debugging and profiling tools available to make the development process easier. In addition, changes to an XSLT stylesheet do not require compilation in order to be tested, and testing can easily be done by comparing output with a compare tool such as Araxis Merge.

XSLT Tips for Cleaner Code and Better Performance

On this page:

  • Avoid XSLT Named Templates; Use Template Match
  • Avoid xsl:for-each; Use Template Match
  • You don’t have to use xsl:element or xsl:attribute
  • Use the element name itself rather than xsl:element
  • Use the { } shorthand for writing values inside of attributes
  • Use template modes
  • Use in-built functions: concat()
  • Use in-built functions: boolean()
  • Use in-built functions: string()
  • Use in-built functions: number()
  • Use in-built functions: other
  • More tips

XSLT is a transformation language to convert XML from one format to another (or to another text-based output).

People seem to love or hate XSLT. Some find it hard to read or strange to get used to. Yet, it can be quite elegant when coded right. So this will be the first in a series of posts to show where it can be useful (and what its pitfalls/annoyances may be), how to make best use of XSLT, etc.

This first post looks at coding style in XSLT 1.0 and XPath 1.0.

I think some frustrations at this technology come from wanting to do procedural programming with it, whereas it is really more like a functional programming language; you define what rules to act against, rather than how to determine the rules (kind of).

For example, consider the following example where a named template may be used to create a link to a product:

<xsl:template name="CreateLink">
  <xsl:param name="product" />
  <xsl:element name="a">
    <xsl:attribute name="href">
      <xsl:value-of select="'/product/?id='" /><xsl:value-of select="normalize-space($product/@id)" />
    </xsl:attribute>
    <xsl:value-of select="$product/name" />
  </xsl:element>
</xsl:template>

I have found the above to be a common way people initially code their XSLTs. Yet, the following is far neater:

<xsl:template match="product">
  <a href="{concat('/product/?id=', normalize-space(./@id))}">
    <xsl:value-of select="./name" />
  </a>
</xsl:template>

Not only is such neater code easier to read and maintain, but it can even improve performance.

(Update: As Azat rightly notes in a comment below, the use of ‘./’ is redundant. That is definitely true. I should have mentioned originally that I tend to use it to help others on the team, especially those newer to XSLT, see more clearly which element is the context node for the template.)

Let's look at a few tips on how this may be possible (a future post will concentrate on additional performance-related tips; the tips below are primarily about coding style):

Avoid XSLT Named Templates; Use Template Match

The first coding practice that leads to code bloat and hard to read XSLT is using named templates everywhere. Named templates give a procedural feel to coding. (You define templates with names, pass parameters as needed and do some stuff). This may feel familiar to most coders, but it really misses the elegance and flexibility of XSLT.

So, instead of this:

<xsl:template name="CreateLink">
  <xsl:param name="product" />
  <!-- create the link here based on the product parameter -->
</xsl:template>

<!-- The above would be called from elsewhere using this: -->
<xsl:call-template name="CreateLink">
  <xsl:with-param name="product" select="./product" />
</xsl:call-template>

Far neater would be this:

<xsl:template match="product">
  <!-- create the link here based on the current product node -->
</xsl:template>

<!-- The above would be called from elsewhere using this: -->
<xsl:apply-templates select="./product" />

The above example doesn’t look like much on its own. But in a real stylesheet with lots of template matches (and modes, which we look at later), this gets a lot easier to read and cuts a LOT of code, especially when calling/applying these templates.

(Of course, each tip has exceptions; named templates can be useful for utility functions. Sometimes XSLT extension objects can be useful for that too, depending on your parser and runtime requirements. A subsequent post on XSLT performance tips will cover that.)

Avoid xsl:for-each; Use Template Match

xsl:for-each is another programming construct that would appeal to many coders. But again, it is rarely needed. Let the XSLT processor do the looping for you (it has potential to be optimised further, too).

In some cases, and with some XSLT processors, xsl:for-each may perform a bit faster, because it spares the processor from having to determine which of possibly many matching templates is the right one to execute. However, matched templates that use modes can largely overcome those issues, and they lend themselves to highly elegant, reusable XSLT.

You don’t have to use xsl:element or xsl:attribute

You can use xsl:element and xsl:attribute, but it leads to very bloated code.

Here are a few examples of what you can do instead. In each example we will just assume we are working with some XML that represents some kind of product (it is not important what this structure is for this discussion).

Use the element name itself rather than xsl:element

Instead of

<xsl:element name="p">
  <xsl:value-of select="$product/name" />
</xsl:element>

This is a lot cleaner to read:

<p>
  <xsl:value-of select="$product/name" />
</p>

Sometimes I prefer this:

<p><xsl:value-of select="$product/name" /></p>

Use the { } shorthand for writing values inside of attributes

Using xsl:value-of for many attributes gets verbose very quickly. There is more code to read, so the code looks uglier and more bloated. For attribute values only, you can use the { } shorthand (an attribute value template, part of the XSLT 1.0 specification) in place of an xsl:attribute containing <xsl:value-of select="..." />.

In between { and } you just put in your normal select expression.

So, instead of

<h3>
    <xsl:attribute name="class">
        <xsl:value-of select="$product/@type" />
    </xsl:attribute>
    <xsl:value-of select="$product/name" />
</h3>

This is a lot cleaner to read:

<h3 class="{$product/@type}">
  <xsl:value-of select="$product/name" />
</h3>

Or, instead of

<xsl:element name="img">
    <xsl:attribute name="src"><xsl:value-of select="$product/image/@src" /></xsl:attribute>
    <xsl:attribute name="width"><xsl:value-of select="$product/image/@width" /></xsl:attribute>
    <xsl:attribute name="height"><xsl:value-of select="$product/image/@height" /></xsl:attribute>
    <xsl:attribute name="alt"><xsl:value-of select="$product/image" /></xsl:attribute>
    <xsl:attribute name="class"><xsl:value-of select="$product/@type" /></xsl:attribute>
</xsl:element>

This is a lot cleaner to read:

<img
    src="{$product/image/@src}"
    width="{$product/image/@width}"
    height="{$product/image/@height}"
    alt="{$product/image}"
    class="{$product/@type}"
    />

The above is only put onto multiple lines for this web page. In a proper editor sometimes a one-liner is even easier to read:

<img src="{$product/image/@src}" width="{$product/image/@width}" height="{$product/image/@height}" alt="{$product/image}" class="{$product/@type}" />

The above is also looking a lot like some templating languages now, and you might see why I am wondering why there are so many proprietary ones people have to learn, when XSLT is an open, widely supported, standard with transferable skills!

The above also doesn’t show how clean the code would really be, because someone using xsl:attribute is likely to use xsl:element as well, so really we should compare the legibility of this:

<xsl:element name="h3">
    <xsl:attribute name="class">
        <xsl:value-of select="$product/@type" />
    </xsl:attribute>
    <xsl:value-of select="$product/name" />
</xsl:element>

… versus this:

<h3 class="{$product/@type}">
    <xsl:value-of select="$product/name" />
</h3>

Use template modes

Often, you will want to use a template match for totally different purposes. Rather than pass unnecessary parameters or resort to different named templates, a mode attribute on the template can do the trick.

For example, suppose you are showing an order history for some e-commerce site. Suppose you want a summary of orders at the top that anchor to the specific entries further down the page.

You can have more than one template have the same match, and use mode to differentiate or indicate what they are used for.

Consider this example. First, here is a starting point in the XSLT. The idea is to process the Orders element twice: once for a summary, and once for the details.

<!-- starting point -->
<xsl:template match="/">
    <h1>Order summary</h1>
    <h2>Summary of orders</h2>
    <p><xsl:apply-templates select="./Orders" mode="summary-info" /></p>
    <h2>Table of orders</h2>
    <xsl:apply-templates select="./Orders" mode="order-summary-details" />
</xsl:template>

Next, we match Orders with the summary-info mode:

<xsl:template match="Orders" mode="summary-info">
    <xsl:value-of select="concat(count(./Order), ' orders, from ', ./Order[1]/@date, ' to ', ./Order[last()]/@date)" />
</xsl:template>

We can also match Orders for the order-summary-details mode. Note how the variable has also re-used the other mode to get the summary for the table’s summary attribute.

<xsl:template match="Orders" mode="order-summary-details">
    <xsl:variable name="summary">
        <xsl:apply-templates select="." mode="summary-info" />
    </xsl:variable>
    <table summary="{normalize-space($summary)}">
        <thead>
            <tr>
                <th scope="col">Order number</th>
                <th scope="col">Amount</th>
                <th scope="col">Status</th>
            </tr>
        </thead>
        <tbody>
            <xsl:apply-templates select="./Order" mode="order-summary-details" />
        </tbody>
    </table>
</xsl:template>

Note how the same mode name can be used for additional matches. This is a neat way to keep related functionality together:

<xsl:template match="Order" mode="order-summary-details">
    <tr>
        <td><a href="/order/details/?id={./@id}"><xsl:value-of select="./@id" /></a></td>
        <td><xsl:value-of select="./amount" /></td>
        <td><xsl:value-of select="./status" /></td>
    </tr>
</xsl:template>

In many real XSLTs I have written, these modes are reused many times over. They can also help performance while keeping the code small and elegant. The XSLT processor can use the mode to narrow down the candidate template matches when looking for the one to execute.
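To see why a mode narrows the search, it can help to think of template rules as a lookup table keyed by both mode and element name. The following is only a Python analogy with hypothetical names and data, and it does not show how any particular XSLT processor is implemented. A handler is registered per (mode, tag) pair. The table view reuses the summary mode for its summary attribute, as the XSLT above does. A missing pair simply raises KeyError, where XSLT would fall back to its built-in rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical order data in the shape used by the templates above.
SOURCE = (
    '<Orders>'
    '<Order id="A1" date="2014-01-05"><amount>10</amount><status>shipped</status></Order>'
    '<Order id="A2" date="2014-02-11"><amount>25</amount><status>pending</status></Order>'
    '</Orders>'
)

# (mode, element name) -> handler; a stand-in for template match plus mode.
RULES = {}


def rule(tag, mode):
    def register(fn):
        RULES[(mode, tag)] = fn
        return fn
    return register


def apply_templates(node, mode):
    # Only handlers registered under this mode are candidates.
    return RULES[(mode, node.tag)](node)


@rule("Orders", mode="summary-info")
def orders_summary(orders):
    found = orders.findall("Order")
    return f"{len(found)} orders, from {found[0].get('date')} to {found[-1].get('date')}"


@rule("Orders", mode="order-summary-details")
def orders_table(orders):
    summary = apply_templates(orders, "summary-info")  # reuse the other mode
    rows = "".join(apply_templates(o, "order-summary-details")
                   for o in orders.findall("Order"))
    return f'<table summary="{summary}">{rows}</table>'


@rule("Order", mode="order-summary-details")
def order_row(order):
    return (f"<tr><td>{order.get('id')}</td><td>{order.findtext('amount')}</td>"
            f"<td>{order.findtext('status')}</td></tr>")


root = ET.fromstring(SOURCE)
print(apply_templates(root, "summary-info"))  # 2 orders, from 2014-01-05 to 2014-02-11
```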

The use of modes (and other features such as importing other XSLTs and overriding moded templates) has allowed us to create multiple sub-sites in parallel. For example, we built an e-commerce site that sells books and entertainment products (CDs, DVDs, computer games, etc.), all running off the same XSLTs with some minor customisation in each sub-site. Although the actual data is different, it fits the same XML structure (they are products, after all!), which makes the XSLTs highly reusable. A future post will describe arranging XSLTs in an almost object-oriented fashion.

Use in-built functions: concat()

The concat() function lets you avoid unnecessary runs of <xsl:value-of /> statements one after the other. It also avoids the accompanying <xsl:text> </xsl:text> trick needed to get a space in there.

Code looks easier to read, in most cases, and typically performs better too.

Example:

Instead of this:

<xsl:value-of select="$string1" /><xsl:text> </xsl:text><xsl:value-of select="$string2" />

This is much cleaner to read:

<xsl:value-of select="concat($string1, ' ', $string2)" />

Or,

Instead of this:

<a>
    <xsl:attribute name="href">
        <xsl:value-of select="$domain" />/product/?<xsl:value-of select="$someProductId" />
    </xsl:attribute>
    <xsl:value-of select="$productDescription" />
</a>

This is much cleaner to read:

<a href="{concat($domain, '/product/?', $someProductId)}">
    <xsl:value-of select="$productDescription" />
</a>

Storing a string resulting from concat() in a variable is also efficient from a performance point of view. Storing a node-set does not cache the result, because in most DOM and XSLT implementations node-sets are live collections (more on that in a future post).

(Update: Azat notes in a comment below that the above href attribute can be simplified even further into this: href="{$domain}/product/?{$someProductId}".)

Use in-built functions: boolean()

How many times have we seen code like this:

<xsl:if test="$new = 'true'"> ... </xsl:if>

While this works, repeated string comparison is not ideal, especially if this kind of test is going to be repeated in a template.

It would be better to create a variable using this kind of syntax:

<xsl:variable name="isNew" select="boolean($new = 'true')" />

Then, in your code, when you need to use it, you can do things like:

<xsl:if test="$isNew"> ... </xsl:if>

or

<xsl:if test="$isNew = true()"> ... </xsl:if>

or

<xsl:if test="$isNew = false()"> ... </xsl:if>

or

<xsl:if test="not($isNew)"> ... </xsl:if>

These variations come down to style and preference, but any of them is better from a coding perspective than constantly testing strings. Sometimes deciding what true or false means requires testing many values, such as true, True, 1, Y, etc. All of that can be hidden away in the one variable declaration, and the rest of the code stays unchanged.
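The same idea can be expressed in Python as a small illustrative sketch. The decision about which raw strings count as true lives in one place, and the rest of the code only ever tests a real boolean. The accepted set here is hypothetical; it covers the true, True, 1 and Y variants mentioned above.

```python
# Hypothetical set of accepted "true" spellings, compared case-insensitively.
TRUE_VALUES = {"true", "1", "y"}


def to_flag(raw: str) -> bool:
    """Normalise a raw string once, like the isNew variable declaration."""
    return raw.strip().lower() in TRUE_VALUES


is_new = to_flag("True")
if is_new:  # later tests use the boolean, not string comparisons
    print("new product")
```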

(Update: Azat rightly notes in a comment below that the variable declaration can be made smaller by omitting the boolean() function, so it is just this: <xsl:variable name="isNew" select="$new = 'true'" />. I find the explicit use of boolean() can aid readability, especially for those new to XSLT, so it might be worth retaining in such situations.)

Use in-built functions: string()

Instead of this:

<xsl:variable name="mystring">my text</xsl:variable>

Consider this:

<xsl:variable name="mystring" select="'my text'" />

Or this:

<xsl:variable name="mystring" select="string('my text')" />

Or, more importantly, instead of this:

<xsl:variable name="bookTitle"><xsl:value-of select="./title" /></xsl:variable>

Consider this:

<xsl:variable name="bookTitle" select="string(./title)" />

Why?

Code is cleaner to read.

It can also perform better. Casting to a string instead of storing the node will result in the variable value being cached in most XSLT processors, rather than being re-evaluated each time it is accessed. (XML nodes are live collections according to W3C, which means they may change. Hence, references to nodes require evaluation each time they are accessed.)

Use in-built functions: number()

For the same reasons as string(), number() should be used when a variable is meant to hold a number.

Use in-built functions: other

XPath functions such as starts-with(), string-length() are handy.

For example, it is common to see code that tests for the presence of a string by checking whether a variable equals the empty string (''). But as most programmers should know, it is more efficient to test for the presence of a string by testing its length. In XPath expressions, you can use the string-length() function for this, for example string-length($name) > 0.

For more information and full list of XPath functions, consider the following:

  • The XPath 1.0 Specification from the W3C
  • The MSDN XPath reference from Microsoft (Same as the W3C information, of course, but has useful examples)
  • Documentation from Stylus (also has some useful examples)

More tips

The above is about XPath 1.0 and XSLT 1.0. Even with the above tips, some XSLT can require more code than ideal, which XSLT 2.0 and XPath 2.0 help to address. The features in those are very useful for sure, but not as widely implemented as 1.0. My experiences are almost entirely in 1.0 which we use in live, production/run-time environments.

Here are some additional useful resources:

  • There are Monsters in My Closet or How Not to Use XSLT by R. Alexander Milowski from the School of Information Management and Systems, Berkeley
  • XSLT by Example is a blog of XSLT examples by Miguael deMelo

Do you have any useful tips to augment/improve the above? Let me know and I will add them above.
