Parsing newsfeeds with XSL (1)

Magpie is a very popular RSS parser for PHP. Although I used Magpie for quite a while to my satisfaction, it has a few flaws. First, it seems to have problems with UTF-8, and dealing with UTF-8 is not easy in PHP (and MySQL). Second, the code is verbose; you feel it can be done quicker and simpler. We will parse feeds with only 5 lines of code and a simple stylesheet.

So why not write an RSS parser ourselves? We will use XSL transformations (XSLT). XSL was developed to manipulate XML documents, and that is exactly what we want. Remember, an RSS feed is just a short and simple XML document.

XSL is a powerful language but a little counterintuitive. You can do a lot with surprisingly little code, but sometimes a small problem will cost you hours and you still won’t work it out.

To use XSL transformations you need PHP's XSL extension, which means libxslt has to be compiled into PHP.
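A quick way to check is to ask PHP itself. This is only a minimal sketch; I am assuming the extension reports itself under the name `xsl` and defines the `LIBXSLT_DOTTED_VERSION` constant, which is what current PHP builds do.

```php
<?php
// Assumption: the extension is registered as "xsl" and provides
// the XSLTProcessor class used throughout this post.
if (!extension_loaded('xsl') || !class_exists('XSLTProcessor')) {
    exit("The XSL extension (libxslt) is not available in this PHP build.\n");
}

// LIBXSLT_DOTTED_VERSION is defined by the XSL extension itself.
echo "XSL extension found, libxslt version: ", LIBXSLT_DOTTED_VERSION, "\n";
```

If the first message shows up, you will have to install or enable the extension before the rest of this post will work.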

You transform an XML document by loading an XSLT stylesheet into an XSLT processor and applying it to the document.

In PHP:
[sourcecode lang='php']

// Load XSL
$xsl = new DOMDocument; $xsl->load('stylesheet.xsl');

// Create new XSLTProcessor
$xslt = new XSLTProcessor();
// Load stylesheet
$xslt->importStylesheet($xsl);

// Load XML-document
$feed = file_get_contents('http://…/yourRSSfeed.xml');
$xml = new DOMDocument; $xml->loadXML($feed);

// Transform into outputstring
echo $results = $xslt->transformToXML($xml);

[/sourcecode]

The XML document is simply your RSS feed, so that part is easy.
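The snippet above assumes every step succeeds. If you want something a bit more defensive, it could look roughly like this. Note that `render_feed()` is a name I made up for the sketch, not a PHP function, and throwing a `RuntimeException` is just one way to report problems.

```php
<?php
// Sketch only: render_feed() is a hypothetical helper, not part of PHP.
function render_feed($feedUrl, $stylesheetPath)
{
    // Load the stylesheet; load() returns false on failure.
    $xsl = new DOMDocument();
    if (!$xsl->load($stylesheetPath)) {
        throw new RuntimeException("Cannot load stylesheet: $stylesheetPath");
    }

    // importStylesheet() returns false if the stylesheet cannot be used.
    $xslt = new XSLTProcessor();
    if (!$xslt->importStylesheet($xsl)) {
        throw new RuntimeException('Stylesheet could not be imported');
    }

    // Fetch the feed; file_get_contents() returns false on failure.
    $feed = @file_get_contents($feedUrl);
    if ($feed === false) {
        throw new RuntimeException("Cannot fetch feed: $feedUrl");
    }

    // Parse the feed; loadXML() returns false for malformed XML.
    $xml = new DOMDocument();
    if (!$xml->loadXML($feed)) {
        throw new RuntimeException('Feed is not well-formed XML');
    }

    // transformToXML() returns the result as a string, or false/null on error.
    $result = $xslt->transformToXML($xml);
    if ($result === false || $result === null) {
        throw new RuntimeException('Transformation failed');
    }
    return $result;
}
```

You would then call it with your own feed address and stylesheet, for example `echo render_feed('http://…/yourRSSfeed.xml', 'stylesheet.xsl');`.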

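The XSLT stylesheet has to look like this. You start by telling what kind of file it is: an XML document that contains an XSLT 1.0 stylesheet. The closing xsl:stylesheet tag comes at the very end of the file.
[sourcecode lang='xslt']
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
[/sourcecode]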

Then we declare which nodes to transform. Look for the rss root tag:
<code><xsl:template match="/rss"> … </xsl:template></code>

and, inside that template, print a list of all items:
<pre><p id="MyFeed">….</p></pre>
<pre> <xsl:for-each select="//item"> … </xsl:for-each></pre>

For each item we output a link:
<pre>
<a href="{link}"><xsl:value-of select="title" /></a>
</pre>
xsl:value-of with select="title" selects the text content of the title tag.
{link} does the same for the link tag, but is a shortcut for easy use between quotation marks.
The result is a link with the item's title as text and its link URL as the href.

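Complete XSL stylesheet, put together from the pieces above:
[sourcecode lang='xslt']
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/rss">
    <p id="MyFeed">
      <xsl:for-each select="//item">
        <a href="{link}"><xsl:value-of select="title" /></a>
      </xsl:for-each>
    </p>
  </xsl:template>

</xsl:stylesheet>
[/sourcecode]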

Save this in the same directory as your script as stylesheet.xsl and run the PHP code from earlier.
This will give you a simple list of the posts, with each title linked to its URL.
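If you want to try the whole thing without a live feed, here is a minimal self-contained sketch. The sample feed and its example.com URLs are made up, the stylesheet is inlined as a string and is my reading of the fragments in this post, and the `xsl:output` line is an extra I added so the result comes out as plain HTML without an XML declaration.

```php
<?php
// Hypothetical sample feed; a real one would come from file_get_contents().
$feed = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>
XML;

// Stylesheet inlined for the demo; normally this lives in stylesheet.xsl.
$stylesheet = <<<'XSL'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="UTF-8"/>
  <xsl:template match="/rss">
    <p id="MyFeed">
      <xsl:for-each select="//item">
        <a href="{link}"><xsl:value-of select="title"/></a>
      </xsl:for-each>
    </p>
  </xsl:template>
</xsl:stylesheet>
XSL;

// Same steps as in the post, but loading both documents from strings.
$xsl = new DOMDocument();
$xsl->loadXML($stylesheet);

$xslt = new XSLTProcessor();
$xslt->importStylesheet($xsl);

$xml = new DOMDocument();
$xml->loadXML($feed);

echo $xslt->transformToXML($xml);
```

Run from the command line, this should print one paragraph with id MyFeed containing a link for each of the two sample items.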

7 Responses to “Parsing newsfeeds with XSL (1)”

  1. Chad Bishop Says:

    Just getting back into some XSLT, and have a request to get libxslt installed on the server now.

    Any similar code that will work with the more “standardly” distributed PHP/XSL library?

    or does libxslt simply do a much better job at making things simpler?

    -cb

  2. Chad Bishop Says:

    thnx for the nice, clear write up btw.

  3. programmer Says:

    That depends on which PHP version you use. For PHP 5, which uses the newer and faster XSL extension based on libxslt, XSLT support is installed by default.
    For PHP 4, XSLT support is based either on the Sablotron library, which you have to install and compile into PHP with the --enable-xslt --with-xslt-sablot options, or on the domxml library.

    The XSLT code presented above is the same, it’s simple XSLT 1.0 and is supported by all libraries.
    In PHP 4 the functions to process the translation would be something like:
    $xmldoc = domxml_open_file($xml);
    $xsldoc = domxml_xslt_stylesheet ($xslt);
    $result = $xsldoc->process($xmldoc);
    print $xsldoc->result_dump_mem($result);

    As you can see this is not ‘clean’ object-orientated.

    A powerful (and not well known) option in PHP 5 is that you can mix the XSLT with PHP functions.

    Before that you have to set up the processor to allow PHP functions, like:
    $xslt->registerPHPFunctions('MYFUNC');
    This makes XSLT enormously easier to work with; the downside is that you can’t use your XSLT code elsewhere.

  4. XSL guy Says:

    Your XSL can’t work because you aren’t closing your tags right and you aren’t even closing the xsl:stylesheet at all. Plus, there’s a rogue space between “xsl:” and “value of.” The use of “//” is very inefficient, but in this case probably won’t do too much harm.

    Use this instead:

  5. XSL guy Says:

    And, of course, my XSL got eaten on submission, so it can’t be seen…

  6. programmer Says:

    @XSL. Thx, you’re right, last line was missing. I use the code, so I can assure you it’s working. Problem is sometimes to get the code pass TinyMCE. And that can be tricky, as you said yourself. I will see into that.

  7. Dinner Ideas Says:

    Hey very nice blog!! Will add to feed reader 🙂
