Fixing code snippets syndicated from blogs.perl.org

I’ve got a Perl-specific blog over at blogs.perl.org, and its posts are syndicated here using the excellent FeedWordPress plugin.

The trouble is that preformatted blocks (i.e. the <pre> tag) are botched on import, mainly because I have to perpetrate a hack on the origin site to get them rendering properly there. What I'd like is to have those blocks rendered here by the SyntaxHighlighter WordPress plugin that is installed on this site.

It turns out that writing a filter plugin for WordPress is not that hard. WordPress has a system of hooks, against which a plugin registers callback functions. Each callback is passed some content, which it can filter and then return. All I need to do is swap the junky <pre> sections for ones which will render well under SyntaxHighlighter.
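To make the hook idea concrete, here is a toy filter chain in plain PHP. It is only a sketch of the concept, not how WordPress implements hooks: the class ToyFilterChain and the hook name item_content are hypothetical. Callbacks are registered under a hook name, and the content is passed through each one in turn.

```php
<?php
// Toy illustration of a filter hook: callbacks are registered under a hook
// name, and content is passed through each of them in registration order.
// ToyFilterChain is hypothetical; it is NOT WordPress's implementation.

final class ToyFilterChain {
    // hook name => list of callbacks
    private array $callbacks = [];

    // Register a callback against a named hook.
    public function register(string $hook, callable $callback): void {
        $this->callbacks[$hook][] = $callback;
    }

    // Feed $content through every callback for $hook; each one receives the
    // previous callback's output and returns the (possibly changed) content.
    // A hook with no callbacks returns the content unchanged.
    public function run(string $hook, string $content): string {
        foreach ($this->callbacks[$hook] ?? [] as $callback) {
            $content = $callback($content);
        }
        return $content;
    }
}

$chain = new ToyFilterChain();
$chain->register('item_content', fn(string $c): string => trim($c));
$chain->register('item_content', fn(string $c): string => "<pre>$c</pre>");

echo $chain->run('item_content', "  print 1;\n"), "\n";  // prints: <pre>print 1;</pre>
```

Because every callback hands back the content it was given, possibly modified, several plugins can filter the same hook without knowing about each other.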

First, the plugin header, which WordPress uses to add an entry to the plugins list in the Plugins section of the site admin panel:

<?php
/*
Plugin Name: FeedWordpress-SyntaxHighlighter
Plugin URI: http://blog.gorwits.me.uk/
Description: Import preformatted text properly for SyntaxHighlighter
Version: 1.1
Author: Oliver Gorwits
Author URI: http://blog.gorwits.me.uk/
License: Artistic
*/
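As an aside (this is a separate toy script, not part of the plugin file), the header is just a comment made of "Field: value" lines. The sketch below pulls such fields out with a multi-line regular expression. read_plugin_header() is a hypothetical helper and an assumption on my part; WordPress has its own header parser, which this does not claim to reproduce exactly.

```php
<?php
// Simplified sketch: read "Field: value" lines from a plugin header comment.
// read_plugin_header() is hypothetical, not a WordPress function.

function read_plugin_header(string $source, array $fields): array {
    $found = [];
    foreach ($fields as $field) {
        // Match the field name at the start of a line (after optional
        // comment characters), then capture the rest of that line.
        $pattern = '/^[ \t\/*#@]*' . preg_quote($field, '/') . ':(.*)$/mi';
        $found[$field] = preg_match($pattern, $source, $m) ? trim($m[1]) : '';
    }
    return $found;
}

$header = <<<'HDR'
/*
Plugin Name: FeedWordpress-SyntaxHighlighter
Version: 1.1
Author: Oliver Gorwits
*/
HDR;

// Missing fields (here, License) come back as empty strings.
print_r(read_plugin_header($header, ['Plugin Name', 'Version', 'Author', 'License']));
```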

Next, we register the callback function against a hook provided by the FeedWordPress plugin:

add_filter(
    /*hook=*/ 'syndicated_item_content',
    /*function=*/ 'fwp_add_class_to_pre_in_content',
    /*priority=*/ 10,
    /*accepted_args=*/ 2
);
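The two numbers passed to add_filter() are what WordPress calls the priority (lower numbers run earlier) and the number of accepted arguments. The toy script below, which is not part of the plugin, sketches how those two settings behave. ToyHooks, the hook name item_content and the array standing in for the post are all hypothetical; FeedWordPress passes its own post object, which this does not model.

```php
<?php
// Toy model of priority and accepted-argument handling for filters.
// ToyHooks is hypothetical; it is NOT WordPress's implementation.

final class ToyHooks {
    // hook => priority => list of [callback, acceptedArgs]
    private array $filters = [];

    public function addFilter(string $hook, callable $cb, int $priority = 10, int $acceptedArgs = 1): void {
        $this->filters[$hook][$priority][] = [$cb, $acceptedArgs];
    }

    // The first argument is the value being filtered; any extras are context
    // that a callback only sees if it asked for enough arguments.
    public function applyFilters(string $hook, $value, ...$extra) {
        $byPriority = $this->filters[$hook] ?? [];
        ksort($byPriority);  // lower priority numbers run first
        foreach ($byPriority as $callbacks) {
            foreach ($callbacks as [$cb, $acceptedArgs]) {
                $args  = array_slice(array_merge([$value], $extra), 0, $acceptedArgs);
                $value = $cb(...$args);
            }
        }
        return $value;
    }
}

$hooks = new ToyHooks();
// Registered first, but runs second because of priority 20.
$hooks->addFilter('item_content', fn(string $c): string => $c . ' [late]', 20);
// Priority 10 and 2 accepted args, so it also receives the "post".
$hooks->addFilter('item_content', fn(string $c, array $post): string => $c . ' from ' . $post['source'], 10, 2);

echo $hooks->applyFilters('item_content', 'code', ['source' => 'blogs.perl.org']), "\n";
// prints: code from blogs.perl.org [late]
```

In the plugin, asking for 2 arguments is what lets fwp_add_class_to_pre_in_content() receive $post as well as $content.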

And finally the callback itself:


function fwp_add_class_to_pre_in_content ($content, $post) {
    // remove the code wrapper used for formatting on the blogs.perl.org site
    // (str_replace replaces *all* occurrences)
    $content = str_replace(
        '<pre><code class="prettyprint">',
        '<pre>',
        $content
    );  
    $content = str_replace(
        '</code></pre>',
        '</pre>',
        $content
    );  

    // locate and extract the pre blocks
    preg_match_all('/<pre>(?:.(?!<\/pre>))+.<\/pre>(?:<\/p>)?/s',
        $content, $matches, PREG_OFFSET_CAPTURE);

    // work through list of <pre> blocks, rewriting them
    foreach (array_reverse($matches[0]) as $val) {
        $pre_block = $val[0];
        $offset    = $val[1];

        $new_pre = '<pre class="brush: plain; gutter: false; title: ;">' .
                        // remove the junk added by syndication
                        // ok, this breaks if pre contains legit HTML
                        strip_tags($pre_block) .
                   '</pre>';

        // reinsert the new pre block into the content, replacing the old
        // one at its recorded offset (safe because we work backwards)
        $content = substr_replace($content, $new_pre,
                                  $offset, strlen($pre_block));
    }

    // Send it back
    return $content;
} /* fwp_add_class_to_pre_in_content() */
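To try the rewriting logic outside WordPress, here is a standalone sketch that repeats the same steps on a sample string. rewrite_pre_blocks() and the sample HTML are hypothetical; the regular expression and the SyntaxHighlighter class string are copied from the callback above, and the splice uses substr_replace() with the byte offsets that PREG_OFFSET_CAPTURE returns.

```php
<?php
// Standalone sketch of the <pre> rewrite; rewrite_pre_blocks() is a
// hypothetical name, the regex and class string come from the plugin.

function rewrite_pre_blocks(string $content): string {
    // Undo the <code class="prettyprint"> wrapper used on the origin site.
    $content = str_replace('<pre><code class="prettyprint">', '<pre>', $content);
    $content = str_replace('</code></pre>', '</pre>', $content);

    // Find each <pre>...</pre> block (plus a trailing </p>, if any).
    preg_match_all('/<pre>(?:.(?!<\/pre>))+.<\/pre>(?:<\/p>)?/s',
        $content, $matches, PREG_OFFSET_CAPTURE);

    // Work from the last match backwards so earlier offsets stay valid.
    foreach (array_reverse($matches[0]) as [$pre_block, $offset]) {
        $new_pre = '<pre class="brush: plain; gutter: false; title: ;">'
                 . strip_tags($pre_block) . '</pre>';
        $content = substr_replace($content, $new_pre, $offset, strlen($pre_block));
    }
    return $content;
}

$in = '<p>A</p><pre><code class="prettyprint"><span>print 1;</span></code></pre>'
    . '<p>B</p><pre><code class="prettyprint">print 2;</code></pre></p>';
echo rewrite_pre_blocks($in), "\n";
```

Both blocks come back with the SyntaxHighlighter class, the inner <span> is stripped, and the stray </p> after the second block is swallowed by the match.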

I’m not the best PHP programmer out there, but I cobbled this together and it seems to work :-)

The final step is to drop this file into a subdirectory of .../wp-content/plugins/ on the server and, hey presto, the plugin is available for activation. The next time the feed is syndicated, each item's content is passed through this code.

This entry was posted in blogging.