HTML purifier for Perl

I recently discovered a security risk in my Perl script that takes input
from an HTML form, saves it in a database, and converts it to HTML for
display. When I tried adding HTML to the content I was pleased with the
results, but I also considered the problems that could be caused by
incorrect formatting. Even worse, malicious code could be inserted.

So, I searched for a way to prevent this, and I found the HTML Purifier site,
which has a PHP utility that fixes errors and blocks malicious code.
But my script is in Perl, and I didn't want to rewrite it in PHP. So, I found a
way to use the PHP script from my Perl script.

Essentially, I am getting the environment variables from the form's POST and
writing each to a "Raw.htm" file. Then I invoke the PHP script using a
heredoc as follows:

    <iframe src="HTMLfilter.php"> </iframe>

and then (after a 1 second sleep to allow for processing) I read the
"Pure.htm" file back and use it for the variable. There may be better ways
to do this, but I'm just glad I got this to work, and I feel better
having people use the submission form with this safeguard in place. If you
have any ideas as to how to do this better or in a different way, please let
me know. I could not find a Perl script to accomplish the same thing,
although there seem to be a few other utilities available.
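
One way to avoid both the iframe and the fixed one-second sleep is to run the PHP script from Perl and wait for it to exit. The following is only a sketch, assuming a command-line `php` interpreter is installed on the server and that HTMLfilter.php is in the script's working directory; the helper name `purify_via_php` and its optional command argument are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): purify a string by
# round-tripping it through a filter that reads Raw.htm and writes Pure.htm.
# The command defaults to running HTMLfilter.php with a command-line "php",
# which is assumed to be installed and on the PATH.
sub purify_via_php {
    my ($raw_html, @cmd) = @_;
    @cmd = ('php', 'HTMLfilter.php') unless @cmd;

    # Write the untrusted input where the filter expects to find it.
    open(my $raw, '>', 'Raw.htm') or die "can't write Raw.htm: $!";
    print {$raw} $raw_html;
    close($raw) or die "can't close Raw.htm: $!";

    # Run the filter through a pipe (list form, so no shell is involved).
    # Reading to end-of-file and closing the pipe waits for the filter to
    # exit, so no sleep is needed.
    open(my $filter, '-|', @cmd) or die "can't run $cmd[0]: $!";
    my @preview = <$filter>;    # the script's echoed preview; not used
    close($filter) or die "$cmd[0] failed (wait status $?)";

    # Read the purified result back in one go.
    open(my $pure, '<', 'Pure.htm') or die "can't read Pure.htm: $!";
    my $pure_html = do { local $/; <$pure> };
    close($pure);
    return $pure_html;
}
```

It would be called as `my $clean = purify_via_php($form_value);`. Because the file names are fixed, two submissions arriving at the same moment could overwrite each other's files, so this is a starting point rather than a finished solution.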

The PHP script is as follows:


<?php
//  HTMLfilter.php - PES - January 20, 2011
//  This converts raw HTML to purified (safe) HTML
//  Called from
//  Read Raw.htm, Write to Pure.htm

require_once 'library/';

$config = HTMLPurifier_Config::createDefault();

// configuration goes here:
$config->set('Core.Encoding', 'UTF-8'); // replace with your encoding
$config->set('HTML.Doctype', 'XHTML 1.0 Transitional'); // replace with your doctype

$purifier = new HTMLPurifier($config);

// untrusted input HTML read from "Raw.htm"
$fHTMLrawfile = "Raw.htm";
$fHTMLraw = fopen($fHTMLrawfile, 'r') or die("can't open file");
$html = fread($fHTMLraw, filesize($fHTMLrawfile));

$pure_html = $purifier->purify($html);

// write purified HTML to "Pure.htm"
$fHTMLpurefile = "Pure.htm";
$fHTMLpure = fopen($fHTMLpurefile, 'w') or die("can't open file");
fwrite($fHTMLpure, $pure_html); // write the purified HTML, not the raw $html
fclose($fHTMLpure);

echo '<pre>' . htmlspecialchars($pure_html) . '</pre>';

// vim: et sw=4 sts=4

Re: HTML purifier for Perl

responding to
thomasrobert wrote:
HTML Purifier is a standards-compliant HTML filter library written in PHP.
Purifier will not only remove all malicious code (better known as XSS) with a
thoroughly audited, secure yet permissive whitelist, it will also make sure
your documents are standards compliant, something only achievable with a
knowledge of W3C's specifications.

Re: HTML purifier for Perl


How hard did you look? I went to <>, typed in HTML,
and in about three minutes I found HTML::Declaw on the second page of
results.


Perl generally doesn't rely on separate, self-contained scripts to do
such things. Instead, HTML::Declaw is a module that one can use in one's
own script.
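
For illustration, using such a module from one's own script might look roughly like the sketch below. It assumes HTML::Declaw is installed from CPAN and that its constructor and `process` method behave as described in the module's synopsis; the option shown and the sample input are only examples, so check the module's own documentation before relying on them.

```perl
use strict;
use warnings;
use HTML::Declaw;    # assumed to be installed from CPAN

# Untrusted input, e.g. a field from a submitted form (sample value).
my $input_html = '<p onclick="alert(1)">Hello <b>world</p>';

# Assumption: new() accepts fix_mismatched_tags, and process() returns
# the sanitized markup, as described in the module's synopsis.
my $declaw    = HTML::Declaw->new( fix_mismatched_tags => 1 );
my $safe_html = $declaw->process($input_html);

print $safe_html, "\n";
```

The point is that the filtering happens inside the same Perl process, so there are no intermediate files and no second script to invoke.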


Sherm Pendley
Cocoa Developer

Re: HTML purifier for Perl

"Sherm Pendley" wrote in message

I admit that I just did a quick Dogpile search rather than searching CPAN. By that
time I already had some of the HTML Purifier system working, and I was
impressed by the quick response from the support forum and by the
various prestigious sites that are using it. Also, I am not very
proficient in Perl (or PHP or even HTML for that matter), so I really needed
a simple example from which I could build my application. And it was a
learning experience to become familiar with at least a little basic PHP, and
ways to interface between PHP and Perl.

Thanks for the information.

