a commandline tool to drop css and javascript?

Hello. I am looking for a command-line tool that takes an HTML document
(or an HTML document fragment, i.e. one without the leading
"<html><head>..</head><body>"), removes all CSS style
settings and JavaScript, and outputs clean HTML/XHTML.

Optionally, it would be nice if the tool could also take a list of
acceptable tags and remove all tags not in that list.

I need such a tool to process a lot of static HTML documents I am working
on. Do you happen to know of one? I am still googling around ;) I
tried Tidy, but there doesn't seem to be an option to remove CSS.

Thanks a lot!

Re: a commandline tool to drop css and javascript?

Gazing into my crystal ball I observed Zhang Weiwu

Can you use search and replace? How about searching for style=" ? It
seems to me that search and replace is what you want. Google for a good
search and replace tool, or I am sure someone will be around shortly to
tell you another way.
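A minimal Python sketch of the search-and-replace approach, assuming fairly regular markup: quoted attribute values and no literal style= text in element content. Regular expressions cannot parse HTML in general, so this is a quick hack rather than a real tool; the helper name strip_css_js is hypothetical. Besides style attributes it also removes whole <style> and <script> elements, since the question asks for JavaScript to go too.

```python
import re

# Whole <style>...</style> and <script>...</script> elements, contents included.
STYLE_OR_SCRIPT = re.compile(r"<(style|script)\b[^>]*>.*?</\1\s*>",
                             re.IGNORECASE | re.DOTALL)
# A style="..." or style='...' attribute together with its leading whitespace.
STYLE_ATTR = re.compile(r"""\s+style\s*=\s*("[^"]*"|'[^']*')""", re.IGNORECASE)


def strip_css_js(markup: str) -> str:
    """Hypothetical helper: regex search and replace, not a real HTML parser."""
    markup = STYLE_OR_SCRIPT.sub("", markup)
    return STYLE_ATTR.sub("", markup)


if __name__ == "__main__":
    sample = '<div style=" color : blue ;"><script>alert(1)</script>oink!</div>'
    print(strip_css_js(sample))  # prints <div>oink!</div>
```

Event-handler attributes such as onclick are not touched by this sketch; a parser-based approach is safer for those.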

Adrienne Boswell at Home
Arbpen Web Site Design Services
Please respond to the group so others can share

Re: a commandline tool to drop css and javascript?

Unless your source HTML is so tag-soupy no sane HTML parser
can grok it, XSLT is great for this kind of stuff. Of
course, you'll also need an XSLT processor that can
transform HTML documents (libxslt can do that, and probably
many others).

pavel@debian:~/dev/xslt$ cat raw.html
<!DOCTYPE HTML PUBLIC
          "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head>
    <style type="text/css">
      body { font-family : monospace ; }
    </style>
    <script type="text/javascript">
      function oink ( ) { alert ( 'oink!' ) ; }
    </script>
  </head>
  <body>
    <div style=" color : blue ;">
      <span style=" font-style : italic ; "
      onclick=" oink ( ) ; ">oink!</span>
    </div>
  </body>
</html>

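pavel@debian:~/dev/xslt$ cat strip_jscss.xsl
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <!-- identity template: copy every node and attribute -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <!-- exclusion template: drop CSS and JavaScript -->
  <xsl:template match="style|script|@style|@onclick"/>
</xsl:stylesheet>
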
pavel@debian:~/dev/xslt$ xsltproc -html strip_jscss.xsl raw.html
<meta http-equiv="Content-Type" content="text/html;

Naturally, you'll want to tinker with xsl:output to get
valid HTML as output, and you'll need to fine-tune the
exclusion template to handle all the event handler
attributes (onload, onmouseover, etc.). xsltproc is a command-line
utility that comes with libxslt, but as I said, I'd expect most XSLT
processors to be capable of transforming HTML as well.
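For comparison, the same exclusion idea can be sketched without XSLT using only Python's standard html.parser module. This is an illustrative sketch, not the stylesheet above: the names CssJsStripper, strip_html and the --allow option are hypothetical, any attribute whose name starts with "on" is assumed to be an event handler, and javascript: URLs in href attributes are left alone. The --allow option covers the original poster's optional tag whitelist: tags outside the list are dropped, but their text is kept.

```python
import argparse
import sys
from html import escape
from html.parser import HTMLParser

# Elements dropped together with everything inside them.
DROP_WITH_CONTENT = {"style", "script"}
# Void elements have no closing tag, so never emit one for them.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class CssJsStripper(HTMLParser):
    """Re-serialize HTML without CSS/JS, optionally keeping only allowed tags."""

    def __init__(self, allowed=None):
        super().__init__(convert_charrefs=False)  # pass entities through as-is
        self.allowed = allowed   # None means "keep every tag"
        self.skip_depth = 0      # > 0 while inside <style> or <script>
        self.out = []

    def _keep(self, tag):
        return self.allowed is None or tag in self.allowed

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or not self._keep(tag):
            return
        parts = [tag]
        for name, value in attrs:
            # Drop inline CSS and anything that looks like an event handler.
            if name == "style" or name.startswith("on"):
                continue
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag in VOID or not self._keep(tag):
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.skip_depth:
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_html(markup, allowed=None):
    """Return markup without <style>/<script>, style= and on*= attributes."""
    parser = CssJsStripper(allowed)
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


def main(argv=None):
    ap = argparse.ArgumentParser(description="Drop CSS and JavaScript from HTML files.")
    ap.add_argument("files", nargs="*", help="HTML files; cleaned output goes to stdout")
    ap.add_argument("--allow", help="comma-separated tags to keep, e.g. p,a,em")
    args = ap.parse_args(argv)
    if not args.files:
        ap.print_usage(sys.stderr)
        return
    allowed = {t.strip().lower() for t in args.allow.split(",")} if args.allow else None
    for path in args.files:
        with open(path, encoding="utf-8") as f:  # assumes UTF-8 input
            sys.stdout.write(strip_html(f.read(), allowed))


if __name__ == "__main__":
    main()
```

Assuming the script is saved as strip_cssjs.py (a hypothetical name), something like `python strip_cssjs.py --allow p,a,em raw.html > clean.html` writes the cleaned document to clean.html. Unlike the XSLT version, it re-emits the markup token by token and never builds a tree, so it copes with tag soup but will not fix unbalanced tags.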

Pavel Lepin
