Posted by David Stone on July 10, 2007, 7:55 am
wrote:
> > MS Word should output XHTML without any CSS styles. Tidy
> > (http://tidy.sourceforge.net/) helps quite a lot but
> > leaves CSS styles like the following:
>
> [...]
>
> > In other words: I want all the attributes to be deleted.
> >
> > Is there an option for Tidy to achieve this, or another
> > small app?
> >
> > PS: I could do this with XSLT, but the input must be XML
> > and I have not used XSLT for some years...
>
> At least some of the XSLT processors don't care what their
> input is, as long as it's something that looks like a
> DOMDocument. xsltproc (comes with libxslt) has a --html
> switch specifically for transforming HTML documents.
>
> I believe this very problem (or an extremely similar one)
> was discussed at some length a few months ago either here,
> on c.i.w.a.s, or on comp.text.xml. I'd recommend searching
> Google Groups' archives to see if you can find that thread.
When I faced a similar problem, someone recommended Beautiful
Soup:
http://www.crummy.com/software/BeautifulSoup/
I never got around to trying it (I found a different way), so I
don't know how well it works.
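
For what it's worth, here is a rough sketch of the "delete all the
attributes" step in Python. It uses only the standard library's
html.parser rather than Beautiful Soup, so it is not the approach
recommended above, just one possible way to do it. The names
AttributeStripper and strip_attributes are made up for this example,
and it assumes you are happy to drop comments (including Word's
conditional comments) along the way.

```python
from html.parser import HTMLParser


class AttributeStripper(HTMLParser):
    """Re-emit the parsed markup with every attribute removed."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written
        # instead of decoding them into plain characters.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is deliberately ignored: that is the whole point.
        self.parts.append(f"<{tag}>")

    def handle_startendtag(self, tag, attrs):
        self.parts.append(f"<{tag} />")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_decl(self, decl):
        # Keep the doctype declaration as it was.
        self.parts.append(f"<!{decl}>")

    # Comments are not handled, so they are silently dropped.


def strip_attributes(html):
    """Return html with all tag attributes removed (hypothetical helper)."""
    parser = AttributeStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    sample = '<p class="MsoNormal" style="margin:0">Hi <b style="x">there</b></p>'
    print(strip_attributes(sample))
```

Note that tag names come out lowercased, and the contents of <style>
blocks are passed through untouched, so you would still want Tidy (or
something similar) to clean those up first.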