
How can I remove tags which have no attributes?

Newsgroup: comp.infosystems.www.authoring.html
Posted by Oberon on May 28, 2005, 6:31 pm


I have a large HTML document. It has hundreds of <span>s which
have no attributes so these <span>s are redundant.

How can I remove these tags automatically?

The document also has <span>s with style attributes that I don't
want to remove.



Posted by Jim Moe on May 29, 2005, 10:42 am


Oberon wrote:
> I have a large HTML document. It has hundreds of <span>s which
> have no attributes so these <span>s are redundant.
>
> How can I remove these tags automatically?
>
Use a macro in your editor. Have it search for "<span>" and delete it;
then search for "</span>" and delete it. Repeat as required.
This works if there are no nested spans.
Or use the HTMLTidy method as Tim described.

--
jmm dash list (at) sohnen-moe (dot) com
(Remove .AXSPAMGN for email)


Posted by Dr John Stockton on May 29, 2005, 10:10 pm


Tim wrote (dated Sun, 29 May 2005 15:59:17):
>
>I just did a little experiment with a text editor and the HTML tidy
>program. I made a small test HTML file with some <span>bogus</span>
>contents splattered throughout it. Used the search and replace option in
>the text editor to replace all <span> opening tags with nothing, then ran
>HTML tidy on it. It stripped out the erroneous closing </span> tags.


If you have

<span A> aaa <span> bbb <span B> ccc </span> ddd </span> eee </span>

in which A and B are useful, then, after you have removed the <span>
between aaa & bbb, how can TIDY possibly tell that it is the </span>
between ddd & eee that should be removed, and not the final one?

ISTM better to use something like MiniTrue or SED to remove the tags
from each detectable instance of
<span> *something*not*including*<span>*for*sure* </span>
which with any luck will remove the great majority, and should do no
harm. To allow *everything*not*including*<span>*for*sure may be
difficult; but to allow *everything*not*including*<*for*sure* may catch
a sufficient proportion to be useful.
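
The conservative pattern described above can be sketched in Python rather than MiniTrue or SED (a swapped-in tool, not the one named in the post). This is a minimal sketch: the names BARE_LEAF_SPAN and strip_bare_leaf_spans are made up for illustration, and it assumes attribute-less spans are written as a plain <span> with nothing but optional whitespace before the ">". Repeating the pass until nothing changes is an addition here; it unwraps bare spans that become tag-free once their inner bare spans are gone.

```python
import re

# An attribute-less <span>, then content containing no "<" at all, then </span>.
# Because the content has no tags, the </span> is certainly the matching one.
BARE_LEAF_SPAN = re.compile(r"<span\s*>([^<]*)</span\s*>", re.IGNORECASE)


def strip_bare_leaf_spans(html: str) -> str:
    """Unwrap bare <span>text</span> pairs, repeating until nothing changes."""
    while True:
        stripped = BARE_LEAF_SPAN.sub(r"\1", html)
        if stripped == html:
            return stripped
        html = stripped
```

A bare span that still contains another tag (a styled span, a link, and so on) is left untouched, so this catches only a proportion of the cases, but it should not remove a closing tag that belongs to a span you want to keep.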

Alternatively, one might write a program in a general-purpose high-level
language, one that tracks the nesting level so that it can remove the
correct closing tag.
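
A rough sketch of such a nesting-aware program, in Python. The names SPAN_TAG and remove_bare_spans are hypothetical, and the sketch assumes reasonably well-formed markup: no ">" inside attribute values, no self-closing <span/>, and no span-like text inside comments or scripts. It keeps a stack recording, for each open span, whether the opening tag had attributes, so each </span> can be matched to its own opener.

```python
import re

# Any opening or closing span tag; group 1 is "/" for a closing tag,
# group 2 is whatever follows the tag name (the attributes, if any).
SPAN_TAG = re.compile(r"<(/?)span\b([^>]*)>", re.IGNORECASE)


def remove_bare_spans(html: str) -> str:
    """Drop attribute-less <span> tags together with their matching </span>."""
    out = []
    keep_stack = []  # one entry per open span: True if its tags are kept
    pos = 0
    for match in SPAN_TAG.finditer(html):
        out.append(html[pos:match.start()])  # copy text between span tags
        pos = match.end()
        if match.group(1) == "/":
            # Closing tag: its fate was decided by the opener it matches.
            # An unmatched closing tag is left alone.
            keep = keep_stack.pop() if keep_stack else True
        else:
            keep = bool(match.group(2).strip())  # keep only if it has attributes
            keep_stack.append(keep)
        if keep:
            out.append(match.group(0))
    out.append(html[pos:])
    return "".join(out)
```

On the nested example earlier in this post, this removes the <span> between aaa and bbb and the </span> between ddd and eee, leaving the A and B spans intact.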

Caveat : ISTM that CSS might, in some version, allow styling <span> with
added whitespace. If that should be, removing <span> from aaa<span>bbb
could make a visible difference.

--
© John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 MIME. ©
Web <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
I find MiniTrue useful for viewing/searching/altering files, at a DOS prompt;
free, DOS/Win/UNIX, <URL:http://www.idiotsdelight.net/minitrue/> Update hope?

Posted by Tim on May 30, 2005, 1:38 pm



>> I just did a little experiment with a text editor and the HTML tidy
>> program. I made a small test HTML file with some <span>bogus</span>
>> contents splattered throughout it. Used the search and replace option in
>> the text editor to replace all <span> opening tags with nothing, then ran
>> HTML tidy on it. It stripped out the erroneous closing </span> tags.


> If you have
>
> <span A> aaa <span> bbb <span B> ccc </span> ddd </span> eee </span>
>
> in which A and B are useful, then, after you have removed the <span>
> between aaa & bbb, how can TIDY possibly tell that it is the </span>
> between ddd & eee that should be removed, and not the final one?

Well, I did say do a test. It will depend on the data that you're working
with... Of course, if you have nested spans you're going to need something
smarter.

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please delete some files yourself.
