Posted by Stefan Weiss on August 29, 2005, 6:30 pm
Hi.
(this is somewhat similar to yesterday's thread about empty links)
I noticed that Tidy [0] issues warnings whenever it encounters empty
tags, and strips those tags if cleanup was requested. This is okay in
some cases (such as <tbody>), but problematic for other tags (such as
<option>). Some tags (td, th, ...) do not produce warnings when they are
empty.
The warnings are also issued for documents that are considered valid
"XHTML 1.0 Strict" by the W3C Validator. If any of these empty-tag
warnings were issued, Tidy recommends "XHTML 1.0 Transitional" instead.
A few examples:
1) empty <select> options
-----------------------------------------------------------------
<select name="x">
<option value=""></option>
<option value="foo">a foo</option>
<option value="bar">a bar</option>
</select>
[tidy] Warning: "trimming empty <option>"
I don't know how I could avoid this warning. I don't want to display any
text in the first option, and hacks like <option>&nbsp;</option> are not
acceptable.
2) Empty p, div, span, h1, a[href], ... tags
-----------------------------------------------------------------
<td class="c1"><span class="c2"></span> Some Text</td>
Constructs like this are sometimes used by our webdesigner in
combination with CSS. The span tag is technically empty, but the
stylesheet will cause an image to be displayed.
<div id="ibox"></div>
This could happen, for example, if "ibox" was a box containing additional
information for the main content, but there is no additional information
for the current page. The box itself should still be displayed, so the tag
is left empty.
<p class="notes"><?= $notes ?></p>
Empty tags can also occur as an artifact of server-side scripting; if
$notes is empty, so is the <p> tag (this case can be avoided, I know).
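One way to avoid that case is to emit the paragraph only when there is something to put in it. A minimal sketch of the idea, written in Python rather than PHP; the function name render_notes is hypothetical and only the class name "notes" comes from the example above:

```python
from html import escape


def render_notes(notes):
    """Return a <p class="notes"> element, or "" when there is nothing to show."""
    text = (notes or "").strip()
    if not text:
        # No notes: emit nothing at all instead of an empty <p>.
        return ""
    return '<p class="notes">{}</p>'.format(escape(text))
```

The same check works in any template language: test the value first, then decide whether to write the tag.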
3) Empty td, th tags; script tags
-----------------------------------------------------------------
<tr><th></th></tr>
<tr><td></td></tr>
Tidy ignores empty table cells and does not try to strip them.
<script type="text/javascript" src="xxx.js"></script>
Same goes for empty script tags with src attributes.
4) Empty thead, tfoot tags
-----------------------------------------------------------------
<table>
<thead><tr><td>head</td></tr></thead>
<tfoot></tfoot>
<tbody><tr><td>body</td></tr></tbody>
</table>
Tidy issues a warning for the empty tfoot element, which is expected
because <tfoot> must never be empty. In this case I would actually
rather get an error than a warning, because the document no longer
qualifies as valid XHTML.
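To see in advance which elements fall into cases 1), 2) and 4), the empty elements in a page can be listed before Tidy runs. This is only a sketch using Python's standard html.parser; the names EmptyElementFinder and find_empty are made up for illustration, the IGNORED set simply encodes the observation from 3), and it assumes XHTML-style input where every non-void element is closed explicitly:

```python
from html.parser import HTMLParser

# Elements that never have content, so "empty" is meaningless for them.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}
# Elements that, per the observations in 3), Tidy leaves alone when empty.
IGNORED = {"td", "th", "script"}


class EmptyElementFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []  # open elements as [tag, has_content]
        self.empty = []  # tags of empty elements, in closing order

    def _mark_parent(self):
        if self.stack:
            self.stack[-1][1] = True

    def handle_starttag(self, tag, attrs):
        self._mark_parent()  # a child element counts as content
        if tag not in VOID:
            self.stack.append([tag, False])

    def handle_data(self, data):
        if data.strip():  # whitespace-only text does not count
            self._mark_parent()

    def handle_endtag(self, tag):
        # Ignore void elements and stray end tags with no matching start tag.
        if tag in VOID or all(t != tag for t, _ in self.stack):
            return
        while self.stack:
            open_tag, has_content = self.stack.pop()
            if open_tag == tag:
                if not has_content and tag not in IGNORED:
                    self.empty.append(tag)
                break


def find_empty(markup):
    finder = EmptyElementFinder()
    finder.feed(markup)
    finder.close()
    return finder.empty
```

Calling find_empty() on the table from 4) reports only the tfoot element, since the cells are skipped on purpose.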
I would like to get rid of the warnings in 1) + 2), to simplify
automated validation and to ease my mind. Is Tidy correct in issuing
warnings and stripping the tags? Should I always try to avoid empty
tags? If so, how?
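For the automated validation part, one workaround is to filter Tidy's report instead of changing the markup. A rough sketch, assuming each warning sits on its own line and contains the text 'Warning: trimming empty <tag>' as in the message quoted in 1); the function name remaining_warnings and the default list of tolerated tags are hypothetical:

```python
import re

# Assumed message shape, taken from the warning quoted in 1):
#   Warning: trimming empty <option>
# An optional double quote before "trimming" is accepted as well.
TRIM_RE = re.compile(r'Warning:\s*"?trimming empty <(\w+)>')


def remaining_warnings(report, tolerated=("option", "span", "div")):
    """Return the warning lines of a Tidy report that should still be looked at."""
    kept = []
    for line in report.splitlines():
        if "Warning:" not in line:
            continue  # skip Info lines and anything else
        match = TRIM_RE.search(line)
        if match and match.group(1).lower() in tolerated:
            continue  # an empty element we have decided to accept
        kept.append(line.strip())
    return kept
```

An empty <tfoot> still shows up in the result, because it is not in the tolerated list and really is invalid.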
Thanks in advance,
Stefan
[0] http://tidy.sourceforge.net/
Posted by Jim Moe on August 29, 2005, 12:40 pm
Stefan Weiss wrote:
>
> The warnings are also issued for documents that are considered valid
> "XHTML 1.0 Strict" by the W3C Validator. If any of these empty-tag
> warnings were issued, Tidy recommends "XHTML 1.0 Transitional" instead.
>
The actual phrasing is:
Info: Doctype given is "-//W3C//DTD XHTML 1.0 Strict//EN"
Info: Document content looks like XHTML 1.0 Transitional
That means there is some non-strict syntax used in the document
somewhere (tidy wouldn't want to say where, of course). The W3C validator
is much better at identifying specific problems.
--
jmm dash list (at) sohnen-moe (dot) com
(Remove .AXSPAMGN for email)
Posted by Nick Kew on August 29, 2005, 10:27 pm
Jim Moe wrote:
> Stefan Weiss wrote:
>
>>
>> The warnings are also issued for documents that are considered valid
>> "XHTML 1.0 Strict" by the W3C Validator. If any of these empty-tag
>> warnings were issued, Tidy recommends "XHTML 1.0 Transitional" instead.
>>
> The actual phrasing is:
> Info: Doctype given is "-//W3C//DTD XHTML 1.0 Strict//EN"
> Info: Document content looks like XHTML 1.0 Transitional
> That means there is some non-strict syntax used in the document
Um, it means no such thing. Tidy has a bad habit of thinking
valid, strict markup is "transitional".
> somewhere (tidy wouldn't want to say where, of course). The W3C
> validator is much better at identifying specific problems.
Indeed, a validator will tell you exactly what is allowed.
If you want the Strict/Legacy distinction highlighted more
clearly, AccessValet will do that using the 'trafficlight'
metaphor (green=good, amber=deprecated, red=invalid markup).
The amber then represents the difference between strict and
"transitional".
(Note that neither Tidy nor AccessValet is a validator.
The same is true of some tools that are marketed as "validators".)
--
Nick Kew
Posted by Jim Moe on August 30, 2005, 12:07 am
Nick Kew wrote:
>>>
>> The actual phrasing is:
>> Info: Doctype given is "-//W3C//DTD XHTML 1.0 Strict//EN"
>> Info: Document content looks like XHTML 1.0 Transitional
>> That means there is some non-strict syntax used in the document
>
> Um, it means no such thing. Tidy has a bad habit of thinking
> valid, strict markup is "transitional".
>
Hmm. Whenever I cleaned up the warnings indicated by a validator, tidy
then thought the document looked strict. Guess I've been lucky.
--
jmm dash list (at) sohnen-moe (dot) com
(Remove .AXSPAMGN for email)
Posted by Lars Eighner on August 30, 2005, 7:43 am
In our last episode,
the lovely and talented Jim Moe
broadcast on comp.infosystems.www.authoring.html:
> Stefan Weiss wrote:
>>
>> The warnings are also issued for documents that are considered valid
>> "XHTML 1.0 Strict" by the W3C Validator. If any of these empty-tag
>> warnings were issued, Tidy recommends "XHTML 1.0 Transitional" instead.
>>
> The actual phrasing is:
> Info: Doctype given is "-//W3C//DTD XHTML 1.0 Strict//EN"
> Info: Document content looks like XHTML 1.0 Transitional
> That means there is some non-strict syntax used in the document
> somewhere (tidy wouldn't want to say where, of course). The W3C validator
> is much better at identifying specific problems.
In point of fact, Tidy lies. And it will change the Doctype to
something wrong without asking permission. It often identifies
documents as "proprietary" when in fact onsgmls says they
validate with the advertised standard Doctype. If you use tidy,
I'd advise you to run documents through it without a Doctype,
stream the output to an untidy script to stick the correct
Doctype on it, and break lines in a way to cater to broken
browsers like IE, then stream it through onsgmls or a similar
real validator.
Here is one of the untidies I use (this one is for 4.01 loose;
change the path on Linux):
***********************
#!/usr/local/bin/perl
$| = 1;                                 # Unbuffered output
while(<STDIN>){
# Stick the 4.01 Transitional Doctype in front of the <html> start tag.
$_ =~ s#<html#<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"\n "http://www.w3.org/TR/html4/loose.dtd">\n\n<html#;
$_ =~ s/^\n//g;                         # Drops empty lines
$_ =~ s/li>\s*/li\n>/g;                 # Mostly to keep IE from breaking
                                        # because IE can't do lists right.
$_ =~ s/ul>\s*/ul\n>/g;
$_ =~ s#/ul>#/ul\n>#g;
$_ =~ s/>&nbsp;</></g;                  # Takes out nbsp used to keep empty
                                        # elements
$_ =~ s/<ul><li>/<ul\n><li\n>/g;
$_ =~ s#</li><li>#</li\n><li\n>#g;
print STDOUT;
}
**************************
--
Lars Eighner eighner@io.com http://www.larseighner.com/
"Fascism should more properly be called corporatism, since it is the
merger of state and corporate power."-Benito Mussolini * When you write the
check to pay your taxes, remember there are two l's in "Halliburton."