|
Posted by Thomas Kuehne on November 25, 2004, 8:11 am
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi,
I am currently reviewing some HTML parsing software.
One of the source code comments reads:
# Scan to end of comment.
# Comments are defined any of a number of ways.
# IE 5.0: <!-- followed by >
# "HTML The Definitive Guide": <!-- text with at least one space in it -->
# Netscape: <!-- --> comments nest
# w3c: whitespace can appear between -- and > of comment close
Does anyone know of post 1998 HTML documents that use the IE or
Netscape "features"?
Thanks for any hints and comments.
Thomas
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.9.9 (GNU/Linux)
iD8DBQFBpXeS3w+/yD4P9tIRAjW9AKDPiBf/lQ5N6w6ac+ok9Q2a29SzagCeNPgE
1DG2XNq7bSYI/omcUrC6tkA=
=GSzX
-----END PGP SIGNATURE-----
|
|
Posted by Nick Kew on November 25, 2004, 10:22 am
> -----BEGIN PGP SIGNED MESSAGE-----
Isn't that singularly pointless?
> I am currently reviewing some HTML parsing software.
Does it claim to follow HTML (SGML) rules, XHTML (XML) rules, or tag-soup
(whatever takes the author's fancy) rules?
> # Scan to end of comment.
> # Comments are defined any of a number of ways.
> # IE 5.0: <!-- followed by >
That bears no relation to any form of HTML.
> # "HTML The Definitive Guide": <!-- text with at least one space in it -->
Why the space? The start and end are right for XML.
> # Netscape: <!-- --> comments nest
Comments nest? Interesting thought. It could almost be a
misinterpretation for doing the right thing - though that seems unlikely.
> # w3c: whitespace can appear between -- and > of comment close
Indeed, under SGML rules it can, but there's more to it than that.
Seems like the author of that software hasn't grasped SGML comments.
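To illustrate the "more to it" part, here is a rough sketch (not taken from the software under review) of the SGML reading: a comment declaration is "<!", then zero or more comments that each open and close with "--", each optionally followed by whitespace, then ">". The names SGML_COMMENT_DECL and sgml_comment_end are made up for this example.

```python
import re

# One comment: "--", text that contains no "--", then "--".
# Declaration: "<!", zero or more comments (each may be followed by
# whitespace), then ">". No whitespace is allowed between "<!" and "--".
SGML_COMMENT_DECL = re.compile(r"<!(?:--(?:(?!--).)*--\s*)*>", re.DOTALL)


def sgml_comment_end(text, start=0):
    """Return the index just past the comment declaration at `start`, or -1."""
    m = SGML_COMMENT_DECL.match(text, start)
    return m.end() if m else -1


if __name__ == "__main__":
    for sample in ("<!-- a -->", "<!-- a -- >", "<!-- a -- -- b -->",
                   "<!-- a -- b -->", "<!-- a ---->"):
        print(repr(sample), sgml_comment_end(sample))
```

Under this reading "<!-- a -- -- b -->" is one declaration holding two comments, while "<!-- a -- b -->" and "<!-- a ---->" are not well-formed, which is why a stray -- inside a comment causes trouble.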
> Does anyone know of post 1998 HTML documents that use the IE or
> Netscape "features"?
XML-style comments are valid both as HTML and XHTML as well as
broken-parser-safe, and seem to be the norm. The only serious
brokenness often seen in the wild is use of -- within what the
author intends to be a comment.
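For a parser that only wants the XML-style form, a check along these lines might be enough. It is a minimal sketch under my own assumptions (the function name find_comment_problems is hypothetical): it treats everything from "<!--" to the next "-->" as one comment and reports those whose body contains "--" or ends in "-", plus any comment that is never closed.

```python
def find_comment_problems(html):
    """Return (offset, body) for each suspicious <!-- ... --> comment.

    A comment is reported if its body contains "--", ends with "-",
    or if no closing "-->" is found at all.
    """
    problems = []
    pos = 0
    while True:
        start = html.find("<!--", pos)
        if start == -1:
            return problems
        end = html.find("-->", start + 4)
        if end == -1:
            # Unterminated comment: report the remainder and stop.
            problems.append((start, html[start + 4:]))
            return problems
        body = html[start + 4:end]
        if "--" in body or body.endswith("-"):
            problems.append((start, body))
        pos = end + 3


if __name__ == "__main__":
    page = "<p>a</p><!-- ok --><!-- bad -- idea --><b>x</b>"
    print(find_comment_problems(page))
```

Comments that pass this check should be safe for SGML, XML and tag-soup parsers alike.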
--
Nick Kew
Nick's manifesto: http://www.htmlhelp.com/~nick/
|
|
Posted by Thomas Kuehne on November 25, 2004, 1:40 pm
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Nick Kew wrote on Thu, 25 Nov 2004 09:22:30 +0000:
>> I am currently reviewing some HTML parsing software.
>
> Does it claim to follow HTML (SGML) rules, XHTML (XML) rules, or tag-soup
> (whatever takes the author's fancy) rules?
It states: "supports HTML".
The software in question uses a very plain parser that only extracts
the plain text enclosed by CODE tags and then starts the real
processing.
From what I can see: The Soup rules! (not only tag-soup but also entity-soup).
>> # Netscape: <!-- --> comments nest
>
> Comments nest? Interesting thought. It could almost be a
> misinterpretation for doing the right thing - though that seems unlikely.
I've never read that comments could be nested inside of comments.
Have I missed something while reading the HTML & XHTML docs?
>> Does anyone know of post 1998 HTML documents that use the IE or
>> Netscape "features"?
>
> XML-style comments are valid both as HTML and XHTML as well as
> broken-parser-safe, and seem to be the norm. The only serious
> brokenness often seen in the wild is use of -- within what the
> author intends to be a comment.
Glad to hear that, now I can remove/cleanup a lot of the parsing code.
Thomas
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.9.9 (GNU/Linux)
iD8DBQFBpcS93w+/yD4P9tIRAjtUAJ4/xzgZGBhUTJzS0l7IgnI/ZAi1rACglE5v
Vwz/mhRNJ/WqumkUo7gpEd0=
=rAbX
-----END PGP SIGNATURE-----
|
|
Posted by Neal on November 25, 2004, 10:52 am
Thomas Kuehne wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
Are you aware that each message starts with the above and ends with...
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.9.9 (GNU/Linux)
>
> iD8DBQFBpcS93w+/yD4P9tIRAjtUAJ4/xzgZGBhUTJzS0l7IgnI/ZAi1rACglE5v
> Vwz/mhRNJ/WqumkUo7gpEd0=
> =rAbX
> -----END PGP SIGNATURE-----
Rather dumb, wouldn't you say? I can't find your newsreader in your
headers, but there must be a way to fix it.
|
|
Posted by kchayka on November 27, 2004, 11:31 am
Nick Kew wrote:
>
>> # Scan to end of comment.
>> # Comments are defined any of a number of ways.
>> # IE 5.0: <!-- followed by >
>
> That bears no relation to any form of HTML.
Not standard HTML, though might be WinIE conditional comments which do
follow that general syntax.
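For reference, the downlevel-hidden form looks like <!--[if IE 5]> ... <![endif]-->, so the opener really is "<!--" followed by ">". A rough sketch of picking those blocks out, assuming that form only (COND_OPEN, COND_CLOSE and conditional_blocks are names invented for this example):

```python
import re

# Opener of a downlevel-hidden conditional comment, e.g. "<!--[if IE 5]>".
COND_OPEN = re.compile(r"<!--\[if\s+([^\]]+)\]>")
COND_CLOSE = "<![endif]-->"


def conditional_blocks(html):
    """Return (condition, content) for each conditional comment found."""
    blocks = []
    for m in COND_OPEN.finditer(html):
        end = html.find(COND_CLOSE, m.end())
        if end != -1:
            blocks.append((m.group(1), html[m.end():end]))
    return blocks


if __name__ == "__main__":
    page = "<!--[if IE 5]><p>IE5 only</p><![endif]--><p>all</p>"
    print(conditional_blocks(page))
```

To every other parser the whole block is still an ordinary comment, since it starts with "<!--" and ends with "-->".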
--
Reply email address is a bottomless spam bucket.
Please reply to the group so everyone can share.
|