Entity References

comp.infosystems.www.authoring.html
Subject Author Date
Entity References David E. Ross 10-11-2007
---> Re: Entity References Jukka K. Korpel...10-12-2007
  ---> Re: Entity References Jukka K. Korpel...10-13-2007
      ---> Re: Entity References Jukka K. Korpel...10-14-2007
Posted by Jukka K. Korpela on October 13, 2007, 2:29 am
Scripsit Stan Brown:

> Fri, 12 Oct 2007 21:59:51 +0300 from Jukka K. Korpela
>> Actually this misbehavior of IE was introduced by IE 7, ...
>> other entity references and character references using hexadecimal
>> notation are not recognized but taken literally.
>
> Do you mean they are not recognized at all, or not recognized unless
> they have the terminating semicolon?

The latter.

I cannot help comparing this issue with Reason #4 in "4 Reasons to Validate
your HTML":

"Reason #4: Netscape 4.0
Netscape 4.0 began requiring the terminating semicolon on entities where
previous versions often had not. For example, some HTML tutorials show their
expertise with &ltP&gtFoo in examples, which Netscape 4.0 shows literally
while previous versions had shown "<P>Foo". Again, valid HTML worked fine in
all versions of Netscape."

http://htmlhelp.com/tools/validator/reasons.html

That change meant that Netscape started to require the termination semicolon
_according to_ HTML specifications. By those specs, &ltP is an entity
reference (though an undefined one), so treating it as &lt;P was an error.
The IE change, on the other hand, requires the semicolon in contexts where
it is optional by the specifications, so the browser is in error, just as it
would be an error not to recognize <TiTlE>, even though this spelling is not
particularly recommendable.

(Hmm... since requiring the semicolon is correct and necessary by _XML_
rules and hence in XHTML - which IE 7 fails to accept except when served as
text/html -, maybe IE 7.1 will require tags and attribute names to be
written in lowercase. :-) Naturally still refusing to do anything useful
with XHTML.)
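
The XML side of that remark can be checked with any XML parser. A minimal sketch using Python's standard xml.etree.ElementTree; the helper name is_well_formed is made up for illustration and is not part of any library:

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<p>1 &lt; 2</p>"))  # reference terminated with ';'
print(is_well_formed("<p>1 &lt 2</p>"))   # reference without ';'
```

The first call should report True and the second False, since an XML parser treats a reference without the semicolon as a well-formedness error.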

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/


Posted by David E. Ross on October 13, 2007, 12:51 pm
On 10/12/2007 11:29 PM, Jukka K. Korpela wrote [in part]:
> Scripsit Stan Brown:
>
>> Fri, 12 Oct 2007 21:59:51 +0300 from Jukka K. Korpela
>>> Actually this misbehavior of IE was introduced by IE 7, ...
>>> other entity references and character references using hexadecimal
>>> notation are not recognized but taken literally.
>> Do you mean they are not recognized at all, or not recognized unless
>> they have the terminating semicolon?
>
> The latter.
>
> I cannot help comparing this issue with Reason #4 in "4 Reasons to Validate
> your HTML":
>
> "Reason #4: Netscape 4.0
> Netscape 4.0 began requiring the terminating semicolon on entities where
> previous versions often had not. For example, some HTML tutorials show their
> expertise with &ltP&gtFoo in examples, which Netscape 4.0 shows literally
> while previous versions had shown "<P>Foo". Again, valid HTML worked fine in
> all versions of Netscape."

The quoted example is one in which Section 5.3 of the HTML 4.01
specification requires semi-colons because the entity references are NOT
followed by a blank space, line-end, or tag. After all, is the first
entity reference &lt or is it &ltP? Is the second &gt or &gtFoo?
Section 5.3 strongly implies that the semi-colon is optional only when
the entity reference can be decoded unambiguously without it.
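
The ambiguity can be made concrete with a small scanner. This is only a sketch under a stated assumption: that an entity name is a letter followed by letters, digits, ".", "-", "_" or ":", and that the name runs on until the first character outside that set. The function name scan_entity_refs is hypothetical.

```python
import re

# Assumed name syntax: a letter, then letters, digits, '.', '-', '_' or ':'.
# The name is matched greedily, so it only ends at a non-name character.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9._:-]*)(;?)")


def scan_entity_refs(text):
    """Return (name, has_semicolon) for each entity reference found in text."""
    return [(m.group(1), m.group(2) == ";") for m in ENTITY_REF.finditer(text)]


print(scan_entity_refs("&ltP&gtFoo"))    # names run into the following letters
print(scan_entity_refs("&lt;P&gt;Foo"))  # semicolons end the names explicitly
print(scan_entity_refs("a &mdash b"))    # the blank ends the name
```

Under that assumption the first input yields the names ltP and gtFoo, the second yields lt and gt, and the third yields mdash, which matches the reading that the semicolon can only be dropped where the next character cannot be part of a name.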

The situation that prompted my original post that started this thread
had a blank after the entity reference, thus providing the necessary
unambiguity. The W3C validator at <http://validator.w3.org/> accepted
it as valid.

I hand-code all my HTML. I (almost) always code entity references with
semi-colons. I consider this one instance to be a typo. I did not
catch it via the W3C validator because it was indeed valid. I did not
catch it while proofreading the rendered Web page because SeaMonkey
rendered it correctly. I only caught it when a friend using IE asked
what &mdash is; IE failed to render it and instead displayed the raw text.

I started this thread merely to warn others that they should NOT treat
the semi-colon as optional.
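
Since the validator does not flag this case, a separate check is needed to catch it. A rough sketch of such a check, assuming a regular expression is good enough to spot references; the pattern and the function name find_unterminated_refs are illustrative, not taken from any tool:

```python
import re

# Decimal reference, hexadecimal reference, or named reference,
# each followed by an optional terminating semicolon.
REF = re.compile(r"&(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9._:-]*)(;?)")


def find_unterminated_refs(markup):
    """Return (line_number, reference) pairs for references lacking ';'."""
    hits = []
    for line_no, line in enumerate(markup.splitlines(), start=1):
        for m in REF.finditer(line):
            if m.group(2) != ";":
                hits.append((line_no, m.group(0)))
    return hits


sample = (
    "<p>Dashes &mdash like this</p>\n"
    "<p>Fine: &mdash; and &#8212; but not &#x2014</p>"
)
for line_no, ref in find_unterminated_refs(sample):
    print(f"line {line_no}: {ref} has no terminating semicolon")
```

On the sample it reports &mdash on line 1 and &#x2014 on line 2. It will also report a bare "&" followed by letters inside URLs, which is arguably worth a look anyway.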

--
David E. Ross
<http://www.rossde.com/>

Natural foods can be harmful: Look at all the
people who die of natural causes.

Posted by Stan Brown on October 14, 2007, 7:11 pm
Sat, 13 Oct 2007 09:29:50 +0300 from Jukka K. Korpela
> Scripsit Stan Brown:
>
> > Fri, 12 Oct 2007 21:59:51 +0300 from Jukka K. Korpela
> >> Actually this misbehavior of IE was introduced by IE 7, ...
> >> other entity references and character references using hexadecimal
> >> notation are not recognized but taken literally.
> >
> > Do you mean they are not recognized at all, or not recognized unless
> > they have the terminating semicolon?
>
> The latter.

Kiitos!


--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2.1 spec: http://www.w3.org/TR/CSS21/
validator: http://jigsaw.w3.org/css-validator/
Why We Won't Help You:
http://diveintomark.org/archives/2003/05/05/why_we_wont_help_you

Posted by André Gillibert on October 12, 2007, 6:10 pm
David E. Ross wrote:

> However, a friend using Internet Explorer sent me an E-mail asking what
> &mdash is. Thus, another IE failure to comply with the HTML 4.01
> Specification is demonstrated, as is the importance of the strong
> recommendation in the Specification.

Unfortunately, no interactive web browser conforms to the HTML 4.01
specification, and I don't know of any that even tries to conform
fully.
CDATA marked sections are very poorly supported; Opera supports them
partially.
The RCDATA, IGNORE, INCLUDE and TEMP status keywords are supported by no
browser I'm aware of.
Overriding the document type declaration internal subset is supported by
no browser I'm aware of, though the HTML 4.01 specification is unclear
and seems to conflict with SGML on this point.
FPIs in the document type declaration are rarely neutral with respect to
the interpretation of the document, even though the SGML specification
*requires* that the following document type declarations be equivalent:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
Or:
<!DOCTYPE html [
<!ENTITY % html-d PUBLIC "-//W3C//DTD HTML 4.01//EN">
%html-d;
]>

BTW, why did the W3C remove the version attribute of the html element in
HTML 4? Did they think that the FPI in a document type declaration could
be used to identify the document version? Did they not read the SGML spec
at all? Not even one chapter?

Just as comments mustn't affect semantics, the means used to build the
document type definition mustn't affect semantics.
(Moreover, the document type declaration is optional if the document is
in canonical form.)

Unfortunately, HTML5 seems to be heading in this direction: don't support
features that aren't supported by cr*ppy tools, so that cr*ppy tools don't
have to be changed to conform to HTML5.
I think that the cr*ppiness of the HTML specification is partially
responsible for the cr*ppiness of tools.
--
If you've a question that doesn't belong to Usenet, contact me at

Posted by Jukka K. Korpela on October 13, 2007, 9:16 am
Scripsit André Gillibert:

> BTW, why did the W3C remove the version attribute of the html element
> in HTML 4?

They didn't remove it but deprecated it:
"The value of this attribute specifies which HTML DTD version governs the
current document. This attribute has been deprecated because it is redundant
with version information provided by the document type declaration."
http://www.w3.org/TR/html401/struct/global.html#adef-version

I don't think I've ever seen <html version="..."> in the wild, so it doesn't
really matter. It would matter if user agents had actually used it for
something.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

