|
Posted by Andy Dingley on June 2, 2006, 5:16 am
What specifies the permitted root element(s) for a document? HTML,
SGML, XHTML or XML?
Valid HTML documents need to have a well-known DTD and a doctypedecl in
each document like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
The document's root element is "HTML", and is specified by the
doctypedecl. For HTML and XHTML it's possible that the prose of their
recommendation restricts it too.
My question is, is there any way to author a non-HTML DTD (SGML or XML)
so as to restrict valid documents to only allow a certain subset of
their elements to be used as the root element? Can this restriction be
expressed _entirely_ within a DTD? Is this used within the HTML DTDs?
(i.e. not just in the doctypedecl)
Is this fragment a valid HTML document? If not, why isn't it? Just
which part of its definition is forbidding this fragmentary use?
<!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<div>
<p>Foo</p>
</div>
Good tutorial refs on DTDs are also welcome. I don't know anything like
enough on DTD innards.
Thanks
|
|
Posted by Peter Flynn on June 2, 2006, 7:26 pm
> What specifies the permitted root element(s) for a document? HTML,
> SGML, XHTML or XML?
When using a DTD, any declared element type can be the root element.
It must be specified in the Document Type Declaration in the XML file.
The same is true for SGML, HTML, and XHTML, e.g.
<!DOCTYPE table PUBLIC "-//W3C//DTD HTML 4.01//EN">
specifies a document starting with <table> and containing anything
valid in HTML 4.01 tables.
Warning: *browsers* are not SGML conforming applications, so they won't
understand this. They *will* understand if you use XML or XHTML, but
I don't know what their reaction to an XHTML fragment would be.
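
As a rough illustration of the rule above for the XML/XHTML case, the sketch below checks only that the name in the Document Type Declaration matches the actual root element. It assumes an XHTML 1.0 Strict variant of the fragment from the question, and the function name root_matches_doctype is made up for this example. It uses Python's standard minidom parser, which checks well-formedness but does not validate against the DTD.

```python
from xml.dom import minidom


def root_matches_doctype(xml_text: str) -> bool:
    """Return True if the doctype name equals the root element's name."""
    doc = minidom.parseString(xml_text)
    if doc.doctype is None:
        return False  # no Document Type Declaration at all
    return doc.doctype.name == doc.documentElement.tagName


# Assumed XHTML rewrite of the <div> fragment from the question.
fragment = (
    '<!DOCTYPE div PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    "<div><p>Foo</p></div>"
)

print(root_matches_doctype(fragment))
```

This says nothing about whether the content is valid against the DTD; for that a validating parser is still needed.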
> My question is, is there any way to author a non-HTML DTD (SGML or XML)
> so as to restrict valid documents to only allow a certain subset of
> their elements to be used as the root element?
Yep, just use the element type name of your choice in the Document
Type Declaration. This is required to be supported by all conforming
editors using a DTD. If you use a Schema, all bets are off, as the
specification of a root element type is done quite differently there.
> Can this restriction be
> expressed _entirely_ within a DTD?
No, not at all. *Any* element type of a DTD can be used as the root
element type.
But conforming applications (e.g. editors) usually make a good guess
if they are worth anything, when they parse the DTD -- it's not
hard for them to spot that at least one element type is never used
in the content model of any other element type, and is therefore a
good choice for a default root element type. Oddly, some otherwise
very good editors fail to do this, possibly because their programmers
simply didn't grok XML markup.
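
The guess described above can be sketched roughly as follows. This is a simplified illustration, not how any particular editor does it: it assumes a small hypothetical DTD (TOY_DTD) with no parameter entities, so it would not work on the real HTML DTDs without expanding their entities first, and it ignores the special case of an ANY content model.

```python
import re

# Matches <!ELEMENT name content-model> declarations.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.:-]+)\s+(.*?)>", re.DOTALL)
NAME = re.compile(r"[A-Za-z_][\w.:-]*")


def root_candidates(dtd_text: str) -> list[str]:
    """List element types never used in another element's content model."""
    declared = {}
    for name, model in ELEMENT_DECL.findall(dtd_text):
        model = re.sub(r"#\w+", " ", model)  # drop keywords such as #PCDATA
        declared[name] = set(NAME.findall(model))
    referenced = set()
    for name, children in declared.items():
        referenced |= children - {name}  # a self-reference does not count
    return [name for name in declared if name not in referenced]


# Hypothetical DTD used only for this illustration.
TOY_DTD = """
<!ELEMENT book (title, chapter+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT chapter (title, para*)>
<!ELEMENT para (#PCDATA)>
"""

print(root_candidates(TOY_DTD))
```

If the list comes back empty or with several names, the DTD gives no single obvious default, and the root has to be chosen in the Document Type Declaration as described earlier.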
> Is this used within the HTML DTDs?
> (i.e. not just in the doctypedecl)
Not explicitly.
> Is this fragment a valid HTML document?
Yes, perfectly.
> If not, why isn't it? Just
> which part of its definition is forbidding this fragmentary use?
> <!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
> "http://www.w3.org/TR/html4/strict.dtd">
> <div>
> <p>Foo</p>
> </div>
You can test this by running it through any SGML validating parser
(e.g. nsgmls).
> Good tutorial refs on DTDs are also welcome. I don't know anything like
> enough on DTD innards.
The best by far is still Eve Maler and Jeanne El Andaloussi, "Developing
SGML DTDs -- from text to model to markup", Prentice Hall, 1996. You
just have to skip the bits which refer to those parts of SGML which were
dropped in the XML Specification (see the list in the FAQ on converting
DTDs to XML at http://xml.silmaril.ie/developers/dtdconv/).
But you should also bone up on Relax NG, which is a schema language with
a short (DTD-like) syntax as well as a verbose syntax, from which you
can generate DTDs, W3C Schemas, and more. This may be an easier way into
document modelling.
///Peter
--
XML FAQ: http://xml.silmaril.ie/
|
|
Posted by Jukka K. Korpela on June 3, 2006, 1:00 am
>> Is this fragment a valid HTML document?
>
> Yes, perfectly.
No, it is a valid SGML document, but it is not an HTML document, as defined
in HTML specifications. (Of course, most "HTML documents" on the Web are not
HTML documents in that sense, but the question is meaningful only if
interpreted as relating to specifications. "HTML document" in the loose
sense - as well as "XML document" when well-formedness is not required - is
far too fuzzy a concept to be argued about.)
>> If not, why isn't it? Just
>> which part of its definition is forbidding this fragmentary use?
>> <!DOCTYPE div PUBLIC "-//W3C//DTD HTML 4.01//EN"
>> "http://www.w3.org/TR/html4/strict.dtd">
>> <div>
>> <p>Foo</p>
>> </div>
>
> You can test this by running it through any SGML validating parser
> (e.g. nsgmls).
That would indicate the validity, but the HTML 4.01 specification requires
that one of three specific DOCTYPE declarations be used - not just that one
of three DTDs be used. And this isn't one of them. Moreover, the
specification explicitly says:
"After document type declaration, the remainder of an HTML document is
contained by the HTML element."
http://www.w3.org/TR/REC-html40/struct/global.html#h-7.3
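
To make the distinction concrete, here is a minimal sketch of a check for the three document type declarations listed in the HTML 4.01 specification. The function name is_spec_doctype is made up for this example, and treating a missing system identifier as acceptable is an assumption of the sketch, not something the specification states. It looks only at the DOCTYPE line and does no DTD validation.

```python
import re

# Public and system identifiers of the three HTML 4.01 declarations.
HTML401_DOCTYPES = {
    "-//W3C//DTD HTML 4.01//EN": "http://www.w3.org/TR/html4/strict.dtd",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "http://www.w3.org/TR/html4/loose.dtd",
    "-//W3C//DTD HTML 4.01 Frameset//EN": "http://www.w3.org/TR/html4/frameset.dtd",
}

DOCTYPE = re.compile(
    r'<!DOCTYPE\s+(\S+)\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?\s*>',
    re.IGNORECASE,
)


def is_spec_doctype(document: str) -> bool:
    """Return True if the document starts with one of the three declarations."""
    match = DOCTYPE.match(document.lstrip())
    if match is None:
        return False
    name, public_id, system_id = match.groups()
    if name.upper() != "HTML" or public_id not in HTML401_DOCTYPES:
        return False
    # Assumption: an omitted system identifier is tolerated here.
    return system_id is None or system_id == HTML401_DOCTYPES[public_id]
```

With this check, the <div> fragment quoted above is rejected because its declaration names div rather than HTML, even though an SGML parser would report it as valid against the DTD.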
--
Yucca, http://www.cs.tut.fi/~jkorpela/
|
|
Posted by Joe Kesselman on June 3, 2006, 1:11 am
In other words: As always, a DTD -- or a schema -- is only a partial
description of what makes a document correct and meaningful. Think of
these as "higher-level syntax checking"; the application is always going
to impose semantic constraints as well.
Having the schema or DTD describes the document's structure in a
machine-readable form that tools can take advantage of, so they don't
have to do *all* the checking themselves. That's valuable. But don't
expect it to be complete.
--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
|
|
Posted by Jukka K. Korpela on June 3, 2006, 1:46 am
> In other words:
In future, please quote or paraphrase the message that you are commenting
on.
> As always, a DTD -- or a schema -- is only a partial
> description of what makes a document correct and meaningful.
It depends. There's no law that requires additional rules, though pure
syntax as such _is_ somewhat boring.
> Think of
> these as "higher-level syntax checking"; the application is always
> going to impose semantic constraints as well.
What's "higher-level" here? Anyway, in the issue discussed in this thread,
it is the additional _syntactic_ constraints that imply that a certain kind
of document is not an HTML document. There's nothing semantic in the
requirement that a document contain a specific DOCTYPE declaration or that a
document contain a <title> element. (Requiring that the <title> element
contain text that is a descriptive name for the document, especially for use
as a title for it in different contexts, would be a semantic requirement.
Whether HTML specifications make such a requirement is debatable; the prose
in the specs is a mixture of normative-looking prose, comments, hints,
wishful thinking, etc.)
--
Yucca, http://www.cs.tut.fi/~jkorpela/
|