
Newlines in URIs


comp.infosystems.www.authoring.html - discuss HTML authoring here 

Posted by Tristan Miller on February 8, 2010, 10:17 am


Greetings.

In XHTML, is it legal to have newlines in attribute values which take a
URI, such as the href attribute of an anchor?

If so, say I have an anchor as follows:

<a href="foo?bar=bar
&amp;baz=baz">

Would a conforming user agent interpret the URI as "foo?bar=bar&baz=baz" or
as "foo?bar=bar%0Abaz=baz" or possibly something else?

I am asking because JMeter appears to choke on such URIs with a
"java.net.MalformedURLException: Illegal character in URL". I wonder if
it's being overly finicky (since web browsers seem to cope with such URIs
just fine) or if it's right to reject such constructions. I haven't been
able to find anything in the HTML 4.01 spec about this, though maybe I'm
overlooking something.

Regards,
Tristan


Posted by Jonathan N. Little on February 8, 2010, 11:03 am




Since newline characters are ASCII control characters, which are not
allowed in URIs...

http://www.ietf.org/rfc/rfc2396.txt

2.4.3. Excluded US-ASCII Characters

Although they are disallowed within the URI syntax, we include here a
description of those US-ASCII characters that have been excluded and
the reasons for their exclusion.

The control characters in the US-ASCII coded character set are not
used within a URI, both because they are non-printable and because
they are likely to be misinterpreted by some control mechanisms.

...how the UA handles them is irrelevant. It's invalid. Don't use
newlines in href attributes.
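
As a rough illustration of the RFC 2396 rule quoted above, here is a minimal Python sketch that reports US-ASCII control characters (0x00-0x1F and 0x7F) in a candidate URI string. The function name find_control_chars is made up for this example, and the check covers only that one rule; it is not a full URI validator.

```python
def find_control_chars(candidate):
    """Return (index, repr) pairs for US-ASCII control characters in candidate."""
    hits = []
    for index, ch in enumerate(candidate):
        code = ord(ch)
        # RFC 2396 section 2.4.3: control characters are 0x00-0x1F and 0x7F
        if code <= 0x1F or code == 0x7F:
            hits.append((index, repr(ch)))
    return hits


# The href value from the original question, with the raw line break kept
print(find_control_chars("foo?bar=bar\n&baz=baz"))
# The same value written on one line
print(find_control_chars("foo?bar=bar&baz=baz"))
```

The first call reports the line feed at index 11; the second returns an empty list.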

--
Take care,

Jonathan
-------------------
LITTLE WORKS STUDIO
http://www.LittleWorksStudio.com

Posted by Tristan Miller on February 8, 2010, 12:09 pm


Greetings.


I think we need to make a distinction between what is a valid URI and what is a
valid value for the href attribute. It is known that the user agent is
expected to modify the href attribute in order to produce a URI; in my
original example, the value of the href attribute contains "&amp;" but this
is interpreted by the user agent as "&" for the purposes of constructing
the URI. For all I know the (X)HTML standard instructs user agents to
ignore newlines in href attributes. That's what I'm asking about here.
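
To illustrate just the entity step (not the newline question), here is a small Python sketch using the standard library's html.unescape. The variable names are only illustrative, and the value is written on one line to keep the two issues separate.

```python
import html

# Attribute value as written in the (X)HTML source, without the line break
source_value = "foo?bar=bar&amp;baz=baz"

# Resolving the entity reference yields the string used to build the URI
resolved = html.unescape(source_value)
print(resolved)
```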

Regards,
Tristan


Posted by Greg Russell on February 8, 2010, 12:41 pm





I think you need to understand the distinction between a control character
and a printable character.



Posted by Jukka K. Korpela on February 8, 2010, 1:32 pm



Define "legal". It is well-formed and valid, but incorrect.

By HTML rules, including XHTML, a newline is equivalent to a space, except
in some specific contexts (which do not include attribute values). XHTML 1.0
says it thusly:

"When user agents process attributes, they do so according to Section 3.3.3
of [XML]:
- Strip leading and trailing white space.
- Map sequences of one or more white space characters (including line
breaks) to a single inter-word space."
http://www.w3.org/TR/xhtml1/#h-4.7
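
The two steps quoted above can be sketched in Python roughly as follows. This is only an approximation: the function name normalize_attribute is hypothetical, the white space set is assumed to be the four XML white space characters (space, tab, CR, LF), and entity references are assumed to have been resolved already.

```python
import re

# White space as defined by XML: space, tab, carriage return, line feed
_XML_WS = re.compile(r"[ \t\r\n]+")


def normalize_attribute(value):
    """Apply the two steps quoted from XHTML 1.0 section 4.7."""
    # Map each run of white space (including line breaks) to a single space
    collapsed = _XML_WS.sub(" ", value)
    # Strip leading and trailing white space
    return collapsed.strip(" ")


print(normalize_attribute("foo?bar=bar\n&baz=baz"))
```

For the href value in question, the line break becomes a single space rather than disappearing.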


It would interpret the tag as equivalent to one in which the line break is
replaced by a single space. The attribute value would thus be recognized as
foo?bar=bar &baz=baz
since the &amp; entity reference exists at the (X)HTML source level only.

Anyway, this value does not constitute a URL (or URI, if you prefer)
according to URL specifications, since a URL must not contain the space
character. The URL specification (RFC 3986) doesn't really say this
explicitly; rather, it follows from the definitions of allowed characters.
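
To make "follows from the definitions of allowed characters" concrete, here is a character-level sketch in Python built from the RFC 3986 sets (unreserved, gen-delims, sub-delims, plus "%" for percent-encoding). The names are made up for this example, and the check looks at individual characters only; it does not verify URI structure or that each "%" is followed by two hex digits.

```python
import string

# Character classes as listed in RFC 3986
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
ALLOWED = UNRESERVED | GEN_DELIMS | SUB_DELIMS | {"%"}


def disallowed_chars(candidate):
    """Return the characters in candidate that appear in none of the sets above."""
    return [ch for ch in candidate if ch not in ALLOWED]


# Value after white space normalization: the space is reported
print(disallowed_chars("foo?bar=bar &baz=baz"))
# Value written without any white space: nothing is reported
print(disallowed_chars("foo?bar=bar&baz=baz"))
```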

This is outside the scope of (X)HTML parsing and validity but well within
the scope of (X)HTML as a whole, as the non-formalized part of the HTML
specifications says that the href attribute value shall be a URL and
normatively cites some specifications on URL format.

(without the quotati


Its reaction is thus basically correct, though the error message is somewhat
obscure. The space is "illegal" only in the sense of not being allowed by
URL syntax,


Modern browsers generally strip newlines when processing attribute values as
URLs. This violates the specification, but since such attributes are
incorrect anyway, we might take this as error handling.
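
The lenient handling described here might be sketched in Python like this. It assumes that "strip newlines" means deleting CR and LF anywhere in the value before any further parsing; the function name is hypothetical, and real browsers may differ in detail.

```python
def strip_newlines_leniently(value):
    """Delete CR and LF anywhere in the value, then trim surrounding spaces."""
    # Assumption: only CR and LF are removed; other white space is left alone
    cleaned = value.translate({ord(c): None for c in "\r\n"})
    return cleaned.strip(" ")


print(strip_newlines_leniently("foo?bar=bar\n&baz=baz"))
```

Under that assumption the result contains no space, unlike the value produced by white space normalization.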

--
Yucca, http://www.cs.tut.fi/~jkorpela/

