Posted by Tristan Miller on February 8, 2010, 10:17 am
Greetings.
In XHTML, is it legal to have newlines in attribute values which take a
URI, such as the href attribute of an anchor?
If so, say I have an anchor as follows:
<a href="foo?bar=bar
&amp;baz=baz">
Would a conforming user agent interpret the URI as "foo?bar=bar&baz=baz" or
as "foo?bar=bar%0Abaz=baz" or possibly something else?
I am asking because JMeter appears to choke on such URIs with a
"java.net.MalformedURLException: Illegal character in URL". I wonder if
it's being overly finicky (since web browsers seem to cope with such URIs
just fine) or if it's right to reject such constructions. I haven't been
able to find anything in the HTML 4.01 spec about this, though maybe I'm
overlooking something.
Regards,
Tristan
--
_
_V.-o Tristan Miller >< Space is limited
/ |`-' -=-=-=-=-=-=-=-=-=-=-=-=-=-=- <> In a haiku, so it's hard
(7_\ http://www.nothingisreal.com/ >< To finish what you
Posted by Jonathan N. Little on February 8, 2010, 11:03 am
Tristan Miller wrote:
> Greetings.
> In XHTML, is it legal to have newlines in attribute values which take a
> URI, such as the href attribute of an anchor?
> If so, say I have an anchor as follows:
> <a href="foo?bar=bar
> &amp;baz=baz">
> Would a conforming user agent interpret the URI as "foo?bar=bar&baz=baz" or
> as "foo?bar=bar%0Abaz=baz" or possibly something else?
> I am asking because JMeter appears to choke on such URIs with a
> "java.net.MalformedURLException: Illegal character in URL". I wonder if
> it's being overly finicky (since web browsers seem to cope with such URIs
> just fine) or if it's right to reject such constructions. I haven't been
> able to find anything in the HTML 4.01 spec about this, though maybe I'm
> overlooking something.
Since newline characters are ASCII control characters which are not
allowed in URIs...
http://www.ietf.org/rfc/rfc2396.txt
<cite>
2.4.3. Excluded US-ASCII Characters
Although they are disallowed within the URI syntax, we include here a
description of those US-ASCII characters that have been excluded and
the reasons for their exclusion.
The control characters in the US-ASCII coded character set are not
used within a URI, both because they are non-printable and because
they are likely to be misinterpreted by some control mechanisms.
</cite>
...how the UA handles them is irrelevant. It's invalid. Don't use
newlines in href attributes.
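A quick way to see the point is a character-level check against the set of characters RFC 3986 (the successor to RFC 2396) allows in a URI. This is a minimal Python sketch, not a full URI grammar validator; `has_only_uri_chars` is a hypothetical helper name:

```python
import re

# Characters permitted in a URI per RFC 3986: unreserved, reserved
# (gen-delims and sub-delims), plus "%" for percent-encoding.
# This checks characters only; it does not validate the full URI grammar.
_ALLOWED = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*")


def has_only_uri_chars(candidate: str) -> bool:
    """Return True if every character of candidate may appear in a URI."""
    return _ALLOWED.fullmatch(candidate) is not None


print(has_only_uri_chars("foo?bar=bar&baz=baz"))    # True
print(has_only_uri_chars("foo?bar=bar\n&baz=baz"))  # False: LF is a control character
print(has_only_uri_chars("foo?bar=bar &baz=baz"))   # False: space is excluded as well
```

Both the line feed and the space fail the check, which is consistent with a strict client refusing such a value.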
--
Take care,
Jonathan
-------------------
LITTLE WORKS STUDIO
http://www.LittleWorksStudio.com
Posted by Tristan Miller on February 8, 2010, 12:09 pm
Greetings.
Jonathan N. Little wrote:
> Tristan Miller wrote:
>> Greetings.
>> In XHTML, is it legal to have newlines in attribute values which take a
>> URI, such as the href attribute of an anchor?
>> If so, say I have an anchor as follows:
>> <a href="foo?bar=bar
>> &amp;baz=baz">
>> Would a conforming user agent interpret the URI as "foo?bar=bar&baz=baz"
>> or as "foo?bar=bar%0Abaz=baz" or possibly something else?
>> I am asking because JMeter appears to choke on such URIs with a
>> "java.net.MalformedURLException: Illegal character in URL". I wonder if
>> it's being overly finicky (since web browsers seem to cope with such
>> URIs just fine) or if it's right to reject such constructions. I haven't
>> been able to find anything in the HTML 4.01 spec about this, though
>> maybe I'm overlooking something.
>
>
> Since newline characters are ASCII control characters which are not
> allowed in URIs...
>
> http://www.ietf.org/rfc/rfc2396.txt
>
> 2.4.3. Excluded US-ASCII Characters
>
> Although they are disallowed within the URI syntax, we include here a
> description of those US-ASCII characters that have been excluded and
> the reasons for their exclusion.
>
> The control characters in the US-ASCII coded character set are not
> used within a URI, both because they are non-printable and because
> they are likely to be misinterpreted by some control mechanisms.
>
> ...how the UA handles them is irrelevant. It's invalid. Don't use
> newlines in href attributes.
I think we need to make a distinction between what is a valid URI and what
is a valid value for the href attribute. It is known that the user agent is
expected to modify the href attribute in order to produce a URI; in my
original example, the value of the href attribute contains "&amp;" but this
is interpreted by the user agent as "&" for the purposes of constructing
the URI. For all I know the (X)HTML standard instructs user agents to
ignore newlines in href attributes. That's what I'm asking about here.
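For the entity part of this, any XML parser shows the same thing: the `&amp;` in the markup never reaches the application. A minimal sketch with Python's standard-library ElementTree (the link text is made up for the example):

```python
import xml.etree.ElementTree as ET

# "&amp;" is an XML entity reference; the parser replaces it with a
# literal "&" before the application ever sees the attribute value.
markup = '<a href="foo?bar=bar&amp;baz=baz">link</a>'
href = ET.fromstring(markup).get("href")
print(href)  # foo?bar=bar&baz=baz
```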
Regards,
Tristan
Posted by Greg Russell on February 8, 2010, 12:41 pm
...
>> The control characters in the US-ASCII coded character set are
>> not used within a URI, both because they are non-printable and
>> because they are likely to be misinterpreted by some control
>> mechanisms.
>> ...how the UA handles them is irrelevant. It's invalid. Don't use
>> newlines in href attributes.
> I think we need to make a distinction between what is a valid URI and
> what is a valid value for the href attribute. It is known that the user
> agent is expected to modify the href attribute in order to produce a
> URI; in my original example, the value of the href attribute contains
> "&amp;" but this is interpreted by the user agent as "&" for the
> purposes of constructing the URI.
I think you need to understand the distinction between a control character
and a printable character.
Posted by Jukka K. Korpela on February 8, 2010, 1:32 pm
Tristan Miller wrote:
> In XHTML, is it legal to have newlines in attribute values which take
> a URI, such as the href attribute of an anchor?
Define "legal". It is well-formed and valid, but incorrect.
By HTML rules, including XHTML, a newline is equivalent to a space, except
in some specific contexts (which do not include attribute values). XHTML 1.0
says it thusly:
"When user agents process attributes, they do so according to Section 3.3.3
of [XML]:
- Strip leading and trailing white space.
- Map sequences of one or more white space characters (including line
breaks) to a single inter-word space."
http://www.w3.org/TR/xhtml1/#h-4.7
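The two steps quoted above are easy to express directly. This is a rough Python sketch, assuming entity references have already been expanded; `normalize_attribute` is a hypothetical name, and "white space" is taken to mean the four XML white space characters (space, tab, CR, LF):

```python
import re


def normalize_attribute(value: str) -> str:
    """Sketch of the two steps quoted from XHTML 1.0, section 4.7."""
    # Step 1: strip leading and trailing white space.
    value = value.strip(" \t\r\n")
    # Step 2: map each run of white space (including line breaks) to one space.
    return re.sub(r"[ \t\r\n]+", " ", value)


print(repr(normalize_attribute("  foo?bar=bar\n      &baz=baz\n")))
# 'foo?bar=bar &baz=baz'
```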
> If so, say I have an anchor as follows:
> <a href="foo?bar=bar
> &amp;baz=baz">
> Would a conforming user agent interpret the URI as
> "foo?bar=bar&baz=baz" or as "foo?bar=bar%0Abaz=baz" or possibly
> something else?
It would interpret the tag as equivalent to
<a href="foo?bar=bar &amp;baz=baz">
The attribute value would thus be recognized as
foo?bar=bar &baz=baz
since the &amp; entity reference exists at the (X)HTML source level only.
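To check this against a real parser, here is a sketch that feeds the original anchor to Python's expat-based ElementTree. XML attribute-value normalization turns the literal line break into a space, and the entity reference becomes a plain "&", leaving the value described above (the link text is made up for the example):

```python
import xml.etree.ElementTree as ET

# The original anchor: a literal line break inside the href value,
# followed by the &amp; entity reference.
markup = '<a href="foo?bar=bar\n&amp;baz=baz">link</a>'
href = ET.fromstring(markup).get("href")
print(repr(href))  # 'foo?bar=bar &baz=baz'
```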
Anyway, this value does not constitute a URL (or URI, if you prefer)
according to URL specifications, since a URL must not contain the space
character. The URL specification (RFC 3986) doesn't really say this
explicitly; rather, it follows from the definitions of allowed characters.
This is outside the scope of (X)HTML parsing and validity but well within
the scope of (X)HTML as a whole, as the non-formalized part of the HTML
specifications says that the href attribute value shall be a URL and
normatively cites some specifications on URL format.
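If a space or line break were really meant to be part of the data, it would have to be percent-encoded to appear in a URL at all. A small sketch with Python's `urllib.parse.quote`; the `safe` set chosen here, which keeps the query's own delimiters, is an assumption for this example:

```python
from urllib.parse import quote

# Keep the delimiters that structure the query; percent-encode everything else.
SAFE = "/?=&"

print(quote("foo?bar=bar &baz=baz", safe=SAFE))   # foo?bar=bar%20&baz=baz
print(quote("foo?bar=bar\n&baz=baz", safe=SAFE))  # foo?bar=bar%0A&baz=baz
```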
> I am asking because JMeter appears to choke on such URIs with a
> "java.net.MalformedURLException: Illegal character in URL".
Its reaction is thus basically correct, though the error message is somewhat
obscure. The space is "illegal" only in the sense of not being allowed by
URL syntax.
> I wonder
> if it's being overly finicky (since web browsers seem to cope with
> such URIs just fine)
Modern browsers generally strip newlines when processing attribute values as
URLs. This violates the specification, but since such attributes are
incorrect anyway, we might take this as error handling.
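For what it's worth, the error handling in current browsers is written down in the WHATWG URL Standard rather than in the specifications cited in this thread: its parser trims leading and trailing C0 control characters and spaces, then removes every ASCII tab and newline. A rough Python sketch of just that clean-up step, assuming the newline survives attribute parsing (as it does when the page is parsed as text/html rather than as XML); `browser_style_cleanup` is a hypothetical name:

```python
def browser_style_cleanup(raw: str) -> str:
    """Sketch of the input clean-up step of the WHATWG URL parser."""
    # Strip leading and trailing C0 control characters (U+0000-U+001F) and spaces.
    trimmed = raw.strip("".join(chr(c) for c in range(0x21)))
    # Remove every ASCII tab or newline (U+0009, U+000A, U+000D) anywhere in the input.
    return "".join(ch for ch in trimmed if ch not in "\t\n\r")


print(browser_style_cleanup("foo?bar=bar\n&baz=baz"))  # foo?bar=bar&baz=baz
```

This is why the browser ends up with "foo?bar=bar&baz=baz", while a client that does no such clean-up reports an illegal character.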
--
Yucca, http://www.cs.tut.fi/~jkorpela/