character entities, is event handler script or html context?

I have an element with an event handler containing javascript within
which is a get type url with multiple params ...

Something like:

<input type="button" name="somename" id="someid" value="Click me"
onclick="object.method('http://host.domain/path/file?p1=v1&mp;p2=v2');">

From the point of view of '&amp;' vs '&', is the context of the
ampersand in the url javascript or html?

I think it's in an html context, and '&amp;' seems to work there, but at
the back of my mind is this nagging doubt that the event handler is
always script, so script context should apply, and I can't seem to find
a definitive and authoritative answer to this specific "'&' vs '&amp;'"
issue (which may be poor google-fu on my part)

In a script element (ie between <script type='text/javascript'> and
</script> tags) I wouldn't normally use a character entity for an
ampersand, either in a string value or in a logical and or bitwise and

eg I would write:

<script type='text/javascript'>
function dothings(a,b) {
  if (a & 32 && b & 4) {
    object.method('http://server.domain/path/file?p1=v1&p2=v2');
  }
}
</script>

rather than:

<script type='text/javascript'>
function dothings(a,b) {
  if (a &amp; 32 &amp;&amp; b &amp; 4) {
    object.method('http://server.domain/path/file?p1=v1&amp;p2=v2');
  }
}
</script>

but I can't convince myself that the same principle applies, or does not
apply, to javascript code embedded "in-line" in an html element


Denis McMahon

p.s. yes, I know this question might be equally well directed to
comp.lang.javascript, and I've spent over an hour debating which group
to post in, knowing that whichever group I post it to, someone will
suggest it should be in the other one, and that if I dare to xpost it to
both, with followups to one or the other, I'll just get more aggro. So
in the end I used <p
and c.i.w.a.h "won"!

Re: character entities, is event handler script or html context?

On 15 Oct 2010, the varmint Denis McMahon

Without being normative about it, your "feelings" are correct.  &amp;
is used solely to prevent html from considering something untoward
starting with an "&" that is not an entity.

Neredbojias

Re: character entities, is event handler script or html context?

On 10/15/2010 07:08 PM, Denis McMahon wrote:
  A URL is a URL. There is no javascript-ness or html-ness.
  To be useful in a web page, the URL must be present as HTML markup. As
such, the & -> &amp; is correct.
  Unless, of course, the URL never passes through an HTML parser.

James Moe
jmm-list at sohnen-moe dot com

Re: character entities, is event handler script or html context?

Denis McMahon wrote:

Both, but primarily, in the syntax definitions and in the process of
parsing, it is HTML. In the case above, &mp; is an undefined entity
reference - in practice, an error, which browsers typically handle by
interpreting it literally, which is what you want. But the markup is still

You probably meant to write &amp; instead of &mp;; &amp; is the correct form.

It is always a script, but it is first parsed by HTML rules, so by the time
it reaches the script engine it no longer contains any entity references -
the &amp; has become &.

<!ENTITY % Script "CDATA" -- script expression -->
<!ENTITY % events
 "onclick     %Script;       #IMPLIED  -- a pointer button was clicked --

The CDATA concept is loosely described in the HTML 4.01 specification,
which says: "CDATA is a sequence of characters from the document character
set and may include character entities."

That's more tricky. In HTML 4.01, the content model of the script element is
CDATA, but this means something different here. The above-mentioned part of
the HTML 4.01 specification says:

"Although the STYLE and SCRIPT elements use CDATA for their data model, for
these elements, CDATA must be handled differently by user agents. Markup and
entities must be treated as raw text and passed to the application as is.
The first occurrence of the character sequence "</" (end-tag open delimiter)
is treated as terminating the end of the element's content. In valid
documents, this would be the end tag for the element."

But in XHTML, things are different: the content model is #PCDATA ("parsed
character data"), which means that "&" and "<" should be "escaped".

The moral? Use external scripts, where the problem does not occur.


Re: character entities, is event handler script or html context?

Denis McMahon wrote:

There is no such thing as a "get type url"; lose that notion.  A URL is a
URL, period.  Especially in the HTTP GET command you will not find a
parameter starting with `http://'.  What you might have referred to as a
"not-get-type-url" until now, is a URI-reference (e.g., `/foo').  See also
RFC 3986.

Probably you meant `&amp;' here.

Neredbojias' answer is somewhat correct, but it is not to the point and
incomplete (and it is needlessly a full quote of your posting).

So, to answer your question concisely and to the point: Both.

Long answer:

There are application layers, each with its own context.  Client-side, first
the source code in the event-handler attribute must be parsed by a markup
parser (to be recognized for what it is), then it can be passed to a script
engine (if that is present and enabled, and registered for the content

The type of the value of the event-handler attribute here is CDATA
(character data); in the DTD (document type definition):

|   %attrs;                              -- %coreattrs, %i18n, %events --

| <!ENTITY % attrs "%coreattrs; %i18n; %events;">

| <!ENTITY % events
|  "onclick     %Script;       #IMPLIED  -- a pointer button was clicked --

| <!ENTITY % Script "CDATA" -- script expression -->

Or, in the Specification prose:

| onclick = script [CT]

| 6.14 Script data
| Script data ( %Script; in the DTD) can be the content of the SCRIPT
| element and the value of intrinsic event attributes. User agents must not
| evaluate script data as HTML markup but instead must pass it on as data to
| a script engine.
| The case-sensitivity of script data depends on the scripting language.
| Please note that script data that is element content may not contain
| character references, but script data that is the value of an attribute
| may contain them. The appendix provides further information about
| specifying non-HTML data.

In essence: In CDATA (character data) *attribute* values you may use entity
_references_ starting with `&'.  But an entity reference does not need to be
ended with `;', so if you do not escape `&' as `&amp;', following non-
whitespace characters would be regarded part of the (unwanted) reference
(and resolving that reference would fail.)

So the following, although it might look weird at first, is a Valid and
executable (X)HTML fragment:

  <a onclick="if (1&lt;&lt;2&gt;1) window.alert(&quot;Qapla&#39;&quot;)"

(spaces removed to get it in one line)

The `onclick' attribute value translates to

  if (1<<2>1) window.alert("Qapla'")

for the ECMAScript-conforming script engine, which parses it (per precedence
rules defined by the grammar) as

  if ((1 << 2) > 1)

(As you can see here, you CAN use character [entity] references if
characters in the attribute value would conflict with the attribute value
delimiters, and you MUST use them if they would be misrecognized for markup
characters.  Note that you do not need to escape `>', but symmetry can be
useful there.)
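
To make the round trip concrete, here is a minimal sketch of the two steps described above: escaping script source for use in a double-quoted attribute value, and resolving the references again roughly as a markup parser would before the script engine sees the code. The helper names `escapeForAttribute` and `decodeAttribute` are made up for this illustration, and only the five references used in this thread are handled; a real markup parser knows many more.

```javascript
// Hypothetical helpers for illustration; not part of any library.

// Escape script source so it can be written inside a double-quoted
// HTML attribute value. `&` must be replaced first, otherwise the
// ampersands of the other references would be escaped again.
function escapeForAttribute(source) {
  return source
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Resolve the same five references again, roughly as a markup parser
// would before handing the attribute value to the script engine.
// `&amp;` is resolved last so that `&amp;lt;` yields `&lt;`, not `<`.
function decodeAttribute(value) {
  return value
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, '&');
}

// The handler from the original question, with a plain `&` in the URL.
const handler = "object.method('http://host.domain/path/file?p1=v1&p2=v2');";
const attribute = escapeForAttribute(handler);
const markup =
  '<input type="button" value="Click me" onclick="' + attribute + '">';

console.log(markup);
console.log(decodeAttribute(attribute) === handler);
```

The point of the sketch is only that the script engine never sees `&amp;`: what is written in the markup and what is executed are two different strings.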

Again, first the source code must be parsed by a markup parser, then it can
be passed to a script engine.

But here it depends on the markup language: HTML or XHTML?

In *HTML*, the content model of the SCRIPT element is CDATA:

|  <!ELEMENT SCRIPT - - %Script;          -- script statements -->

(see above for `%Script;')

In CDATA *element* content, however, `&' is _not_ considered the start of an
entity reference.  The only markup characters that should be considered are
`<' and `/' when they occur next to each other (`</', End Tag Open [ETAGO]
delimiter, to end the SCRIPT element, which is why you would want to escape
that e.g. with `<\/').  So, both your first and second SCRIPT element would
be Valid HTML fragments, but only the first one would run (in the second,
`&amp;' is passed verbatim and is a *script* syntax error.)
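
The `<\/' escape mentioned above can be sketched as follows, assuming the text to be embedded ends up in a string literal inside the SCRIPT element. The helper name `toInlineScriptLiteral` is hypothetical, and the sketch deals with the ETAGO delimiter only, not with anything else a tag-soup parser might trip over.

```javascript
// Hypothetical helper: turn text into a string literal that is safe to
// embed in the content of an HTML SCRIPT element by breaking up `</`.
function toInlineScriptLiteral(text) {
  // JSON.stringify yields a double-quoted string literal; inside such a
  // literal `\/` means just `/` to the script engine, so the value of
  // the string is unchanged while the markup parser no longer sees `</`.
  return JSON.stringify(text).replace(/<\//g, '<\\/');
}

const original = '<b>bold</b> and </script>';
const literal = toInlineScriptLiteral(original);

console.log(literal);
```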

In *XHTML*, the content model of the `script' element is PCDATA (parsed
character data) instead:

| <!-- script statements, which may include CDATA sections -->
| <!ELEMENT script (#PCDATA)>

Meaning that *all* markup characters (in particular, `<', `/', `>', `&',
`[', and `]') are considered in the element content.  So, your first script
would not be a Valid XHTML fragment, and would not be passed by an XML
parser to the script engine to begin with, your second one would, and the
second one would run (since the script engine would get the original `&amp;'
as `&' from the markup parser, which is script-syntactically valid.)
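
The difference between the two content models can be modelled in a few lines. This is a rough sketch, not a real parser: `scriptEngineInput` and `isValidScript` are hypothetical names, only three references are resolved, and only the second (`&amp;`) variant of the script is modelled, since the first one with a raw `&` would not get past an XML parser at all.

```javascript
// Hypothetical model, not a real parser: what a script engine receives
// for the content of a script element.
function scriptEngineInput(content, markupLanguage) {
  if (markupLanguage === 'xhtml') {
    // PCDATA: references are resolved by the XML parser (`&amp;` last).
    return content
      .replace(/&lt;/g, '<')
      .replace(/&gt;/g, '>')
      .replace(/&amp;/g, '&');
  }
  // HTML: CDATA element content is passed on verbatim.
  return content;
}

// Report whether the script engine would accept the code as a function
// body. The code is only compiled here, never called.
function isValidScript(code) {
  try {
    new Function('a', 'b', code);
    return true;
  } catch (error) {
    return false;
  }
}

const second = 'return a &amp; 32 &amp;&amp; b &amp; 4;';
const asHtml = isValidScript(scriptEngineInput(second, 'html'));
const asXhtml = isValidScript(scriptEngineInput(second, 'xhtml'));

console.log({ asHtml, asXhtml });
```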

There are solutions to this problem in XHTML.  The first is CDATA sections:

  <script type='text/javascript'><![CDATA[
    function dothings(a,b) {
      if (a & 32 && b & 4)
        object.method('http://server.domain/path/file?p1=v1&p2=v2');
    }
  ]]></script>

By declaring the former PCDATA content CDATA, `&' is no longer considered a
markup character.  (This solution has the drawback that it is not compatible
with script engines of HTML tag-soup parsers (as they pass the code
verbatim).  So if you need that, too, you would use script comments to hide
potentially verbatim passed delimiters that would constitute a script syntax
error:)

  // <![CDATA[
  // ]]>
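
If the markup is generated by a program, the commented delimiters can be added mechanically. The following is only a sketch under that assumption; `wrapInCommentedCdata` is a made-up name, and the check for `]]>` is there because that sequence would end the CDATA section early for an XML parser.

```javascript
// Hypothetical helper: wrap script source in a CDATA section whose
// delimiters are hidden from the script engine by single-line comments.
function wrapInCommentedCdata(source) {
  if (source.includes(']]>')) {
    // `]]>` inside the code would terminate the CDATA section too early.
    throw new Error('source must not contain "]]>"');
  }
  return '// <![CDATA[\n' + source + '\n// ]]>';
}

const wrapped = wrapInCommentedCdata('var flags = 36 & 32 && 4 & 4;');

console.log(wrapped);
```

A tag-soup parser passes all three lines to the script engine, which ignores the first and the last as comments; an XML parser strips the CDATA delimiters and passes on the rest.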

The other, recommended, solution, is to move the offending script code to an
external resource that is _not_ a markup document:

  <script type="text/javascript" src="external.js"></script>


A crosspost to comp.lang.javascript would have been acceptable.  However, we
(especially I) have discussed this there many times before, if not ad
nauseam.  I wonder how you could have missed it.

var bugRiddenCrashPronePieceOfJunk = (
    navigator.userAgent.indexOf('MSIE 5') != -1
    && navigator.userAgent.indexOf('Mac') != -1
)  // Plone, register_function.js:16
