Posted by Jukka K. Korpela, February 15, 2006, 5:43 pm
Re: Mark up compound noun so that search engines see two words
You're right; I should try to remember that I don't remember all
Unicode characters yet. (And I really _should_ remember correctly what
U+FEFF is. [Slaps himself.])
The defined meaning of U+FEFF is that it is either a) a byte order mark (BOM) or b) an invisible character that prevents a line break (its character name is ZERO WIDTH NO-BREAK SPACE). In the latter role, U+2060 WORD JOINER is preferred. In effect, this means that by Unicode recommendations, U+FEFF should only be used at the start of a text file, as a BOM.
This is somewhat theoretical, of course, since U+2060 is poorly supported.
Besides, HTML specifications do not require that Unicode semantics be
obeyed; on the other hand, this means that the effect of U+FEFF in an
HTML document is _undefined_.
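To make the two roles concrete, here is a small sketch in Python (the language and its "utf-8-sig" codec are just my choice for illustration; nothing in this discussion depends on them). It prints the official names of U+FEFF and U+2060 and shows that only a *leading* U+FEFF is treated as a BOM, while one inside the text survives as an ordinary invisible character.

```python
import unicodedata

# U+FEFF and the character Unicode prefers for the "no line break" role.
for ch in ("\ufeff", "\u2060"):
    # Both are invisible format characters (general category Cf).
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# As a BOM: the "utf-8-sig" codec writes U+FEFF at the very start of the
# data and skips one leading U+FEFF again when decoding.
data = "kindergarten".encode("utf-8-sig")
print(data)                      # b'\xef\xbb\xbfkindergarten'
print(data.decode("utf-8-sig"))  # kindergarten

# A U+FEFF in the middle of the text is NOT a BOM, so it is kept
# as a character after the round trip.
inner = "kinder\ufeffgarten".encode("utf-8-sig").decode("utf-8-sig")
print(repr(inner))               # 'kinder\ufeffgarten'
```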
What you are really saying by using kinder&#xfeff;garten is that the word "kindergarten" must not be divided into its components in word division (hyphenation). This has little effect at present, since browsers don't do word division.
So in that sense, it might be a harmless trick for trying to make indexing robots treat the construct as two words. However, we have no guarantee that this actually happens (after all, search engines _could_ be Unicode-aware and treat a word containing a line-break-preventing character as very much a single word).
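The uncertainty can be illustrated with three *hypothetical* word splitters in Python. They are not how any actual search engine works (I don't know that, which is the point); the function names are made up. Fed the same decoded markup, one sees a single odd word, one sees "kinder" and "garten", and a Unicode-aware one sees plain "kindergarten".

```python
import html
import unicodedata

# The markup from the question; the character reference decodes to U+FEFF.
text = html.unescape("kinder&#xfeff;garten")   # 'kinder\ufeffgarten'


def split_on_whitespace(s):
    """Hypothetical indexer A: a word is a run of non-whitespace.
    U+FEFF is not whitespace, so the whole thing stays one token."""
    return s.split()


def split_on_format_chars(s):
    """Hypothetical indexer B: also breaks words at invisible format
    characters (category Cf), such as U+FEFF and U+2060."""
    words, current = [], []
    for ch in s:
        if ch.isspace() or unicodedata.category(ch) == "Cf":
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


def strip_format_chars(s):
    """Hypothetical indexer C: Unicode-aware; drops format characters,
    so a no-break joiner simply glues the parts into one word."""
    return "".join(ch for ch in s if unicodedata.category(ch) != "Cf").split()


for indexer in (split_on_whitespace, split_on_format_chars, strip_format_chars):
    print(indexer.__name__, indexer(text))
# split_on_whitespace ['kinder\ufeffgarten']
# split_on_format_chars ['kinder', 'garten']
# strip_format_chars ['kindergarten']
```

Only indexer B gives the result the trick is hoping for, and nothing in the markup itself tells a robot to behave like B rather than A or C.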
Some user agents will choke on &#xfeff;. Such user agents are rare
these days, but before taking a risk, I would like to see that
something can possibly be gained. If the split into components is
natural (and for English text, splitting into "kinder" and "garten" is not), then it
would be better to _use_ the component words in natural sentences as
healthy, natural food for search engines. If it isn't, the whole trick
is probably quite pointless; nobody is going to search for "kinder" and
"garten" if he wants to find info on kindergartens.
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html