garbage characters are now on the site, although they weren't there originally

Newsgroup: comp.infosystems.www.authoring.html
Posted by Lawrence Krubner on June 5, 2008, 4:16 pm


Once upon a time, there were no garbage characters on this page:

http://www.teamlalala.com/blog/category/css/

Now there are. For instance:

The 2nd paragraph from page 114 of “The Zen Of CSS Design”


For me, there are garbage characters before "The" and after "Design".

The page has always, always been served as UTF-8.

I'm having trouble working out what might have changed to cause these
garbage characters. At a stretch, I think back to an incident a few
months ago, when our server was hacked and we had to do a re-install,
with upgraded versions of stuff like Apache. So I could almost imagine
Apache sending new headers, except that, in my case, the meta tag
indicates UTF-8, and when I look at the page in Firefox, Firefox correctly
reads it as UTF-8.

Anything else that could cause this?

I cannot find a character encoding that renders this page without
garbage characters.

-- lawrence krubner

Posted by Ben C on June 5, 2008, 4:36 pm
>
>
> Once upon a time, there were no garbage characters on this page:
>
> http://www.teamlalala.com/blog/category/css/
>
> Now there are. For instance:
>
> The 2nd paragraph from page 114 of “The Zen Of CSS Design”
>
>
> For me, there are garbage characters before "The" and after "Design".
>
> The page has always, always been served as UTF-8.
>
> I'm having trouble working out what might have changed, which would cause these
> garbage characters. At a stretch, I think back to an incident a few
> months ago, when our server was hacked, and we had to do a re-install,
> with upgraded versions of stuff like Apache. So I could almost imagine
> Apache sending new headers, except that, in my case, the meta tag
> indicates UTF-8 and when I look at it in FireFox, FireFox correctly
> reads it as UTF-8.
>
> Anything else that could cause this?
>
> I can not find a character encoding that renders this page without
> garbage characters.

The page _is_ valid UTF-8, and the server header says it's UTF-8, and it
really does contain those characters (a with circumflex, euro symbol, oe
diphthong ligature thing), encoded in UTF-8.

How did they get there? Not sure, perhaps you "converted" the file from
Latin1 to UTF-8 when it already was UTF-8 or something.
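
A minimal sketch of how that kind of double conversion can produce exactly these characters. It assumes the intermediate step treated the bytes as Windows-1252 (which is what "Latin1" often means in practice); the variable names are illustrative only:

```python
# A left curly quotation mark, correctly encoded as UTF-8, is three bytes.
original = "\u201c"
utf8_bytes = original.encode("utf-8")      # b'\xe2\x80\x9c'

# The mistake: read those three bytes as Windows-1252 text...
misread = utf8_bytes.decode("cp1252")      # a-circumflex, euro sign, oe ligature

# ...and then "convert" that text to UTF-8 a second time.
double_encoded = misread.encode("utf-8")   # valid UTF-8, but the wrong characters

print(misread)
print(double_encoded)
```

The result is still well-formed UTF-8, which would explain why no choice of encoding in the browser makes the page look right.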

Anyway you should be OK if you just fix the page to contain instead the
UTF-8 representations of the characters you want (presumably quotation
marks).

Never mind the meta tag: the browser only uses that if the server fails
to say what the encoding is. In your case the server does say. The meta tag
might as well be correct, but it won't cause or solve a real problem
here.
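
The precedence described above can be sketched roughly as follows. This is a simplification that ignores everything else a browser may look at (such as a byte order mark), and `effective_charset` and `header_charset` are hypothetical helper names, not part of any browser or library:

```python
from email.message import Message
from typing import Optional

def header_charset(content_type: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased, or None if absent

def effective_charset(content_type: str, meta_charset: Optional[str]) -> Optional[str]:
    """Prefer the charset from the HTTP header; fall back to the meta tag."""
    declared = header_charset(content_type)
    if declared is not None:
        return declared
    return meta_charset.lower() if meta_charset else None

# The header wins when it names an encoding.
print(effective_charset("text/html; charset=UTF-8", "ISO-8859-1"))
# The meta tag only matters when the header is silent.
print(effective_charset("text/html", "ISO-8859-1"))
```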

Posted by Rik Wasmus on June 5, 2008, 6:20 pm
On Thu, 05 Jun 2008 22:16:08 +0200, Lawrence Krubner wrote:
> Once upon a time, there were no garbage characters on this page:
>
> http://www.teamlalala.com/blog/category/css/
>
> Now there are. For instance:
>
> The 2nd paragraph from page 114 of “The Zen Of CSS Design†
>
>
> For me, there are garbage characters before "The" and after "Design".
>
> The page has always, always been served as UTF-8.
>
> I'm having trouble working out what might have changed, which would cause these
> garbage characters. At a stretch, I think back to an incident a few
> months ago, when our server was hacked, and we had to do a re-install,
> with upgraded versions of stuff like Apache. So I could almost imagine
> Apache sending new headers, except that, in my case, the meta tag
> indicates UTF-8 and when I look at it in FireFox, FireFox correctly
> reads it as UTF-8.
>
> Anything else that could cause this?
>
> I can not find a character encoding that renders this page without
> garbage characters.

Among the top reasons for double UTF-8 encoding is an improper database
export/import.
--
Rik Wasmus
...spamrun finished

Posted by Lawrence Krubner on June 7, 2008, 7:44 pm
Rik Wasmus wrote:
> On Thu, 05 Jun 2008 22:16:08 +0200, Lawrence Krubner wrote:
>> Once upon a time, there were no garbage characters on this page:
>>
>> http://www.teamlalala.com/blog/category/css/
>>
>> Now there are. For instance:
>>
>> The 2nd paragraph from page 114 of “The Zen Of CSS Design†
>>
>>
>> For me, there are garbage characters before "The" and after "Design".
>>
>> The page has always, always been served as UTF-8.
>>
>> I'm having trouble working out what might have changed, which would cause these
>> garbage characters. At a stretch, I think back to an incident a few
>> months ago, when our server was hacked, and we had to do a re-install,
>> with upgraded versions of stuff like Apache. So I could almost imagine
>> Apache sending new headers, except that, in my case, the meta tag
>> indicates UTF-8 and when I look at it in FireFox, FireFox correctly
>> reads it as UTF-8.
>>
>> Anything else that could cause this?
>>
>> I can not find a character encoding that renders this page without
>> garbage characters.
>
> Among the top reasons for double utf-8 encoding is an improper database
> export/import.

That must be it, then. Is there an automated way to undo the damage? Or
do I have to fix every post by hand?

Also, any tips on import/export, for the next time I have to do this?

--lk
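
On the question of undoing the damage automatically: one possible approach, sketched here under the assumption that the text was misread as Windows-1252 and then re-encoded as UTF-8, is to reverse those two steps in code. The function names are hypothetical, the sketch leaves a string untouched if it does not look double-encoded, and it would not repair a post that mixes damaged and undamaged non-ASCII characters. Trying it on a backup copy first would be prudent.

```python
def to_original_byte(ch: str) -> bytes:
    """Map one mis-decoded character back to the single byte it came from."""
    try:
        return ch.encode("cp1252")
    except UnicodeEncodeError:
        # Bytes that Windows-1252 leaves undefined (such as 0x9D) often
        # survive as C1 control characters, which Latin-1 maps one-to-one.
        return ch.encode("latin-1")

def repair_double_utf8(text: str) -> str:
    """Undo one round of UTF-8 -> Windows-1252 -> UTF-8 double encoding."""
    try:
        raw = b"".join(to_original_byte(ch) for ch in text)
        return raw.decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not double-encoded (or not repairable this way): leave it alone.
        return text

# The damaged title from the blog post, written with escapes for clarity.
broken = "\u00e2\u20ac\u0153The Zen Of CSS Design\u00e2\u20ac\u009d"
fixed = repair_double_utf8(broken)
print(fixed)
```

Applied to a database, the same function could be run over each post body in a loop, writing back only the rows whose text actually changed.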




Posted by Keith Hughitt on June 10, 2008, 10:42 am
> Rik Wasmus wrote:
> > On Thu, 05 Jun 2008 22:16:08 +0200, Lawrence Krubner wrote:
> >> Once upon a time, there were no garbage characters on this page:
>
> >> http://www.teamlalala.com/blog/category/css/
>
> >> Now there are. For instance:
>
> >> The 2nd paragraph from page 114 of “The Zen Of CSS Design†
>
> >> For me, there are garbage characters before "The" and after "Design".
>
> >> The page has always, always been served as UTF-8.
>
> >> I'm having trouble working out what might have changed, which would cause these
> >> garbage characters. At a stretch, I think back to an incident a few
> >> months ago, when our server was hacked, and we had to do a re-install,
> >> with upgraded versions of stuff like Apache. So I could almost imagine
> >> Apache sending new headers, except that, in my case, the meta tag
> >> indicates UTF-8 and when I look at it in FireFox, FireFox correctly
> >> reads it as UTF-8.
>
> >> Anything else that could cause this?
>
> >> I can not find a character encoding that renders this page without
> >> garbage characters.
>
> > Among the top reasons for double utf-8 encoding is an improper database
> > export/import.
>
> That must be it, then. Is there an automated way to undo the damage? Or
> do I have to fix every post by hand?
>
> Also, any tips on import/export, for the next time I have to do this?
>
> --lk

Somewhat off-topic question, but when you copy and paste text in
Windows/Unix, is the encoding included in that information?
I.e. if you saved a document in Latin-1 and wanted to get it to UTF-8,
could you just copy and paste the text into a new document
and save it as UTF-8?
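
Leaving the clipboard question aside, a conversion that does not depend on copy and paste can be sketched as below: decode the bytes with the encoding the file is actually in, then encode the resulting text as UTF-8. The function name and the temporary file names are illustrative assumptions.

```python
from pathlib import Path
import tempfile

def convert_latin1_to_utf8(src: Path, dst: Path) -> None:
    """Read a Latin-1 file and write the same text out as UTF-8."""
    text = src.read_bytes().decode("latin-1")
    dst.write_bytes(text.encode("utf-8"))

# Demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "latin1.txt"
    dst = Path(tmp) / "utf8.txt"
    src.write_bytes("caf\u00e9".encode("latin-1"))  # e-acute is one byte, 0xE9
    convert_latin1_to_utf8(src, dst)
    converted = dst.read_bytes()                    # e-acute is now 0xC3 0xA9

print(converted)
```

Running this on a file that is already UTF-8 would produce the double encoding discussed earlier in the thread, so the source encoding needs to be known beforehand.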
