
Multiple coding systems, and filesystems

comp.infosystems.www.authoring.html
Posted by gentsquash on June 3, 2008, 5:08 pm
On some of my course pages, I quote (with attribution)
small sections of Wikipedia and the like. E.g., the top
of
http://en.wiktionary.org/wiki/entropy

has "entropia" in Greek font,

http://en.wikipedia.org/wiki/Goedel

has the o-umlaut from German, and

http://en.wikipedia.org/wiki/Origami

has a Japanese font. What is the correct --maybe "coding
system" is the term?-- so that I could quote all three of
these on the same HTML page?

And can the HTML-page be set up so that it will validate?
====================================================

Actually, I'm ahead of myself. In the past I've cut&pasted
a snippet from, say, wiki/entropy, into an Emacs buffer,
adjoined a "From Wiktionary http://..." and attempted to
save the buffer. Sometimes Emacs asked me for what coding
system to use --and I don't know how to placate it.

If I'm using multiple coding systems on the same webpage,
do I have to save the different snippets in different files
stored with different coding systems, and then

<!--#include ... -->

each of them into one webpage? Or can the file system
permit a file that simultaneously has Greek, German and
Japanese characters?

FWIW, my home OS is MacOSX and I need to upload my webpages
to school. The math dept. server is probably running
Unix; when I manipulate the html files (when at work), I'm
using Emacs running on a Solaris (unix) system.

Sincerely,
Prof. Jonathan King (gentsquash)
Mathematics dept, Univ. of Florida

Posted by Stanimir Stamenkov on June 4, 2008, 12:37 am
Tue, 3 Jun 2008 14:08:25 -0700 (PDT), /gentsquash@gmail.com/:

> Or can the file system
> permit a file that simultaneously has Greek, German and
> Japanese characters?

Files generally store bytes. How these bytes are interpreted is
up to the application reading them. Characters are encoded into
bytes using different encoding schemes, each of which is generally
capable of representing the characters of one specific character set.
The Unicode character set covers the characters of practically all
writing systems in use, so if you use some UTF (Unicode Transformation
Format) variant you can have all the characters you need encoded in a
single file. So make sure your text editor supports reading and saving
files in UTF-8, for example.
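
As a minimal illustration of the point above, here is a Python sketch. The sample words are stand-ins for the Greek, German and Japanese text on the three pages in the question, and the variable names are only illustrative. It shows one text being encoded into a single sequence of UTF-8 bytes and decoded back unchanged.

```python
# One text containing Greek, German and Japanese characters.
text = "εντροπία / Gödel / 折り紙"

# Encode the characters into bytes using UTF-8; these bytes are what a file stores.
data = text.encode("utf-8")

# ASCII characters take one byte each; the other characters here take two or three.
print(len(text), "characters ->", len(data), "bytes")

# Decoding the same bytes as UTF-8 gives back exactly the original text.
roundtrip = data.decode("utf-8")
print(roundtrip == text)
```

The file system never sees "Greek" or "Japanese", only the bytes in `data`.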

--
Stanimir

Posted by Jukka K. Korpela on June 4, 2008, 3:14 am
gentsquash@gmail.com wrote:

> On some of my course pages, I quote (with attribution)
> small sections of Wikipedia and the like. E.g, the top
> of
> http://en.wiktionary.org/wiki/entropy
>
> has "entropia" in Greek font,

Technically, it has the word in Greek _characters_ (letters). This is
the key issue; fonts are secondary. The page has a style sheet that
makes special suggestions on the font of such words, in a most confusing
and tricky way.

> What is the correct --maybe "coding
> system" is the term?-- so that I could quote all three of
> these on the same HTML page?

The proper _character encoding_ is UTF-8 in such cases. As soon as you
have Japanese, Greek, and umlaut Latin letters on one page, that's
definitely the best option. If there were just a few "special"
characters, you could present them using entity references like &ouml;
or character references like &#261;, but this gets clumsy (or requires
suitable software for generating them) if you have full sentences that
consist of "special" characters.
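
For readers who want to see what such references stand for, the following is a small Python sketch using only the standard library. The helper name `to_char_refs` is hypothetical, introduced here for illustration. It decodes the two references mentioned above and shows one way software could generate numeric character references for every non-ASCII character.

```python
import html

# Decode the two references mentioned above into actual characters.
print(html.unescape("&ouml;"))   # named entity reference
print(html.unescape("&#261;"))   # numeric character reference

def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

sample = "Gödel"
encoded = to_char_refs(sample)
print(encoded)
```

The result is pure ASCII, so it survives any encoding, but for whole sentences of Greek or Japanese it becomes unreadable in the source, which is the clumsiness described above.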

It's not possible (in practice on web pages) to switch the character
encoding in the middle of an HTML document.

> In the past I've cut&pasted
> a snippet from, say, wiki/entropy, into an Emacs buffer,
> adjoined a "From Wictionary http://..." and attempted to
> save the buffer. Sometimes Emacs asked me for what coding
> system to use --and I don't know how to placate it.

UTF-8, if Emacs can really produce it. The version of Emacs I've been
using does not deal with "special" characters, but I recently looked at
the newest version of Emacs for Windows, and it seems to have
impressive support for "special" characters.

Note that the server should be configured to send an appropriate HTTP
header. You normally do this by adding something to your .htaccess file,
and in practice you need to use the same encoding for all ".html" files
in a directory (folder), though you could use, for example, ISO-8859-1
for ".html" and UTF-8 for ".htm" files.
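
As a hedged sketch of what "an appropriate HTTP header" looks like: the server would send something like `Content-Type: text/html; charset=UTF-8`. On an Apache server that permits .htaccess overrides (an assumption; the thread does not say which server software the department runs), a line such as `AddCharset UTF-8 .html` is one common way to request this. The Python below only shows how a client could read the charset out of such a header with the standard-library email parser; the function name and header values are illustrative.

```python
from email.message import Message

def charset_from_content_type(value: str):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() returns the charset in lower case, or None if absent.
    return msg.get_content_charset()

header = "text/html; charset=UTF-8"
print(charset_from_content_type(header))
print(charset_from_content_type("text/html"))
```

When the parameter is missing, as in the second call, the browser has to guess or fall back on other information, which is why the header matters.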

> If I'm using multiple coding systems on the same webpage,
> do I have to save the different snippets in different files
> stored with different coding systems, and then
>
> <!--#include ... -->
>
> each of them into one webpage?

No, it won't work that way, even if your server supports SSI includes.
They result in a single document, which can have one encoding only. (I
won't mention <iframe>, because it's really a poor hack for things like
this, but it performs sort-of include where the included document is
displayed "autonomously" inside the main canvas and may have a different
encoding.)
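
A small Python sketch of why this fails, under the assumption that an include simply joins the bytes of the snippets (the sample words and the choice of ISO-8859-7 and Shift_JIS are illustrative, not taken from the thread):

```python
greek = "εντροπία"
japanese = "折り紙"

# Two snippets saved with two different legacy encodings, then joined
# byte for byte, which is roughly what a server-side include would do.
mixed = greek.encode("iso8859_7") + japanese.encode("shift_jis")

# Read as ISO-8859-7, the Japanese part cannot come out as Japanese.
as_greek = mixed.decode("iso8859_7", errors="replace")
print(as_greek == greek + japanese)

# Read as UTF-8, the joined bytes are not valid at all.
try:
    mixed.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)

# With one encoding for everything, the joined bytes decode cleanly.
unified = greek.encode("utf-8") + japanese.encode("utf-8")
print(unified.decode("utf-8") == greek + japanese)
```

Whichever single encoding the browser picks for the mixed bytes, at least one snippet is misread; only the all-UTF-8 version survives.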

> FWIW, my home OS is MacOSX and I need to upload my webpages
> to school. The math dept. server is probably running
> Unix; when I manipulate the html files (when at work), I'm
> using Emacs running on a Solaris (unix) system.

A nice mess :-) but it should be manageable when using UTF-8. When
uploading with FTP, use binary (not ASCII) mode, since no character
conversion shall be performed - the data is already in a
system-independent encoding.
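
If the upload is scripted rather than done in an FTP client, a minimal Python sketch with the standard `ftplib` module might look like the following. The function name, host, credentials and file names are placeholders, not details from the thread.

```python
from ftplib import FTP

def upload_binary(ftp, local_path: str, remote_name: str) -> None:
    """Upload a file unchanged, byte for byte, using FTP binary (image) mode."""
    with open(local_path, "rb") as fh:
        # storbinary switches the connection to binary mode before the
        # transfer, so no character or line-ending conversion is applied.
        ftp.storbinary(f"STOR {remote_name}", fh)

# Hypothetical usage; host, credentials and file names are placeholders:
# with FTP("ftp.example.edu") as ftp:
#     ftp.login("username", "password")
#     upload_binary(ftp, "page.html", "page.html")
```

Opening the local file in `"rb"` mode matters as well: it keeps Python itself from decoding or re-encoding the UTF-8 bytes on the way out.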

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

