Notepad and UTF-8

Posted by The Bicycling Guitarist on March 8, 2008, 1:15 am
Okay my web site grew up and is moving to a non-Windows server, Unix. I am
converting my static HTML/CSS files to Drupal content management system. The
leading white spaces I use to indent text for easy editing are not collapsed
by Drupal, so I installed Cygwin on my Windows machine to simulate a Unix
environment and ran a sed command to strip whitespace.

When I opened the files in Notepad they were all on one line each. So I
tried copying them from Microsoft FrontPage where they looked okay in HTML
view and pasting them into Notepad then saving over the HTML file. I most
definitely and carefully chose save as UTF-8 from the list of options
offered by Notepad, but now all the files are ANSI instead of UTF-8. WTF?

Please tell me there is an easier way... I need to
a) strip leading whitespace from the content of my html files and
b) save these files as UTF-8 and have them STAY UTF-8. Thanks



Posted by Ben C on March 8, 2008, 5:02 am
> Okay my web site grew up and is moving to a non-Windows server, Unix. I am
> converting my static HTML/CSS files to Drupal content management system. The
> leading white spaces I use to indent text for easy editing are not collapsed
> by Drupal, so I installed Cygwin on my Windows machine to simulate a Unix
> environment and ran a sed command to strip whitespace.
>
> When I opened the files in Notepad they were all on one line each. So I
> tried copying them from Microsoft FrontPage where they looked okay in HTML
> view and pasting them into Notepad then saving over the HTML file. I most
> definitely and carefully chose save as UTF-8 from the list of options
> offered by Notepad, but now all the files are ANSI instead of UTF-8. WTF?

Not sure what you mean by ANSI. Everything probably appeared on one line
because cygwin sed wrote Unix line separators (just LF, not CRLF) at the
ends of the lines. You can configure cygwin somehow not to do that, I
think on a per-filesystem basis.

Most editors, even on Windows, will sort of half-work with just LF, which
is probably why it looked OK in FrontPage but not in Notepad.
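
A quick way to see which line terminators a file actually contains is to count them in the raw bytes. This is only a minimal sketch, assuming Python 3 is available; the function name count_line_endings is made up for illustration.

```python
def count_line_endings(data: bytes) -> dict:
    """Count each kind of line terminator in raw file bytes."""
    crlf = data.count(b"\r\n")
    # Subtract the CRLF pairs so lone LF and lone CR are not counted twice.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    return {"CRLF": crlf, "LF": lf, "CR": cr}


# Hypothetical sample with mixed endings: two bare LF lines and one CRLF line.
sample = b"<p>one</p>\n<p>two</p>\r\n<p>three</p>\n"
print(count_line_endings(sample))  # {'CRLF': 1, 'LF': 2, 'CR': 0}
```

To check a real file, read it in binary mode (open(path, "rb").read()) so that Python does not translate the line endings before they are counted.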

> Please tell me there is an easier way... I need to
> a) strip leading whitespace from the content of my html files and
> b) save these files as UTF-8 and have them STAY UTF-8. Thanks

Just don't use Notepad or FrontPage. It could have been the copy and
pasting from FrontPage that messed up the UTF-8.

You could try to set up cygwin to use DOS line endings, or just stick to
Unix line endings. But then you need to be careful because some Windows
editors may open the file silently and apparently OK with the Unix line
endings, but then save DOS line endings on the one or two lines you edit
leaving you with an inconsistent mixture. Without any decent tools it's
often hard to know what you've actually ended up with or why things are
going wrong.
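
One way to strip the leading whitespace without touching either the encoding or the line endings is to work on the raw bytes. The following is a rough sketch, assuming Python 3; the function names are hypothetical, and it will also remove indentation inside <pre> blocks, which may not be what you want.

```python
import re
from pathlib import Path


def strip_leading_whitespace(data: bytes) -> bytes:
    """Remove spaces and tabs at the start of every line.

    The bytes are never decoded, so UTF-8 sequences and the existing
    line terminators pass through unchanged.
    """
    return re.sub(rb"^[ \t]+", b"", data, flags=re.MULTILINE)


def strip_file(path: str) -> None:
    """Rewrite one file in place (binary mode, no newline translation)."""
    p = Path(path)
    p.write_bytes(strip_leading_whitespace(p.read_bytes()))


# "caf\xc3\xa9" is "café" in UTF-8; the CRLF and LF endings are both kept.
print(strip_leading_whitespace(b"  <p>caf\xc3\xa9</p>\r\n\t<p>x</p>\n"))
```

Because space (0x20) and tab (0x09) never occur inside a multi-byte UTF-8 sequence, removing them at the byte level cannot corrupt a non-ASCII character.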

Posted by Andy Dingley on March 8, 2008, 7:45 am
On Fri, 7 Mar 2008 22:15:27 -0800, "The Bicycling Guitarist" wrote:

> I most
>definitely and carefully chose save as UTF-8 from the list of options
>offered by Notepad, but now all the files are ANSI instead of UTF-8. WTF?

What's the difference between "ANSI" and "UTF-8" ?

Unless you're actually using non-ASCII characters, the files ought to
be identical anyway. You neither require nor want a BOM at the start
of the file. Not all UTF-8 files use one, and having one stripped out
doesn't stop the file being UTF-8.
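
To make that concrete, here is a small sketch, assuming Python 3 and assuming that "ANSI" means the Windows-1252 code page (it can be a different code page on non-Western systems). The helper names are invented for illustration.

```python
import codecs

text = "<p>Plain ASCII markup</p>"
# Pure ASCII text produces exactly the same bytes in UTF-8 and in
# Windows-1252, so nothing in the file itself tells the two apart.
assert text.encode("utf-8") == text.encode("cp1252")


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the 3-byte UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if there was one."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


print(has_utf8_bom(codecs.BOM_UTF8 + b"<html>"))    # True
print(strip_utf8_bom(codecs.BOM_UTF8 + b"<html>"))  # b'<html>'
```

If a file is pure ASCII and has no BOM, an editor that labels it "ANSI" is guessing; the bytes are equally valid UTF-8.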

When using Windows tools with Unicode, watch out that the "Save as
Unicode" option often means UTF-16, which you really don't want. Look
further down for a specific "UTF-8" option.

One of our office standard practices is that all source code is in
UTF-8. Another one is that all source files carry a standard copyright
banner at the top. By placing the Alt-0169 © copyright symbol in there
rather than "(c)" or a HTML entity, I have a clear in-my-face indication
as to whether other developers have broken the encoding, and without
having to check through the entire file.
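
The same check can be automated. This is only a sketch, assuming Python 3 and assuming the banner sits within the first five lines; the function name and the banner_lines parameter are hypothetical, not part of any existing tool.

```python
def copyright_canary_ok(data: bytes, banner_lines: int = 5) -> bool:
    """True if a correctly UTF-8 encoded copyright sign is in the banner."""
    head = b"\n".join(data.splitlines()[:banner_lines])
    try:
        text = head.decode("utf-8")
    except UnicodeDecodeError:
        # For example, the file was saved in Windows-1252 instead of UTF-8.
        return False
    # "\u00c2\u00a9" is what the sign turns into when UTF-8 bytes are
    # misread as Windows-1252 and then saved as UTF-8 again.
    return "\u00a9" in text and "\u00c2\u00a9" not in text


banner = "/* \u00a9 2008 Example Ltd */\nbody { margin: 0 }\n"
print(copyright_canary_ok(banner.encode("utf-8")))   # True
print(copyright_canary_ok(banner.encode("cp1252")))  # False
```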

If you're having linebreak troubles, try using Wordpad rather than
Notepad, or else use a far better text editor, like jEdit.


Posted by The Bicycling Guitarist on March 8, 2008, 9:10 pm

> If you're having linebreak troubles, try using Wordpad rather than
> Notepad, or else use a far better text editor, like jEdit.

Thanks for all the tips, Andy. I have switched to jEdit. What blew my mind
yesterday was finding out that, according to some people, Notepad "breaks"
UTF-8, and noticing for the first time that even after I saved a file as
UTF-8, the description would change to ANSI when I checked it again
later. I use a lot of HTML character entities in my code for foreign and
special characters.



Posted by Andy Dingley on March 9, 2008, 2:10 pm
On 9 Mar, 02:10, "The Bicycling Guitarist" wrote:

> according to some people Notepad "breaks"
> UTF-8, and noticing for the first time that even after saving a file as
> UTF-8 that the description would change to ANSI when I checked it again
> later.

Hard to comment without an example.

> I use a lot of HTML character entities in my code for foreign and
> special characters.

If you use entities exclusively, you no longer have any non-ASCII
characters, so you no longer need to care about the encoding.
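
As an illustration, Python 3 can convert every non-ASCII character to a numeric character reference with the standard "xmlcharrefreplace" error handler. This is a minimal sketch; the function name is made up, and it emits numeric references rather than named entities such as &copy;.

```python
def to_ascii_with_entities(text: str) -> bytes:
    """Encode text as ASCII, replacing other characters with &#NNN; references."""
    return text.encode("ascii", errors="xmlcharrefreplace")


result = to_ascii_with_entities("caf\u00e9 \u00a9 2008")
print(result)  # b'caf&#233; &#169; 2008'

# The output is pure ASCII, so it reads the same as "ANSI" or as UTF-8.
assert result.decode("cp1252") == result.decode("utf-8")
```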
