Problem handling a Unicode file

comp.lang.perl.misc
Subject Author Date
Problem handling a Unicode file MoshiachNow 08-28-2006
Posted by Peter J. Holzer on August 28, 2006, 2:29 pm
>>
>> [snip]
>>
>> > if I print "$1\n",
>> > the file prints just fine. But, if I do something like print "$1 after
>> > \n", the whole output is messed up. If I print "before $1\n", nothing
>> > prints at all. If I print "before $1 after\n", only after prints.
>>
>> not really sure, but could be a rogue "\r" in $1,


> There
> is a rogue carriage return (0xd) in the string

> Is there something I can do to deal with this
> situation?


Repair the corrupted file:

perl -p -i -e 'tr/\r//d' bad_file
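The same cleanup can be done inside a script instead of on the file. The sketch below is only an illustration: `$captured` is a hypothetical stand-in for `$1`, and `strip_cr` is a helper name made up for this example. It shows the kind of overwriting a stray "\r" causes on a terminal, and the output once the carriage return is deleted.

```perl
use strict;
use warnings;

# Hypothetical captured value standing in for $1; note the trailing "\r".
my $captured = "value\r";

# Return a copy of the string with every carriage return deleted.
sub strip_cr {
    my ($text) = @_;
    (my $clean = $text) =~ tr/\r//d;
    return $clean;
}

# On a terminal the "\r" moves the cursor back to column 0, so " after"
# is drawn over the start of "before value".
print "before $captured after\n";

# With the carriage return removed, the whole line is visible.
print "before ", strip_cr($captured), " after\n";
```

Note that `tr/\r//d` deletes every carriage return, so it is only appropriate when the data should not contain any.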


--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas

Posted by Dr.Ruud on August 28, 2006, 4:28 pm

Posted by MoshiachNow on August 29, 2006, 2:29 am

Posted by Dr.Ruud on August 29, 2006, 7:40 am
MoshiachNow wrote:

> all bytes are interchanged within the words

That is the UTF-16LE byte order, so it would have been wrong if you had
seen anything else. Do you understand the role of the BOM (Byte
Order Mark) now?
http://en.wikipedia.org/wiki/Byte_Order_Mark

Create a fresh file in Notepad with just the word "test" in it, do a
File/Save As... with Encoding "Unicode", and you'll see that Windows
defaults to UTF-16LE.

You'll also find an Encoding "Unicode big-endian" there; that is
UTF-16BE. But why would you want the bytes in a different order than the
default for the platform?
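
The byte orders can also be inspected from Perl without Notepad. This is a small sketch using the core Encode module; `hex_bytes` is a helper name invented for the example. It dumps the bytes of the word "test" in each variant, including what Encode's plain "UTF-16" produces when encoding.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Return the bytes of $text in the given encoding as space-separated hex.
sub hex_bytes {
    my ($encoding, $text) = @_;
    my $bytes = encode($encoding, $text);
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

# Low byte first, no BOM: 74 00 65 00 73 00 74 00
print "UTF-16LE: ", hex_bytes('UTF-16LE', 'test'), "\n";

# High byte first, no BOM: 00 74 00 65 00 73 00 74
print "UTF-16BE: ", hex_bytes('UTF-16BE', 'test'), "\n";

# Plain UTF-16 on encode: BOM FE FF followed by big-endian data.
print "UTF-16:   ", hex_bytes('UTF-16', 'test'), "\n";

# UTF-16LE with an explicit U+FEFF in front: BOM FF FE, then little-endian data.
print "LE + BOM: ", hex_bytes('UTF-16LE', "\x{FEFF}test"), "\n";
```

The last line corresponds to the layout Notepad writes for Encoding "Unicode".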

--
Affijn, Ruud

"Gewoon is een tijger."



Posted by MoshiachNow on August 29, 2006, 8:25 am
Hi,

I am running exactly this:

open my $fhi, '<:encoding(UTF-16)', $fni
    or die "open '$fni', stopped $!";

open my $fho, '>:encoding(UTF-16)', $fno
    or die "open '$fno', stopped $!";

and I expect the input and output files to have the same byte order, but
they do not.

I did try adding the following line, but it did not help:

print $fho "\x";
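
One possible way to keep the output little-endian, offered as a sketch rather than a confirmed fix for this thread: when encoding, Perl's plain "UTF-16" writes a BOM followed by big-endian data, whereas "UTF-16LE" writes little-endian data with no BOM. The example assumes the truncated print above was meant to emit a BOM; `copy_as_utf16le` is a hypothetical helper name, and the `:raw` layers are an assumption added to avoid newline translation.

```perl
use strict;
use warnings;

# Copy a BOM-marked UTF-16 file to a UTF-16LE file that also starts with a BOM.
sub copy_as_utf16le {
    my ($fni, $fno) = @_;

    # ':encoding(UTF-16)' uses the BOM to pick the byte order and strips it.
    open my $fhi, '<:raw:encoding(UTF-16)', $fni
        or die "open '$fni', stopped $!";

    # An explicit 'LE' fixes the output byte order; ':raw' keeps newline
    # translation away from the encoded bytes.
    open my $fho, '>:raw:encoding(UTF-16LE)', $fno
        or die "open '$fno', stopped $!";

    # UTF-16LE does not write a BOM by itself, so emit U+FEFF first.
    print {$fho} "\x{FEFF}";
    print {$fho} $_ while <$fhi>;

    close $fhi or die "close '$fni': $!";
    close $fho or die "close '$fno': $!";
    return;
}
```

It would be called as copy_as_utf16le($fni, $fno). Because of `:raw`, any "\r\n" line endings in the input are copied through unchanged.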

