
Spreadsheet::Read special characters handling

Posted by anevare2 on November 20, 2006, 2:33 am


I'm using the Spreadsheet::Read module (which generally works quite
well). I have some spreadsheets with special characters such as an
accented e (é), and I'm having trouble processing them. I haven't
dealt much with these types of characters in this context before.
The accented e's come out as "?".

My spreadsheet has cells A1, A2 and A3 set to Cafe as follows: in A1,
Excel automatically added the accent to the e; in A2, I pressed
Option-e, then e, to type the accented e; and in A3, I undid the
special e to make it a regular e.

A1:A3
---------
Café
Café
Cafe

This is my program
------------------------------
use Spreadsheet::Read;

my $ref = ReadData('special_char_test.xls');

my $cell1 = $ref->[1]{A1};
my $cell2 = $ref->[1]{cell}[1][2]; # try a different way
my $cell3 = $ref->[1]{cell}[1][3];

print "Cell A1: $cell1\n";
print "Cell A2: $cell2\n";
print "Cell A3: $cell3\n";

Output (standard out):
-------------------
Cell A1: Caf?
Cell A2: Caf?
Cell A3: Cafe

What can I do so that the accented e prints correctly or so the correct
format can be saved to a csv file?
Thanks.
A.


Posted by H.Merijn Brand on November 20, 2006, 7:02 am



> I'm using the Spreadsheet::Read module (which works quite well
> generally). I have some spreadsheets with special characters like an
> accented e (é). I'm having some trouble processing these characters.
> I haven't dealt much with these type of characters in this context in
> the past. The accented e's are coming out like "?".

That is based on both encoding and the font you use on the terminal.
By default, Spreadsheet::Read does not change the encoding, which means
that if the fields are encoded in Unicode (utf8), you should take action
in your script to output Unicode.

Read http://search.cpan.org/~rgarcia/perl-5.9.4/pod/perlunitut.pod

In summary: if your terminal can handle UTF-8 (like a recent X11R6
xterm with utf8 enabled and font *-iso10646-1), then adding

binmode STDOUT, ":utf8";

will probably suffice. If your terminal is iso8859-*, which also
supports the e-acute, then you will have to encode the output for
that character set instead.

I think that the CSV file is OK already. Try opening it in any
Unicode-enabled editor (I think both M$Word and M$Excel will do here)
and see how it looks.
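
Editor's note: the point about terminal encodings can be illustrated with a small sketch. It assumes the cell value has already arrived as a Perl character string (here the literal "Caf\x{e9}" stands in for it) and uses only the core Encode module. It shows that the same character needs different bytes for a UTF-8 terminal than for an ISO-8859-1 one, which is why the output layer has to match the terminal.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Stand-in for a cell value: four characters, the last one is U+00E9 (e-acute).
my $text = "Caf\x{e9}";

# The same character string becomes different bytes in different encodings.
my $utf8_bytes   = encode('UTF-8',      $text);
my $latin1_bytes = encode('ISO-8859-1', $text);

# Helper (illustrative only): render a byte string as space-separated hex.
sub hex_dump { return join ' ', map { sprintf '%02X', ord } split //, shift }

print 'UTF-8:      ', hex_dump($utf8_bytes),   "\n";   # 43 61 66 C3 A9
print 'ISO-8859-1: ', hex_dump($latin1_bytes), "\n";   # 43 61 66 E9

# In a real script, pick the one layer that matches the terminal:
#   binmode STDOUT, ':encoding(UTF-8)';        # UTF-8 terminal
#   binmode STDOUT, ':encoding(ISO-8859-1)';   # Latin-1 terminal
```

A terminal that expects one encoding and receives the other is what typically produces a "?" or a garbled pair of characters in place of the é.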


Posted by Al on November 20, 2006, 5:24 pm


Thanks, guys, very helpful. Thanks also for referring me to that good
Unicode tutorial.

Adding this line to my perl program did the trick:
binmode STDOUT, ":utf8";

I had to do the same with the filehandle of the file I write to.

Then, with TextWrangler, if I open that resulting file in UTF-8 mode,
it looks perfect, accent marks and all.

I'm using Perl 5.8.6 on Mac OS X

thanks so much!
A
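
Editor's note: the filehandle step mentioned in this post might look roughly like the sketch below. The file name, the row data, and the naive comma join are all assumptions made for illustration (a real CSV writer would also need quoting). Only core modules are used, and a temporary file keeps the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-ins for values read from the spreadsheet (hypothetical data).
my @rows = (["Caf\x{e9}", 'A1'], ['Cafe', 'A3']);

# Hypothetical output path; the temp file is removed when the script exits.
my (undef, $path) = tempfile(SUFFIX => '.csv', UNLINK => 1);

# The encoding layer on open() does for the file what binmode does for STDOUT.
open my $out, '>:encoding(UTF-8)', $path or die "Cannot write $path: $!";
print {$out} join(',', @$_), "\n" for @rows;
close $out or die "Cannot close $path: $!";

# Read the raw bytes back to check that the e-acute was stored as C3 A9.
open my $in, '<:raw', $path or die "Cannot read $path: $!";
my $bytes = do { local $/; <$in> };
close $in;

print $bytes =~ /Caf\xC3\xA9,A1/ ? "stored as UTF-8\n" : "not UTF-8\n";
```

Putting the layer in the open() call rather than calling binmode afterwards is a matter of taste; both should give the same result.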




Posted by harryfmudd [AT] comcast [DOT] on November 21, 2006, 7:11 pm


Al wrote:
> thanks guys.. very helpful. thanks also for referring me to that good
> Unicode tutorial.
>
> Adding this line to my perl program did the trick:
> binmode STDOUT, ":utf8";
>
> I had to do similar with the Filehandle of the file I write to.
>
> Then, with TextWrangler, if I open that resulting file in UTF-8 mode,
> it looks perfect, accent marks and all.
>
> I'm using Perl 5.8.6 on Mac OS X
>

Maybe you should look into your Terminal window settings. Use menu
Terminal/Window Settings ... and select "Display".

Tom Wyant

Posted by Al on December 5, 2006, 1:36 pm


Hi,
Any suggestions for handling Asian characters from the original Excel
file? Perl's binmode setting handles accented characters fine, but
once you go beyond the 256 characters that fit in 8 bits, it seems that
the Spreadsheet::Read Perl module may have no way of knowing what
Excel's encoding is.

I'd like to read an Excel file that has Asian characters, process it
with Perl, and then write a CSV or XML file (UTF-8 encoded) with the
Asian content intact.

A
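
Editor's note: the thread does not record an answer to this question, so the following is only a sketch of the general technique, not a statement about what Spreadsheet::Read returns. It assumes a cell value has arrived as raw bytes in a known encoding (UTF-16LE is chosen here purely for illustration) and shows how the core Encode module would turn it into a Perl character string and then into UTF-8. If the values already arrive as decoded character strings, the decode step would be skipped and a UTF-8 output layer alone should be enough.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: raw UTF-16LE bytes for U+65E5 U+672C U+8A9E
# (the three characters of the Japanese word for "Japanese language").
my $raw = "\xE5\x65\x2C\x67\x9E\x8A";

# Bytes -> Perl character string. This only works if the source encoding is known.
my $chars = decode('UTF-16LE', $raw);

# Character string -> UTF-8 bytes, which is what a ':encoding(UTF-8)' layer
# would do implicitly on output.
my $utf8 = encode('UTF-8', $chars);

printf "%d characters, %d UTF-8 bytes\n", length($chars), length($utf8);
# prints "3 characters, 9 UTF-8 bytes"
```

The key design point is to decode at the input boundary and encode at the output boundary, keeping character strings everywhere in between, as the perlunitut tutorial referenced earlier in the thread recommends.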


