Spreadsheet::Read special characters handling

Subject: Spreadsheet::Read special characters handling
Author: anevare2
Date: 11-20-2006
Posted by harryfmudd [AT] comcast [DOT] on December 6, 2006, 6:44 pm


Al wrote:
> Hi,
> Any suggestions for handling Asian characters from the original Excel?
> Perl's binmode setting helps to support accented characters fine.. but
> when you go beyond 8 bits (256 characters).. seems that the
> Spreadsheet::Read Perl module may have no way of knowing what Excel's
> encoding is.
>
> I'd like to input an excel that has Asian characters, process with
> perl, and then write a csv or xml file (utf-8 encoded) with proper
> Asian content.
>
> A

I'm not an expert on non-ASCII character sets, so the following is
somewhat provisional. But the thread has been fallow for about a day and
a half, and I figure if I say something horribly wrong someone will jump
at the opportunity to correct me.

Anyhow, this is what I _think_ the situation is.

I've never used Spreadsheet::Read, but the docs look like it's an
umbrella module, and under the hood it selects the correct module to
read the spreadsheet you gave it. The docs also seem to say that for
Excel it's Spreadsheet::ParseExcel.

Spreadsheet::ParseExcel apparently will take a filehandle instead of a
spreadsheet name, giving you the opportunity to set the encoding you
want when you open the input file or when you binmode() it. See the docs
for Encode::PerlIO.
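
The part of this I'm surer of is the output side, so here is a minimal sketch of a PerlIO encoding layer used to write UTF-8. The helper name write_utf8_rows and the temp file are made up for illustration, the strings are assumed to be already-decoded Perl character strings, and the join(',') does no CSV quoting at all.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write rows of already-decoded Perl strings as
# UTF-8. The join(',') is deliberately naive and does no CSV quoting.
sub write_utf8_rows {
    my ($path, @rows) = @_;
    # The :encoding layer converts characters to UTF-8 bytes on output.
    open my $out, '>:encoding(UTF-8)', $path or die "open $path: $!";
    print {$out} join(',', @$_), "\n" for @rows;
    close $out or die "close $path: $!";
    return;
}

# Stand-in data: an accented word and three CJK characters.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;
write_utf8_rows($path, ["Caf\x{E9}", "\x{65E5}\x{672C}\x{8A9E}"]);

# Read the file back as raw bytes to see what was actually written.
open my $in, '<:raw', $path or die "open $path: $!";
my $raw = do { local $/; <$in> };
close $in;
printf "wrote %d bytes\n", length $raw;
```

The same layer syntax works as the second argument to binmode() on a handle that is already open.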

I could have sworn I saw documentation somewhere in the Encode-related
modules for a subroutine that would try to guess the encoding of a chunk
of text, but at the moment I can't find it.

Tom Wyant

Posted by harryfmudd [AT] comcast [DOT] on December 9, 2006, 2:43 pm


harryfmudd [AT] comcast [DOT] net wrote:

> I could have sworn I saw documentation somewhere in the Encode-related
> modules for a subroutine that would try to guess the encoding of a chunk
> of text, but at the moment I can't find it.
>
> Tom Wyant

It's Encode::Guess. Duh.
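
A minimal sketch of how it gets used, as far as I understand the docs. The helper name decode_guessing is made up, and the sample bytes are just a UTF-8 string I encode on the spot. By default only ascii, utf8 and BOM-marked UTF-16/32 are checked, so for legacy Asian encodings you would have to pass suspects yourself, and short or ambiguous input may not be guessable at all.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Encode::Guess;    # exports guess_encoding()

# Hypothetical helper: decode a byte string whose encoding is unknown.
# @suspects are extra encodings to try in addition to the defaults.
sub decode_guessing {
    my ($bytes, @suspects) = @_;
    my $decoder = guess_encoding($bytes, @suspects);
    # On failure guess_encoding() returns an error string, not an object.
    ref $decoder or die "Cannot guess encoding: $decoder\n";
    return $decoder->decode($bytes);
}

# Stand-in input: three CJK characters encoded as UTF-8 bytes.
my $bytes = encode('UTF-8', "\x{65E5}\x{672C}\x{8A9E}");
my $text  = decode_guessing($bytes);
printf "decoded %d characters\n", length $text;
```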

Tom Wyant

Posted by Mumia W. (reading news) on November 20, 2006, 10:04 am


On 11/20/2006 01:33 AM, anevare2@yahoo.com wrote:
> I'm using the Spreadsheet::Read module (which works quite well
> generally). I have some spreadsheets with special characters like an
> accented e (é). I'm having some trouble processing these characters.
> I haven't dealt much with these type of characters in this context in
> the past. The accented e's are coming out like "?".
>
> My spreadsheet has cells A1, A2 and A3 set to Cafe as follows: in A1,
> Excel automatically made the accented e; in A2, I pressed Option-e e
> for the accented e; and in A3, I undid the special e to make it a
> regular e.
>
> A1:A3
> ---------
> Café
> Café
> Cafe
>
> This is my program
> ------------------------------
> use Spreadsheet::Read;
>
> my $ref = ReadData('special_char_test.xls');
>
> my $cell1 = $ref->[1]{A1};
> my $cell2 = $ref->[1]{cell}[1][2]; # try a different way
> my $cell3 = $ref->[1]{cell}[1][3];
>
> print "Cell A1: $cell1\n";
> print "Cell A2: $cell2\n";
> print "Cell A3: $cell3\n";
>
> Output (standard out):
> -------------------
> Cell A1: Caf?
> Cell A2: Caf?
> Cell A3: Cafe
>
> What can I do so that the accented e prints correctly or so the correct
> format can be saved to a csv file?
> Thanks.
> A.
>

I have no problems outputting accented characters from
Spreadsheet::Read. Either your Perl or your terminal is not able to deal
with the accented characters.

Try placing "use encoding 'iso-8859-1';" at the top of your program.

Recent versions of Perl (>= 5.8) should be able to handle character
encodings well, but you might have to set up your locale properly, and
you might have to configure your terminal to display those characters.
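
Another way to get at the same thing is to put an encoding layer on STDOUT with binmode(). This is only a sketch: print_cell is a made-up helper, the cell value is a stand-in for what the spreadsheet returns, and it assumes your terminal expects UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper: print one labelled cell value to a handle that
# already has an encoding layer applied.
sub print_cell {
    my ($fh, $name, $value) = @_;
    print {$fh} "Cell $name: $value\n";
}

# Assumption: the terminal expects UTF-8. For a Latin-1 terminal use
# ':encoding(iso-8859-1)' instead.
binmode STDOUT, ':encoding(UTF-8)' or die "binmode: $!";

print_cell(\*STDOUT, 'A1', "Caf\x{E9}");    # stand-in for a spreadsheet value
```

If the layer does not match what the terminal expects, you will still see "?" or two garbage characters in place of the accented e.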


--
paduille.4060.mumia.w@earthlink.net
