Posted by H.Merijn Brand on November 12, 2007, 9:22 am
wrote:
> Mumia W. wrote:
>> On 10/24/2007 11:59 PM, Petr Vileta wrote:
>>> Well, I'm pleased to see you here :-)
>>> I tried to use your module Text::CSV_XS for storing some data to CSV
>>> file but without success. The problem is national characters. When I
>>> tried $csv->combine('abc','áíá','def') I got "abc\n" only. Your
>>> module fails on first field where something greater than \x7f is.
>>> But no error, no warning.
>>> Is this a bug or feature?
>>
>> This sort-of works for me:
>>
>> #!/usr/bin/perl
>> use strict;
>> use warnings;
>> use encoding 'iso-8859-1';
>> use Text::CSV_XS 0.32;
>>
>> print "Version = $Text::CSV_XS::VERSION\n";
>>
>> my $csv = Text::CSV_XS->new({binary => 1});
>
> I suppose that binary is intended for "unprintable" characters.
depends. Do you think \x is unprintable? or \x
>> However, the output seems to be forced to UTF-8:
Text::CSV_XS doesn't know anything about encoding.
> [snip]
> Maybe will be good to add some functions to your module to set up input
> and output codepages. Some like
> $csv = $csv = Text::CSV_XS->new('input_charser' => 'utf-8',
> 'output_charset => 'iso-8859-1');
That would of course be
my $csv = Text::CSV_XS->new ({
    input_charset  => "utf-8",
    output_charset => "iso-8859-1",
    });
1: s/charser/charset/
2: put in an anon-hash
The idea sounds nice, but would severely slow down all
scripts that use Text::CSV_XS in a transparent mode,
without Encoding/Decoding.
It is rather easy to do it right from the user point of view.
Here's the snippet used in the test suite to check if encoding
works (t/50_utf8.t):
my $csv = Text::CSV_XS->new ({ binary => 1, always_quote => 1 });

# Special characters to check:
# 0A = \n  2C = ,  20 =    22 = "
# 0D = \r  3B = ;
foreach my $test (
    # Space-like characters
    [ "\x{0000A0}", "U+0000A0 NO-BREAK SPACE" ],
    [ "\x{00200B}", "U+00200B ZERO WIDTH SPACE" ],
    # Some characters with possible problems in the code point
    [ "\x{000122}", "U+000122 LATIN CAPITAL LETTER G WITH CEDILLA" ],
    [ "\x{002C22}", "U+002C22 GLAGOLITIC CAPITAL LETTER SPIDERY HA" ],
    [ "\x{000A2C}", "U+000A2C GURMUKHI LETTER BA" ],
    [ "\x{000E2C}", "U+000E2C THAI CHARACTER LO CHULA" ],
    [ "\x{010A2C}", "U+010A2C KHAROSHTHI LETTER VA" ],
    # Characters with possible problems in the encoded representation
    #  Should not be possible. ASCII is coded in 000..127, all other
    #  characters in 128..255
    ) {
    my ($u, $msg) = @$test;
    utf8::encode ($u);      # turn the character into its UTF-8 octets
    my @in  = ("", " ", $u, "");
    my $exp = join ",", map { qq{"$_"} } @in;

    ok ($csv->combine (@in), "combine $msg");
    my $str = $csv->string;
    is_binary ($str, $exp, "string $msg");

    ok ($csv->parse ($str), "parse $msg");
    my @out = $csv->fields;
    # Cannot use is_deeply (), because of the binary content
    is (scalar @in, scalar @out, "fields $msg");
    for (0 .. $#in) {
        is_binary ($in[$_], $out[$_], "field $_ $msg");
        }
    }
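The user-side approach described above can be sketched for the writing direction as follows. This is only a minimal illustration, not code from the distribution: csv_line_in_charset is a hypothetical helper name, and it assumes the fields arrive as Perl character strings. As in the test snippet, the fields are turned into octets before combine () sees them, so the module itself stays unaware of any encoding.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Text::CSV_XS;

# Hypothetical helper (not part of Text::CSV_XS): build one CSV line
# whose fields are encoded in the requested charset.
sub csv_line_in_charset {
    my ($charset, @fields) = @_;

    # binary => 1 is needed because the encoded fields may hold octets > 0x7e
    my $csv = Text::CSV_XS->new ({ binary => 1 });

    # Encode every field to octets before handing it to combine ()
    my @octets = map { encode ($charset, $_) } @fields;

    $csv->combine (@octets) or die "combine failed\n";
    return $csv->string;
    }

# "\x{e1}\x{ed}\x{e1}" is the character string "áíá"
my $line = csv_line_in_charset ("iso-8859-1", "abc", "\x{e1}\x{ed}\x{e1}", "def");
```

Reading works the other way round: parse the octets with binary => 1 and decode the resulting fields afterwards.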
> But this is my idea only ;-)
Posted by Petr Vileta on November 12, 2007, 10:20 pm
H.Merijn Brand wrote:
> wrote:
[snip]
>> I suppose that binary is intended for "unprintable" characters.
>
> depends. Do you think \x is unprintable? or \x
>
Ehm, yes ;-) I meant unprintable in the \x00 to \xff code range, so all
characters less than \x20 except \x0a, \x0d, \x09.
[snip]
> That would of course be
>
> my $csv = Text::CSV_XS->new ({
> input_charset => "utf-8",
> output_charset => "iso-8859-1",
> });
>
> The idea sounds nice, but would severely slow down all
> scripts that use Text::CSV_XS in a transparent mode,
> without Encoding/Decoding.
>
But you could check in the ->new() part of the module whether the programmer
set both charsets. If both charsets are set, run in "translate" mode; if
neither is set, run in "transparent" mode; and if only one is set, return an
error.
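The check proposed above could look roughly like this. It is purely a sketch of the idea: input_charset and output_charset are the hypothetical options from this thread, not real Text::CSV_XS attributes, and select_mode is an invented helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: decide the mode from the two proposed options.
# Neither option exists in Text::CSV_XS; this only sketches the check.
sub select_mode {
    my (%opt) = @_;
    my $has_in  = defined $opt{input_charset};
    my $has_out = defined $opt{output_charset};

    return "translate"   if  $has_in &&  $has_out;   # both set
    return "transparent" if !$has_in && !$has_out;   # none set
    die "input_charset and output_charset must be set together\n";
    }

my $mode_both = select_mode (input_charset  => "utf-8",
                             output_charset => "iso-8859-1");
my $mode_none = select_mode ();

# Only one charset set: select_mode dies, eval catches the error
my $mode_one  = eval { select_mode (input_charset => "utf-8") };
my $one_error = $@;
```

The decision is made once, at construction time, so the per-field cost in "transparent" mode would depend on how the mode is then tested inside combine and parse.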
--
Petr Vileta, Czech republic
(My server rejects all messages from Yahoo and Hotmail. Send me your mail
from another non-spammer site please.)
Posted by H.Merijn on October 26, 2007, 3:13 pm
wrote:
> H.Merijn Brand wrote:
>> The following report has been written by the PAUSE namespace indexer.
>> Please contact modules@perl.org if there are any open questions.
>> Id: mldistwatch 925 2007-09-16 15:41:11Z k
>>
>> User: HMBRAND (H.Merijn Brand)
>> Distribution file: Text-CSV_XS-0.32.tgz
>
> Well, I'm pleased to see you here :-)
I've been here before, but I prefer private mail :)
> I tried to use your module Text::CSV_XS for storing some data to CSV
> file but without success. The problem is national characters. When I
> tried $csv->combine('abc','áíá','def') I got "abc\n" only.
As both Mumia and the docs make (now) VERY clear, you need the binary
flag. This version has made that even more clear. You *do* read the
docs, right?
--8<---
Important Note: The default behavior is to only accept ascii
characters. This means that fields can not contain newlines. If your
data contains newlines embedded in fields, or characters above 0x7e
(tilde), or binary data, you *must* set "binary => 1" in the call to
"new ()". To cover the widest range of parsing options, you will
always want to set binary.
-->8---
> Your module fails on first field where something greater than \x7f is.
My module doesn't fail here. It is the default, documented, and correct
behaviour :)
> But no error, no warning.
> Is this a bug or feature?
Feature, or documented behaviour. Whatever you prefer.
In the distribution, check out t/50_utf8.t to see how you should be
dealing with non-ASCII characters. Maybe I can put that example in
the documentation, as I keep referring to that file.
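To illustrate the binary flag discussed above, here is a small sketch that feeds the same fields to an object with default settings and to one created with binary => 1. It assumes the fields are plain byte strings (Latin-1 octets here); the variable names are made up for the example.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Byte strings; the middle field holds Latin-1 octets above 0x7e
my @fields = ("abc", "\xe1\xed\xe1", "def");

my $strict = Text::CSV_XS->new ();                  # default settings
my $binary = Text::CSV_XS->new ({ binary => 1 });   # accept any octet

my $strict_ok = $strict->combine (@fields);
my $binary_ok = $binary->combine (@fields);

print "default: ", ($strict_ok ? "ok" : "rejected"), "\n";
print "binary:  ", ($binary_ok ? "ok" : "rejected"), "\n";

# Round trip with the binary object: parse what combine () produced
my @out;
if ($binary_ok && $binary->parse ($binary->string)) {
    @out = $binary->fields;
    }
```

When combine () returns false, $csv->error_input can be used to see which field was refused.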