Do you have a question? Post it now! No Registration Necessary. Now with pictures!
- Jochen Lehmeier
July 23, 2009, 6:02 pm
rate this thread
I am using DBD::Oracle with a client and database characterset of AL32UTF8.
My problem is that DBD::Oracle assumes utf8-encoded strings in prepare().
If the input strings are not utf8, and contain non-ASCII characters, then
they are *not* automatically converted to utf8 by DBD::Oracle. They are
offered to Oracle as if they were utf8, and Oracle processes them even if
they are not valid, maybe even returning them to Perl as such; Perl is
then left with invalid utf8-flagged strings simply actually latin1
To be very clear: this is *not* an issue of wrong character encodings
during any time before the calls to DBI/DBD::Oracle (though it is,
I seem to remember, although I cannot find it now, that this behaviour
also was well documented in DBD::Oracle or at some other place, so I am
not so much complaining about DBD::Oracle (though I'm a bit miffed - this
would seem to be trivially fixable in there...).
Here is an example:
my $sql="select 'a \xe4 a' from dual"; # \xe4 is the german "a
diag(DBI::data_string_desc($sql)); # UTF8 off, non-ASCII, 24
characters 24 bytes
diag("is_utf8: ".utf8::is_utf8($rc)." valid:
'".utf8::valid($rc)."'"); # is_utf8: 1 valid: ''
diag(DBI::data_string_desc($rc)); # UTF8 on but INVALID encoding,
ASCII, 3 characters 5 bytes
# UTF8 off, non-ASCII, 24 characters 24 bytes
# is_utf8: 1 valid: ''
Malformed UTF-8 character (unexpected non-continuation byte 0x20,
immediately after start byte 0xe4) in pattern match (m//) at
/usr/lib/perl5/site_perl/5.8.3/x86_64-linux-thread-multi/DBI.pm line 1117.
# UTF8 on but INVALID encoding, ASCII, 3 characters 5 bytes
This also highlights one nastiness of the situation: no error or warning
whatsoever about the situation is displayed by the usual
prepare/execute/fetch calls. The "Malformed ..."-line above is from inside
DBI::data_string_desc, where the broken string is used in a regexp
(regexp's seem to be one place where this is noticed). So it's purely by
chance that there is a notice about this problem in this test script.
What to do? I have dozens, maybe 100 scripts that would be perfectly
utf8-capable except for this issue. They use proper :encoding() on their
file handles etc. . Is there some flag I can activate in DBD::Oracle or
DBI which automatically recodes "UTF8 off, non-ASCII" strings to utf8? A
simple Encode::decode("utf8",$sql) fixes the issue of course, but I am not
really looking forward to changing all those scripts, maybe missing lots
of stuff along the way or introducing errors.
We have a module that handles low level DB stuff like opening the DB
handle. So if there was a fix which could take place *once* after opening
the DB, that would be great.
$DBD::Oracle::VERSION = '1.23';
$DBI::VERSION = "1.52";
Thanks in advance!
- » FAQ 6.18 Why don't word-boundary searches with "\b" work for me?
- — Next thread in » PERL Discussions