UTF-8 fun

Below is a PL/Perl routine that is supposed to convert a DER-encoded
certificate stored in some Postgres (8.4) database column to the
corresponding text (PEM) representation:

create or replace function
x509_der_2_pem(bytea) returns varchar as $z$
    use Crypt::OpenSSL::X509 qw(FORMAT_ASN1 FORMAT_PEM);
    my $cert = $_[0];
    # undo the bytea text escaping: \ooo is an octal byte, \\ a backslash
    $cert =~ s/(?:\\([0-7]{3}))|\\\\/$1 ? chr(oct($1)) : '\\'/ge;

    my $x509 = Crypt::OpenSSL::X509->new_from_string($cert, FORMAT_ASN1);
    return $x509->as_string(FORMAT_PEM);
$z$ language plperlu;

Running this as-is in a database whose encoding has been set to UTF-8
aborts with

error from Perl function "x509_der_2_pem": Crypt::OpenSSL::X509: failed to read X509 certificate. at line ...

To make this more interesting, writing the certificate data to a file
after decoding the Postgres-internal text representation to binary
yields a correctly formatted file which can be processed by both the
openssl program and the Perl module used above.

Capturing the encoded binary data (before the s///) and decoding that
using the exact same code from the Perl debugger also works.

As it turned out, the PL/Perl runtime marks the input string as UTF-8
even though its type is 'binary data'. The new_from_string constructor
then uses SvPV to get the content and length of the input string. If
that string has been (wrongly) marked as UTF-8, the length will be
wrong and parsing the certificate data fails with the error shown above.
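
The effect can be reproduced outside the database. The following is a
minimal sketch, not taken from the PL/Perl sources: it assumes that
utf8::upgrade is a fair stand-in for whatever the runtime does when it
marks the string, and it uses bytes::length to show the size of the
internal buffer, which is what an XS function sees when it calls SvPV
without checking the UTF-8 flag. The sample bytes are made up.

```perl
use strict;
use warnings;
use bytes ();   # load bytes::length only; byte semantics stay off

# Four made-up bytes, two of them above 0x7F (as is typical for DER data).
my $der = "\x30\x82\x01\xC8";

# Same characters, but stored internally as UTF-8 with the flag set.
my $flagged = $der;
utf8::upgrade($flagged);

# Perl itself treats both strings as identical ...
printf "equal: %s, length: %d / %d\n",
    ($der eq $flagged ? "yes" : "no"), length($der), length($flagged);

# ... but the internal buffers differ: every character above 0x7F
# occupies two bytes in the flagged copy.
printf "buffer bytes: %d / %d\n",
    bytes::length($der), bytes::length($flagged);
```

At the Perl level nothing looks wrong, which would explain why the same
code appeared to behave in the debugger; only code that reads the raw
buffer notices the two extra bytes.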

Adding a

use bytes;

to the function causes everything to work as intended.
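
The documentation of the bytes pragma discourages using it for anything
but debugging, so here is a possible alternative, sketched as a
standalone script and not tested inside PL/Perl: decode the escapes
with normal semantics, then call utf8::downgrade so the internal buffer
holds one byte per character again. The helper name bytea_to_octets and
the sample input are hypothetical, and utf8::upgrade only simulates the
flagged input described above.

```perl
use strict;
use warnings;
use bytes ();   # only for bytes::length in the check below

# Hypothetical helper: decode the escaped bytea text form (\ooo octal
# escapes, \\ for a backslash) and force a byte-oriented internal form.
sub bytea_to_octets {
    my ($text) = @_;
    $text =~ s/\\(?:([0-3][0-7]{2})|\\)/defined $1 ? chr(oct($1)) : '\\'/ge;
    utf8::downgrade($text);   # dies if a character above 0xFF is present
    return $text;
}

# Simulated input: ASCII-only escaped text that carries the UTF-8 flag.
my $input = '0\\202\\001\\310';
utf8::upgrade($input);

my $octets = bytea_to_octets($input);
printf "%d characters, %d buffer bytes\n",
    length($octets), bytes::length($octets);
```

Inside the function this would amount to calling utf8::downgrade($cert)
after the s/// and before new_from_string, in place of the use bytes
line.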
