FAQ 6.6 How can I make "\w" match national character sets?

This message is one of several periodic postings to comp.lang.perl.misc
intended to make it easier for Perl programmers to find answers to
common questions. The core of this message represents an excerpt
from the documentation provided with Perl.


6.6: How can I make "\w" match national character sets?

    Put "use locale;" in your script. Within its scope, the \w character
    class is taken from the current locale (its LC_CTYPE category).

    See perllocale for details.
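As a minimal sketch of that answer, the script below selects an LC_CTYPE locale with POSIX::setlocale and tests two words against \w under "use locale". It assumes a POSIX-style system. The locale name de_DE.ISO8859-1 is a hypothetical example that may not be installed on your machine, and the all_word_chars helper is illustrative only. Whether the Latin-1 sharp s (byte 0xDF) counts as a word character depends on which locale actually took effect, so the script reports what setlocale returned rather than assuming success.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use locale;    # from here on, \w (and \d, \s, lc, uc) follow LC_CTYPE

# Hypothetical locale name: which locales are installed, and how their
# names are spelled, varies between systems, so check what setlocale returns.
my $wanted = 'de_DE.ISO8859-1';
my $got    = setlocale(LC_CTYPE, $wanted);
print defined $got
    ? "LC_CTYPE is now $got\n"
    : "$wanted is not installed; LC_CTYPE is unchanged\n";

# Illustrative helper: true if every character of $s is a word character
# under the locale currently in effect.
sub all_word_chars {
    my ($s) = @_;
    return $s =~ /\A\w+\z/ ? 1 : 0;
}

# "strasse" in plain ASCII, and "straße" as Latin-1 bytes (sharp s = 0xDF).
for my $word ('strasse', "stra\xDFe") {
    # Show non-ASCII bytes as \xNN so the output is terminal-safe.
    (my $shown = $word) =~ s/([^\x20-\x7E])/sprintf('\\x%02X', ord $1)/ge;
    printf "%-10s %s\n", $shown,
        all_word_chars($word) ? 'matches \w+' : 'does not match \w+';
}
```

Running it on machines with and without that locale installed is a quick way to see that the second word's result comes from the locale, not from Perl itself.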


Documents such as this have been called "Answers to Frequently
Asked Questions" or FAQ for short.  They represent an important
part of the Usenet tradition.  They serve to reduce the volume of
redundant traffic on a news group by providing quality answers to
questions that keep coming up.

If you are somehow irritated by seeing these postings, you are free
to ignore them or add the sender to your killfile.  If you find
errors or other problems with these postings please send corrections
or comments to the posting email address or to the maintainers as
directed in the perlfaq manual page.

Note that the FAQ text posted by this server may have been modified
from that distributed in the stable Perl release.  It may have been
edited to reflect the additions, changes and corrections provided
by respondents, reviewers, and critics to previous postings of
these FAQ. The complete text of these FAQ is available on request.

The perlfaq manual page contains the following copyright notice.


    Copyright (c) 1997-2002 Tom Christiansen and Nathan
    Torkington, and other contributors as noted. All rights
    reserved.

This posting is provided in the hope that it will be useful but
does not represent a commitment or contract of any kind on the part
of the contributors, authors or their agents.

Re: FAQ 6.6 How can I make "\w" match national character sets?

On Wed, 30 Mar 2005, PerlFAQ Server wrote:

> 6.6: How can I make "\w" match national character sets?
>     Put "use locale;" in your script. The \w character class is taken from
>     the current locale.
>     See perllocale for details.

Hmm... I suspect this answer could use some reworking.

The problem is that Perl's Unicode support is said to be only
partially compatible with its locale support.

Looking at ActivePerl 5.8.6, for example, the situation has evidently
improved somewhat compared with earlier 5.8.x versions, but perllocale
still says this:

| Usually locale settings and Unicode do not affect each other,
| but there are exceptions, see Locales in the perlunicode
| manpage for examples.

and perlunicode still says this:

| Use of locales with Unicode is discouraged.

Taking that together with the earlier, stronger warnings about
combining locales with Unicode, I suspect that anyone who needs
Unicode support and is writing code that must also run on Perl
versions older than the latest needs guidance on this matter.  I'm
afraid I don't have enough expertise to say exactly what that
guidance should be, but I strongly feel that something is needed
(sorry!).

At any rate, see perlunicode, subheading "Security implications of
Unicode", second bullet item.
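One route that avoids "use locale" for Unicode data is to decode the input into a Perl character string, after which \w follows Unicode's definition of a word character rather than the locale's. The sketch below is a minimal illustration of that difference, assuming Perl 5.8 or later with the core Encode module and UTF-8 encoded input; the all_word_chars helper is hypothetical and only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Illustrative helper: true if every character of $s is a word character.
# Note: no "use locale" here, so the locale plays no part.
sub all_word_chars {
    my ($s) = @_;
    return $s =~ /\A\w+\z/ ? 1 : 0;
}

# Assumption: the data arrives as UTF-8 encoded bytes, for example from a
# file or socket opened without an :encoding(...) layer.
my $bytes = "stra\xC3\x9Fe";            # "straße" as 7 UTF-8 bytes
my $chars = decode('UTF-8', $bytes);    # the same word as 6 characters

for my $case ([bytes => $bytes], [characters => $chars]) {
    my ($label, $s) = @$case;
    printf "%-10s length %d, %s\n", $label, length $s,
        all_word_chars($s) ? 'matches \w+' : 'does not match \w+';
}
```

The raw byte string should fail the match, since the UTF-8 continuation byte 0x9F is not a word character under any rules, while the decoded string, in which ß is a single character, should match.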

all the best