
Regular expression for matching words containing underscore _ character

comp.lang.perl.misc
Started by Raj on 12-12-2007
Posted by Raj on December 12, 2007, 10:27 am
>>
>> [snip]
>>
>> > if I print "$1\n", the file prints just fine. But, if I do something
>> > like print "$1 after\n", the whole output is messed up. If I print
>> > "before $1\n", nothing prints at all. If I print "before $1 after\n",
>> > only after prints.
>>
>> not really sure, but could be a rogue "\r" in $1,


> There is a rogue carriage return (0xd) in the string.

> Is there something I can do to deal with this situation?


Repair the corrupted file:

perl -p -i -e 'tr/\r//d' bad_file
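The one-liner above rewrites the file in place. If the file can't be changed, the same cleanup can be done inside the program. A minimal sketch, assuming a made-up CRLF-terminated line (the sample value is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical line read from a file with DOS (CRLF) line endings.
my $line = "rsp_names_friends_master\r\n";

# . matches \r (only \n is excluded), so the capture keeps the carriage return.
my ($field) = $line =~ /^(.*)$/;

# On a terminal that \r moves the cursor back to column 0, so anything
# printed after it overwrites the start of the line.
# Delete carriage returns from a copy, as the one-liner does for the file.
(my $clean = $field) =~ tr/\r//d;

print "before $clean after\n";
```

The `(my $copy = $orig) =~ tr/.../.../d` idiom leaves the original capture untouched, which helps when you want to log the raw value.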


--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas

Posted by RedGrittyBrick on December 12, 2007, 10:47 am

Posted by Tad J McClellan on December 12, 2007, 10:19 pm
> Raj wrote:
>> I have large text passages containing names of database tables,
>> procedures, packages, variables, etc. having the underscore character
>> as a part of the name, e.g. rsp_names_friends_master. I tried
>> "\b[a-zA-Z0-9_]+\b" but it matches all words in the passage.
>
> Similarly "[ab]+" matches "aaa" and "aa" even though neither contains "b".
>
> Try "\b[a-zA-Z0-9]+_[a-zA-Z0-9_]+\b"
>
> Or "\b\w+_\w+\b"


Three (six?) useless uses of word boundary in the quotes above...

Every pattern there will behave identically without any \b's.


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher0cmdat/"
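The claim about \b is easy to test. A minimal sketch, using a made-up sample string, that runs each of the three quoted patterns with and without the surrounding \b's and records whether the matches differ. On this input \b changes nothing for [a-zA-Z0-9_]+ or \w+_\w+, but it does matter for [a-zA-Z0-9]+_[a-zA-Z0-9_]+: without the leading \b that pattern finds tmp_x inside _tmp_x, because its first character class excludes the underscore and the match is then free to start in the middle of a word.

```perl
use strict;
use warnings;

# Made-up sample: "_tmp_x" starts with an underscore, "cc_" ends with one.
my $text = 'select rsp_names_friends_master, _tmp_x, cc_ from dual';

# The three patterns quoted above, without their \b anchors.
my @patterns = (
    '[a-zA-Z0-9_]+',
    '[a-zA-Z0-9]+_[a-zA-Z0-9_]+',
    '\w+_\w+',
);

my %differs;    # pattern => true if wrapping it in \b...\b changes the matches
for my $re (@patterns) {
    my @with    = $text =~ /\b$re\b/g;
    my @without = $text =~ /$re/g;
    $differs{$re} = "@with" ne "@without";
    print "$re\n";
    print "  with \\b:    @with\n";
    print "  without \\b: @without\n";
}
```

For ASCII input the trailing \b stays redundant in all three, since the greedy class at the end already runs to the end of the word.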

Posted by RedGrittyBrick on December 13, 2007, 5:11 am
Tad J McClellan wrote:
>> Raj wrote:
>>> I have large text passages containing names of database tables,
>>> procedures, packages, variables, etc. having the underscore character
>>> as a part of the name, e.g. rsp_names_friends_master. I tried
>>> "\b[a-zA-Z0-9_]+\b" but it matches all words in the passage.
>> Similarly "[ab]+" matches "aaa" and "aa" even though neither contains "b".
>>
>> Try "\b[a-zA-Z0-9]+_[a-zA-Z0-9_]+\b"
>>
>> Or "\b\w+_\w+\b"
>
>
> Three (six?) useless uses of word boundary in the quotes above...
>
> Every pattern there will behave identically without any \b's.
>
>

TFTC (thanks for the correction).

$ perl -e 'print "$_\n" for "_aa-bbb.cc_[d_d]" =~ /\w+/g'
_aa
bbb
cc_
d_d

$ perl -e 'print "$_\n" for "_aa-bbb.cc_[d_d]" =~ /\w+_\w+/g'
d_d

In Perl programs I've written, I don't think I've ever used \b. Perhaps
I should have analyzed the OP's RE completely rather than only
commenting on the primary reason for the problem.
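For Raj's original goal, a minimal sketch that pulls the underscore-containing names out of a passage. The passage is hypothetical (only rsp_names_friends_master comes from the thread), and \w*_\w* is a looser variant not proposed in the thread: unlike \w+_\w+, it also keeps names whose only underscore is at the start or end, such as _aa and cc_ in the output above.

```perl
use strict;
use warnings;

# Hypothetical passage; only the table name is taken from the thread.
my $passage = 'Rows from rsp_names_friends_master feed _tmp_cache and log_; '
            . 'rsp_names_friends_master is read twice.';

# \w+_\w+ needs a word character on both sides of some underscore,
# so "log_" (whose only underscore is at the end) is skipped.
my @strict = $passage =~ /\w+_\w+/g;

# \w*_\w* accepts any run of word characters containing an underscore
# (including a lone "_", if one ever appears).
my @loose = $passage =~ /\w*_\w*/g;

# Drop repeats while keeping first-seen order.
my %seen;
my @names = grep { !$seen{$_}++ } @loose;

print "strict: @strict\n";
print "names:  @names\n";
```

If a bare "_" can occur in the text and should not count, filter it out with `grep { $_ ne '_' }`.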

Posted by Raj on December 12, 2007, 10:54 pm

Similar Threads (title, date posted)
Matching single character words (April 17, 2006, 10:30 pm)
regular expression for english words (May 12, 2005, 11:50 am)
Regular expression to match only strings NOT containing particular words (October 19, 2007, 1:00 am)
Re: Regular expression to match only strings NOT containing particular words (October 19, 2007, 12:40 pm)
Question about "?" character in Perl Regular Expression (January 2, 2008, 2:58 am)
regular expression negate a word (not character) (January 25, 2008, 8:16 pm)
regular expression, matching sub item (January 30, 2006, 6:04 pm)
matching a complicated url in a regular expression (January 13, 2007, 12:32 pm)
matching chunks of data with a regular expression (August 26, 2004, 7:02 am)
Regular Expression check for non matching string (September 22, 2005, 8:27 am)
