regex dingbat dodge - single char as string to repeatable single char.

comp.lang.perl.misc
Posted by John on January 25, 2008, 2:16 pm
I have text that might have had a star character in the proprietary
originating system. The character is used in ratings boxes: a
three-star movie, a four-star restaurant, etc.

By the time it's exported and available to me, it's represented by a
string: "<star>".

I want to surround consecutive stars with font coding and replace each
instance of the string with a single character that, in conjunction
with the font change, will eventually print as a star.

To set up this substitution, I change the strings back to a unique
character, one that I reckon would never occur in nature.

When I try to surround any repetitions of this invented character, I
instead match everything.

===

#!/usr/bin/perl -w
use strict;

my $text = "Cuisine: Urban deli<ep>";
$text .= "Overall: <star><star><star><star><1/2> (very good to excellent)<ep>";
$text .= "Food: <star><star><star><star><1/2><ep>";

$text =~ s/\<star\>/_STAR_/ig;             # uscores easier in regex than angle brackets.
$text =~ s/_STAR_/\xbc/g;                  # change pseudocharacter to single character
$text =~ s/(\xbc*)/_STARFONT_$1_ENDSTAR/g; # bracket groups in more pseudocode

print $text;

===

If I limit the search to five consecutive stars, the match works as I
intended:

===

#!/usr/bin/perl -w
use strict;

my $text = "Cuisine: Urban deli<ep>";
$text .= "Overall: <star><star><star><star><1/2> (very good to excellent)<ep>";
$text .= "Food: <star><star><star><star><1/2><ep>";

$text =~ s/\<star\>/_STAR_/ig;             # uscores easier in regex than angle brackets.
$text =~ s/_STAR_/\xbc/g;                  # change pseudocharacter to single character
$text =~ s/(\xbc)/_STARFONT_$1_ENDSTAR/g;  # bracket groups of stars in more pseudocode

print $text;

===

So what am I missing when it comes to the first search?

Certainly, I am missing some superior technique for matching repeated
instances of such a string, so I am open to suggestions there.

John Campbell
Haddonfield, NJ 08033

Posted by John W. Krahn on January 25, 2008, 2:39 pm
John wrote:
> [...]
> $text =~ s/(\xbc*)/_STARFONT_$1_ENDSTAR/g; # bracket groups in more pseudocode
> [...]
> $text =~ s/(\xbc)/_STARFONT_$1_ENDSTAR/g; # bracket groups of stars in more pseudocode
> [...]
> So what am I missing when it comes to the first search?

In the first regular expression you are matching '\xbc*' and in the
second you are matching '\xbc'. The '*' modifier matches *zero* or
more times and there are *zero* '\xbc' characters everywhere in the
string. The second one has to match at least *one* character. Change
'\xbc*' to '\xbc+'.
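
To see the zero-length matches for yourself, here is a minimal sketch. The
sample string and the plain 'x' standing in for '\xbc' are assumptions chosen
only so that the output is printable; neither comes from the original script.

```perl
use strict;
use warnings;

# Hypothetical sample; 'x' stands in for the '\xbc' star character.
my $sample = "a xx b";

# '*' also succeeds on an empty run, so brackets appear where there is no 'x'.
(my $with_star = $sample) =~ s/(x*)/[$1]/g;

# '+' needs at least one 'x', so only the real run is bracketed.
(my $with_plus = $sample) =~ s/(x+)/[$1]/g;

print "star: $with_star\n";
print "plus: $with_plus\n";   # plus: a [xx] b
```

The "star" line should show empty "[]" pairs scattered through the string in
addition to "[xx]", which is the "match everything" effect described in the
question.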


John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

Posted by John on January 25, 2008, 3:08 pm
> John wrote:
> The '*' modifier matches *zero* or
> more times and there are *zero* '\xbc' characters everywhere in the
> string. The second one has to match at least *one* character. Change
> '\xbc*' to '\xbc+'.

That does the trick. It ought to come in handy.

Just realized that this snippet prints what in some systems is an
unprintable character. I just see question marks. Hope I didn't cause
any problems with that.

Posted by Ben Morrow on January 25, 2008, 5:46 pm

John wrote:
> [...]
> When I try to surround any repetitions of this invented character, I
> instead match everything.
>
> #!/usr/bin/perl -w

You want

use warnings;

rather than -w, nowadays.

> use strict;
>
> my $text = "Cuisine: Urban deli<ep>";
> $text .= "Overall: <star><star><star><star><1/2> (very good to excellent)<ep>";
> $text .= "Food: <star><star><star><star><1/2><ep>";
>
> $text =~ s/\<star\>/_STAR_/ig;   # uscores easier in regex than angle brackets.

No they're not. Angles don't need escaping inside regexen.
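
A quick check of the point about angle brackets, as a sketch only. The
sample line is a shortened, hypothetical version of the text in the question,
and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample line with two "<star>" markers.
my $line = "Food: <star><star>";

# Count matches with and without backslashes before the angle brackets.
my $escaped   = () = $line =~ /\<star\>/g;
my $unescaped = () = $line =~ /<star>/g;

print "escaped: $escaped, unescaped: $unescaped\n";   # escaped: 2, unescaped: 2
```

Both patterns find the same two markers, so the backslashes are harmless but
unnecessary.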

> $text =~ s/_STAR_/\xbc/g;   # change pseudocharacter to single character
> $text =~ s/(\xbc*)/_STARFONT_$1_ENDSTAR/g;   # bracket groups in more pseudocode

I don't know what the point of that is, unless you have some intervening
code that processes char-by-char.

s/( (?: <star> )+ )/_STARFONT_$1_ENDSTAR/gx;

will work perfectly well. Notice the difference between () and (?: )
(capturing vs. grouping) and my use of /x to make the regex more
comprehensible. Needing + instead of * has already been covered :).
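
For anyone who wants to try the substitution above, here is a small
self-contained sketch. It runs on one line of the sample text from the
question and leaves the "<star>" strings themselves untouched; swapping them
for a font-specific character would be a further step that is not shown here.

```perl
use strict;
use warnings;

# One line of the sample text from the original post.
my $text = "Food: <star><star><star><star><1/2><ep>";

# (?: ... ) groups "<star>" so that '+' repeats the whole string;
# the outer ( ... ) captures the complete run into $1.
# /x lets the pattern contain whitespace for readability.
$text =~ s/( (?: <star> )+ )/_STARFONT_$1_ENDSTAR/gx;

print "$text\n";
# Food: _STARFONT_<star><star><star><star>_ENDSTAR<1/2><ep>
```

The markers wrap the run of four stars once, and "<1/2>" stays outside
because it is not part of the repeated group.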

Ben

