# common roots...string roots

#### Do you have a question? Post it now! No Registration Necessary.  Now with pictures!

•  Subject
• Author
• Posted on

Hi, here's my problem of eternal Perl beginner today.

I have a list contained unique sorted strings (avg size 130 chars).

Strings have the following structure

root1."blablabla"
root1."xyzxyzxyxjjj"
root1."fdfkdsl"
:
root2."blfblrble"
root2."fafafdfkdsl"
:
root3."dhsjfsdjfdsj"
root3."rerueuer"
:

my problem? I don't know what root1, root2 and... rootN are, which is
exactly what I am trying to determine, with a little catch: I don't want
those roots to be too short, so I would also like to provide a condition
(a regexp to be evaluated) that instructs the algorithm about a given
threshold, beyond which I no longer care (I don't want ridiculously
short roots).

Any ideas about how this can be achieved?

Thanks

## Re: common roots...string roots

[...]

Doing the length check via regex is a bit strange but entirely possible:

------------
my (%roots, \$min);

\$min = \$ARGV[0] // 0;
print(map {\$_, "\n"}
map {s/\..*//s; /./ && !exists(\$roots) ? \$roots = \$_ : ()} <STDIN>);
------------

NB: This is an example where a variable actually has to be initialized
in Perl.

## Re: common roots...string roots

[...]

Somewhat nicer, doing away with the 'length check via regex' mannerism
and using grep as appropriate here:

-----------
my (%roots, \$min);

\$min = \$ARGV[0] - 1;
print(map { \$_, "\n" }
grep { s/\..*//s; length() > \$min && !\$roots++ } <STDIN>);
-----------

## Re: common roots...string roots

My problem: the problem is not clear to me.

If you got 3 strings
rootfooxyz
rootfooabc
rootbardef
then what are your "root" strings?
rootfoo and rootbar? Or just root? Or rootfoo and root? Or what?

jue

## Re: common roots...string roots

On 10/17/2015 11:24 AM, Jürgen Exner wrote:

rootfoo and rootbar, assuming that a certain boolean conditions turns
true for both...

Thanks

## Re: common roots...string roots

The plot thickens...  Can you give a complete example?  (1) some actual
literal input; (2) any conditions that might need to be true for roots;
(3) the output you'd like to see.

--
Ben.

## Re: common roots...string roots

Ok, then what is the logic behind returning rootbar instead of e.g.
rootb or rootba or rootbard?

jue

## Re: common roots...string roots

On Saturday, October 17, 2015 at 6:07:07 AM UTC-7, Fillmore wrote:

Assuming "." separator between root and the rest of string

use constant MIN => 5;   # eg, min. root length
my @roots;

while (<DATA>) {
/^([^.]+) (?{length \$1 >= MIN and push(@roots,\$1)}) /x;
}
__END__
root1."blablabla"
root1."xyzxyzxyxjjj"
...

## Re: common roots...string roots

On Sunday, October 18, 2015 at 12:23:16 AM UTC-7, C.DeRykus wrote:

Alternatively, with a hash to eliminate dup's and get counts:

my %roots;
while (<DATA>) {
/^([^.]+) (?{length \$1 >= MIN and \$roots++ }) /x;
}

## Re: common roots...string roots

This will either require sorting the %roots keys or having them
reordered in some unpredictable way. Considering that the input file was
supposed to be sorted, this can be avoided by just using the hash to
eliminate duplicates instead of also using it to store the sought-after
data. This was the main idea behind the

map {s/\..*//s; /./ && !exists(\$roots) ? \$roots = \$_ : ()} <STDIN>

expression which will return a list of unique 'roots' in the order they
appear in in the input file.

## Re: common roots...string roots

On Sunday, October 18, 2015 at 9:30:50 AM UTC-7, Rainer Weikusat wrote:

That may be "scope creep" but if important, is fixable:

tie my %roots, 'Tie::IxHash';
while (<DATA>) {
/^([^.]+)  (?{length \$1 >= MIN and \$roots++ }) /x;
}

## Re: common roots...string roots

Considering that this runs another 630 lines of text through the Perl
compiler, just sorting the keys again after gratuitiously throwing the
property of the input away is likely the lesser evil.

## Re: common roots...string roots

On Sunday, October 18, 2015 at 12:17:27 PM UTC-7, Rainer Weikusat wrote:

Compilers can handle that and not even breathe hard. A human
brain on the other hand should not have to "das Rad noch einmal
erfinden".

## Re: common roots...string roots

We're discussing algorithms here and not 'filosofical' ((c) Garfield)
arguments in favour of not thinking about something ...

## Re: common roots...string roots

Took the weekend off. Thank you to everyone who replied. I'll review and
get back with more specific question.

The "map" function always puzzled me, but this might be my chance to
finally get deeper into it. Is it "map" like in the famous map/reduce

## Re: common roots...string roots

Only insofar there's sort-of a common root. It's more a simpler variant
of a Lisp function named mapcar. map evaulates a block or expression for
each element of an input list and returns a list of whatever the block
returned, eg, a simple, contrived example:

perl -e 'print(map { \$_, "\n" } map { \$_, \$_ } 0 .. 20);'

This prints each number of 0 .. 20 twice, putting each onto a lines of
its own.

map { \$_, \$_ } 0 .. 20

returns a list of the form 0, 0, 1, 1, 2, 2, ... ,20, 20.

map { \$_, "\n" } ...

returns all elements of the input list but with newlines
interspersed. The advantage over join("\n", ...) is that the last
element isn't special, ie, it, too, will be followed by a \n.

## Re: common roots...string roots

On 19.10.15 14:12, Fillmore wrote:

They are related to each other like a single power drill
is related to a colliery or to a computer case factory.

Rainer explains map. You may want to have a look at the
docs of List::Util, which has a "generic" reduce.

Here is an approximation to what is still far away from
(patented) MapReduce. It does use "map", a shared data store,
a reduce function, and multiple threads that share the work.

MapReduce is outlined at https://de.wikipedia.org/wiki/MapReduce

# Computes the 12-fold sum of the first 10 positive numbers
# by mapping the work of multiplication to 10 threads; then
# by waiting for the threads to join the main thread; then by
# reducing the interim results that were put in store, to form
# their sum. Summing requires a binary operation and a zero value.

local our %store;
share(%store);

sub not_map_reduce {
my (\$arrayref, \$mapf, \$binop, \$zero) = @_;
my (\$reduce, @workbench);

\$reduce = sub {
my \$slot = each %store;
if (\$slot) {
return \$binop->(\$store, &\$reduce);
}
return \$zero;
};

@workbench = map {
} @{ \$arrayref };

\$_->join() foreach @workbench;

return &\$reduce;
}

print not_map_reduce(
[1 .. 10],
sub { \$store = \$_[0] * 12 },
sub { \$_[0] + \$_[1] },
0
);

## Re: common roots...string roots

On 10/17/2015 03:39 PM, Ben Bacarisse wrote:

> The plot thickens...  Can you give a complete example?
> (1) some actual literal input;
> (2) any conditions that might need to be true for roots;
> (3) the output you'd like to see.

Here is (1): http://pastebin.com/fFycxTGy
(raw: http://pastebin.com/raw.php?i=fFycxTGy )

Expected result (roots) (3)

Anvato ANVSDK/3.2.46 (Linux; Android 4.4.2; SAMSUNG-SM-T537A
DroidParts.org (Android 4.4.2; SAMSUNG-SM-T537A
Mozilla/5.0 (Android 4.4.2; xx-xx; SAMSUNG-SM-T537A
Mozilla/5.0 (Linux; Android 4.4.2; xx-xx; SAMSUNG-SM-T537A
Mozilla/5.0 (Linux; Android 4.4.2; SAMSUNG-SM-T537A

the condition (2) is that either the string "Build" or a closed round parenthesis ")" is found, whichever comes first.

Thank you everyone, guys

## Re: common roots...string roots

I actually managed to achieve my goal (albeit it took significantly more
lines of code :)

Quick question, how do I write the condition more concisely than this?

if (\$elem =~ /(.*?) Build/ || \$elem =~ /(.*?)\)/) {
\$localRoots++;
}

[|] wasn't doing the trick for me...

Thanks everyone for their help and support!

On 10/22/2015 04:37 AM, Fillmore wrote:

## Re: common roots...string roots

On 10/21/2015 21:23, Fillmore wrote:

Using |:
if (\$elem =~ /([^ ]+) Build|([^ ]+)\)/) {
\$localRoots++ if \$1; \$localRoots++ if \$2;
}
or maybe:
if (\$elem =~ /([^ ]+) Build|([^ ]+)\)/) {
my \$x = \$1 // \$2; \$localRoots++;
}