A Perl parsing question..

Hi experts,

Below is my scenario:

I have the below C:\groups.txt file, with a city name and an associated
list of people belonging to the city. A portion of the file is as follows:
1. Pleasanton

2. Livermore

3. CA

4. Chicago

5. IL

I am working on a script that throws the following output:
1. Parent of Pleasanton is CA.
2. Parent of Livermore is CA.
3. Parent of Chicago is IL.

I am currently using hashes, the grep command, etc. in the script, but I have
had no success. Can someone kindly help me with the right algorithm here?


Re: A Perl parsing question..

clearguy02@yahoo.com wrote:

I'm assuming that, contrary to your description, "CA" is not a city.

Contrary to your subject, this doesn't seem to be a parsing question.
It is perhaps an inference question, or maybe an implementation of an
inference algorithm.

What criteria, exactly, do you wish to be used to determine that the above
output is the appropriate output?  100% of the people in Pleasanton are
also in CA?  50% of them?  50% are in CA and none are anywhere else?

Without knowing what the algorithm is to do, it is hard to help you.

To get the number of elements that overlap between two hashes, you can do:
my $overlap = grep exists $h1{$_}, keys %h2;

Obviously if $overlap == keys %h2, then %h2 is completely contained in %h1.
Does that help?
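
A minimal sketch of that overlap count, assuming each group is stored as a hash keyed by person name. The names and both hashes are hypothetical, since the thread does not show the real data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical membership sets: keys are names, values are just true.
my %h1 = map { $_ => 1 } qw(alice bob carol dave);
my %h2 = map { $_ => 1 } qw(bob carol);

# Count the keys of %h2 that also exist in %h1.
# grep in scalar context returns the number of matching elements.
my $overlap = grep { exists $h1{$_} } keys %h2;

# keys in numeric context yields the number of keys in the hash.
my $contained = ( $overlap == keys %h2 ) ? 1 : 0;

print "overlap: $overlap\n";
print "h2 is ", ( $contained ? "" : "not " ), "contained in h1\n";
```

If the containment rule should be looser, say "at least half", the same count can be compared against a fraction of the key count instead.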



Re: A Perl parsing question..

Thanks Xho..

We know that 100% of Pleasanton folks are also in CA. We
don't need to check it in the code. I just want to store the
cities/states in a hash or array and then their values in another
array (or as the values of the hash).

Yea.. CA is not a city.. I just quoted it as an example.

I am confused as to how to break the whole input file into two hashes.


Re: A Perl parsing question..

clearguy02@yahoo.com wrote:

This guy keeps sending these questions all over, most apparently to
collect names or email addresses for some purpose. The sender's name and
email address may vary, but the pattern and style of the message is the
same. It's always about using perl to re-arrange a text file.

Re: A Perl parsing question..

I am not collecting any names or email addresses here.

I am a manager and not a full time programmer. Once in a while I need
to parse the text files I get...

Please don't jump to conclusions right away without thinking about what
the truth is.  I never posted any junk mail. I don't work on Perl
scripts on a full-time basis and I sometimes struggle to come up with
the right script.


Re: A Perl parsing question..

clearguy02@yahoo.com wrote:

Right away? I didn't, at the first time. But I have seen the same
question popping up too many times, in several different forums, over a
long timespan. There's no way you could be struggling with the same
simple problem over so much time. No way.

Re: A Perl parsing question..

Can you go ahead and show me where I posted this same question
earlier in the other groups?

Did you read my mail completely yet? Yea, I usually post all
text-parsing questions, because it would help in my reports.

The question I posted today is a new issue and I never posted it any
where else.

Instead of wasting time in arguing, why don't you read my mail and
suggest me a solution?


Re: A Perl parsing question..

clearguy02@yahoo.com wrote:

Did you really think I cannot give any examples? Poor you - the account
name "clearguy02" happens to be not so well chosen as to get buried in
the noise.

On 10th Jan 2003, you posted a message
- to the ClearCase International User Group mailing list
- with display name: "John Smith"
- and message subject: "A clearCase interview question.."

On 17th Jan 2003, you posted another message
- to the ClearCase International User Group mailing list
- with display name: "Bob Smith"
- and message subject: "Parsing a text file with perl.."

In either case, the "problem" you posted was very close or almost
identical to the one here - and had nothing to do with ClearCase.

You posted the latter one also to comp.lang.perl on 18th Jan 2003, now
with subject "Extracting a portion of a text file.... " but exactly the
same problem. And again on comp.lang.perl.misc, on 22nd Jan 2003. What
was wrong with the answers you received for the previous postings?

On 17th May 2004, you posted
- to this forum (comp.lang.perl.misc)
- with display name "John Smith"
- and the subject "Parsing a text file..... "

On 21st Feb, you posted another question with exactly the same
subject line to the same forum so that it actually shows up in the same
thread as the original question. This time, the content was different,
but the idea closely related anyway.

Oh, and there are plenty of other examples, and other forums too.
Just Google for them.

All of the threads you have initiated are similar enough to raise
suspicions. For one, you surely do not seem to be interested in learning
anything from the answers you get; they do keep repeating certain basic
things that you keep ignoring year after year.

I cannot believe you post these questions for the reason you claim.
Smells more like fishing to me.


Re: A Perl parsing question..

* clearguy02@yahoo.com wrote:

Read in this file first and save all data in a hash. The cities are the
hash keys referring to arrays containing the inhabitants. We could read
the file in paragraph mode to avoid regular expressions in this case. A
proper split() will remove all disturbing blanks implicitly. Nice ;-)

    #!/usr/bin/perl -w
    use strict;

    my %hash;
    open my $fh, '<', 'c:/groups.txt' or die $!;
    {
        local $/ = ""; # paragraph mode
        while ( <$fh> ) {
            # skip the leading list number, then city, then the names
            my( undef, $city, @names ) = split;
            $hash{ $city } = \@names;
        }
    }
    close $fh or die $!;
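
To see what paragraph mode and split() produce without touching a real file, here is a small sketch that reads from an in-memory string instead. The record layout (a list number, the city, then whitespace-separated names) and the names themselves are assumptions, because the name lists do not appear in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical file contents: the real name lists are not shown in the thread.
my $data = <<'END';
1. Pleasanton
ann bob

2. CA
ann bob carol
END

my %hash;
open my $fh, '<', \$data or die $!;   # read from the string as if it were a file
{
    local $/ = "";                    # paragraph mode: one blank-line-separated record per read
    while ( my $record = <$fh> ) {
        # Splitting on whitespace discards line breaks; undef skips the "1." token.
        my ( undef, $city, @names ) = split ' ', $record;
        $hash{$city} = \@names;
    }
}
close $fh or die $!;

print "$_: @{ $hash{$_} }\n" for sort keys %hash;
```

Keeping `local $/` inside its own block restores the normal line-by-line behaviour for any reads that follow.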

Now we need to compare the inner arrays. I suggest writing a sub which
tests whether one array contains another. The following sub expects two
arrayrefs and returns true if the first is a subset of the second.
It looks a little harder than necessary, but as a reward it handles
multisets too (if a name appears more than once in one city).

    sub isSubset {
        my( $u, $v ) = @_;
        my %v; $v{$_}++ for @$v;            # count each name in the second list
        --$v{$_} >= 0 or return for @$u;    # fail once a name runs out
        return 1;
    }

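
A quick sketch of how the counting approach behaves when a name occurs twice. The sub follows the same idea as the one in this post, with the counter hash renamed to %count for readability; the name lists are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns true if every element of @$u occurs in @$v at least as many times.
sub isSubset {
    my ( $u, $v ) = @_;
    my %count;
    $count{$_}++ for @$v;                    # tally the candidate superset
    --$count{$_} >= 0 or return for @$u;     # use up one occurrence per element
    return 1;
}

# Hypothetical name lists.
my $once_ok  = isSubset( [qw(ann bob)], [qw(ann bob carol)] ) ? 1 : 0;
my $twice_no = isSubset( [qw(ann ann)], [qw(ann bob carol)] ) ? 1 : 0;
my $twice_ok = isSubset( [qw(ann ann)], [qw(ann ann bob)]   ) ? 1 : 0;

print "$once_ok $twice_no $twice_ok\n";      # prints "1 0 1"
```

The second call fails because "ann" appears twice in the first list but only once in the second, which is the multiset case mentioned above.
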
With this we could compare each array to each other (since we could not
gain any information about subset-ness from your file we have to go this
brute-force way).

    for my $x ( keys %hash ) {
        for my $y ( keys %hash ) {
            print "$x is subset of $y\n"
                if $x ne $y and isSubset( @hash{ $x, $y } );
        }
    }


If I run this script with your sample data in the text file I get the
following output.

    Chicago is subset of IL
    Pleasanton is subset of CA
    Livermore is subset of CA

I think this is the information you want.
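
If the numbered "Parent of ... is ..." wording from the original question is wanted, the same loop could be adapted roughly as follows. This is only a sketch: the %hash contents are hypothetical stand-ins for the parsed file, and the keys are sorted so the numbering is stable between runs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data standing in for the parsed groups.txt.
my %hash = (
    Pleasanton => [qw(ann bob)],
    Livermore  => [qw(carol)],
    CA         => [qw(ann bob carol dave)],
    Chicago    => [qw(erin)],
    IL         => [qw(erin frank)],
);

# True if every element of @$u occurs in @$v at least as many times.
sub isSubset {
    my ( $u, $v ) = @_;
    my %count;
    $count{$_}++ for @$v;
    --$count{$_} >= 0 or return for @$u;
    return 1;
}

# Collect one line per (child, parent) pair, in alphabetical order.
my @lines;
for my $x ( sort keys %hash ) {
    for my $y ( sort keys %hash ) {
        next if $x eq $y;
        push @lines, "Parent of $x is $y." if isSubset( @hash{ $x, $y } );
    }
}

# Number the lines starting from 1.
my $n = 0;
print ++$n, ". $_\n" for @lines;
```

Two groups with identical member lists would each be reported as the parent of the other, so real data may need an extra rule to break such ties.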

