Do you have a question? Post it now! No Registration Necessary. Now with pictures!
- Subject
- Posted on
- perl graph theory (using Graph::Directed)
- 07-26-2004
posted on
July 26, 2004, 5:14 am
July 26, 2004, 5:14 am
I'm fairly new to perl and graph theory, so hopefully this question isn't
too ignorant on both counts. I'm trying to input basically an edge list for
a graph, and get an output of the connected graphs that all of these edges
comprise. So for example:
Edge list input:
A B
A C
B A
D E
Perl script output:
A+B+C,D+E
using strongly_connected_graph (described in Graph::Base). Admittedly,
strongly_connected_graph is a black box to me -- I'm not sure if its doing a
DFS versus BFS or ?.
But what is clear is that this scales exponentially -- O(n^2.3) or so, based
on a few test sets) -- in time versus the total number of input edges. This
is fine for a few smaller edge lists that I have, but my eventual goal is to
do this for a set with ~50,000 vertices and ~2 million edges total. Based
on those test sets I mentioned, this size dataset will take ~7000 hours to
run on my current system, which obviously isn't reasonable -- and that's not
even accounting for things getting choked up in inputing that amount of data
with add_edge!
So my question is, am I approaching this problem with the right tools? Is
there another way within the Graph module to do this more efficiently?
Thanks,
I.R.
Re: perl graph theory (using Graph::Directed)
: I'm fairly new to perl and graph theory, so hopefully this question
: isn't too ignorant on both counts. I'm trying to input basically an
: edge list for a graph, and get an output of the connected graphs that
: all of these edges comprise. [...]
Do you want connected subgraphs or strongly connected? You say
connected here, but you're searching for strongly connected subgraphs
with Graph::Base::strongly_connected_graph.
You say you're new to graph theory, so I'll give quick overviews in
case you don't know the difference. A connected graph (perhaps called
weakly connected for emphasis) is a graph such that for each vertex v,
v is reachable from some other vertex w. A strongly connected graph is
one such that for each pair of vertices v and w, there's a path from v
to w and a path from w to v.
Below is a rendering in Perl of the algorithm from Baase that
enumerates weakly connected subgraphs. The author reports time
complexity of Theta(max(n,m)) with space Theta(n+m), where n is the
number of vertices and m is the number of edges.
#! /usr/local/bin/perl
use warnings;
use strict;
# from Baase's *Computer Algorithms*, 2nd ed., pg. 180
sub dfs {
my $v = shift;
my $adj = shift;
my $mark = shift;
my $tag = shift;
$mark-> = $tag;
while ($adj-> && @{ $adj-> }) {
my $w = shift @{ $adj-> };
dfs($w,$adj,$mark,$tag) unless $mark->;
}
}
sub concomps {
# destructive, so copy
my %adj = map ref($_) ? [ @$_ ] : $_, %{ shift @_ };
my @v = @_;
my %mark;
my $componentNumber = 0;
foreach my $v (@v) {
unless ($mark) {
++$componentNumber;
dfs $v, \%adj, \%mark, $componentNumber;
}
}
my %subgraph;
for (keys %mark) {
push @{ $subgraph} }, $_;
}
values %subgraph;
}
## main
my %adj;
my %v;
while (<DATA>) {
my($v,$w) = split;
# treat graph as undirected
push @{ $adj } => $w;
push @{ $adj } => $v;
++$v;
++$v;
}
print join("," => map join("+", sort @$_), concomps \%adj, keys %v), "\n";
__DATA__
A B
A C
B A
D E
The author used arrays, and the code above uses hashes to be more
Perlish but with time and space penalties. If the size of your input
makes you consider switching back to arrays, you might also consider
switching to a lower-level language.
Hope this helps,
Greg
--
Those who wish to preserve freedom should recognize, however, that inflation
is probably the most important single factor in that vicious circle wherein
one kind of government action makes more and more government control necessary.
-- F. A. Hayek
Re: perl graph theory (using Graph::Directed)
You're exactly right, and I can see how strongly_connected_graph could
really eat up a substantial amount of time. That script you gave works
seamlessly and gave exactly what I was after with about 2 lines of
modification -- and it did so around 100x faster than my previous attempt!
Also picked up a copy of the book at the local library (1st edition, but it
still has an informative section on graph algorithms).
Hugely obliged,
I.R.
>
> : I'm fairly new to perl and graph theory, so hopefully this question
> : isn't too ignorant on both counts. I'm trying to input basically an
> : edge list for a graph, and get an output of the connected graphs that
> : all of these edges comprise. [...]
>
> Do you want connected subgraphs or strongly connected? You say
> connected here, but you're searching for strongly connected subgraphs
> with Graph::Base::strongly_connected_graph.
>
> You say you're new to graph theory, so I'll give quick overviews in
> case you don't know the difference. A connected graph (perhaps called
> weakly connected for emphasis) is a graph such that for each vertex v,
> v is reachable from some other vertex w. A strongly connected graph is
> one such that for each pair of vertices v and w, there's a path from v
> to w and a path from w to v.
>
> Below is a rendering in Perl of the algorithm from Baase that
> enumerates weakly connected subgraphs. The author reports time
> complexity of Theta(max(n,m)) with space Theta(n+m), where n is the
> number of vertices and m is the number of edges.
>
> #! /usr/local/bin/perl
>
> use warnings;
> use strict;
>
> # from Baase's *Computer Algorithms*, 2nd ed., pg. 180
>
> sub dfs {
> my $v = shift;
> my $adj = shift;
> my $mark = shift;
> my $tag = shift;
>
> $mark-> = $tag;
>
> while ($adj-> && @{ $adj-> }) {
> my $w = shift @{ $adj-> };
>
> dfs($w,$adj,$mark,$tag) unless $mark->;
> }
> }
>
> sub concomps {
> # destructive, so copy
> my %adj = map ref($_) ? [ @$_ ] : $_, %{ shift @_ };
> my @v = @_;
>
> my %mark;
>
> my $componentNumber = 0;
>
> foreach my $v (@v) {
> unless ($mark) {
> ++$componentNumber;
>
> dfs $v, \%adj, \%mark, $componentNumber;
> }
> }
>
> my %subgraph;
> for (keys %mark) {
> push @{ $subgraph} }, $_;
> }
>
> values %subgraph;
> }
>
> ## main
> my %adj;
> my %v;
>
> while (<DATA>) {
> my($v,$w) = split;
>
> # treat graph as undirected
> push @{ $adj } => $w;
> push @{ $adj } => $v;
>
> ++$v;
> ++$v;
> }
>
> print join("," => map join("+", sort @$_), concomps \%adj, keys %v),
"\n";
>
> __DATA__
> A B
> A C
> B A
> D E
>
> The author used arrays, and the code above uses hashes to be more
> Perlish but with time and space penalties. If the size of your input
> makes you consider switching back to arrays, you might also consider
> switching to a lower-level language.
>
> Hope this helps,
> Greg
> --
> Those who wish to preserve freedom should recognize, however, that
inflation
> is probably the most important single factor in that vicious circle
wherein
> one kind of government action makes more and more government control
necessary.
> -- F. A. Hayek
Site Timeline
- » FAQ 1.3 Which version of Perl should I use?
- — Next thread in » PERL Discussions
- » clearing of all variables
- — Previous thread in » PERL Discussions
- » s suffix question
- — Newest thread in » PERL Discussions
- » Dell Battery Slice LED codes
- — The site's Newest Thread. Posted in » Laptop Computers Forum