- Oscar Stiffelman
December 3, 2004, 1:30 am
I have a vector of numeric points specified as 2 cols in a text file.
I want to break the points into windows and compute averages for each
window.
And (this is important), I am looking for a compact perlish way to do
it, ideally using the more "functional programming" parts of the
language, as opposed to the direct, nested for loops.
I am new to Perl, so I would appreciate feedback on the approach I
have taken below.
#!/usr/bin/perl
$w = shift; # this is the block size
die "must specify window size" if !$w;
print "@$_\n" for map {
    ($s0,$s1)=(0,0);
    for(@$_) {
        $s0 += @$_[0];
        $s1 += @$_[1]
    };
    [$s0/$w, $s1/$w]
} map {
    [@x[$_*$w .. ($_+1)*$w-1]]
} (0..$#x/$w);
# In pseudocode, this is what it does:
for(i = 0; i < n; i+= width) {
    (x,y) = (0,0);
    for(int j = 0; j < width; j++) {
        x += vec[i+j][0];
        y += vec[i+j][1];
    }
    push results, [x/width, y/width];
}
Can anyone suggest an even more compact or idiomatically-correct way
to do this?
P.S.,
I am evaluating using Perl to do some of the data manipulation and
analysis that I currently perform in Mathematica. In Mathematica,
lisp-ish mapping of anonymous functions is really common (and
convenient), so I am trying to decide if similar transformations can
be expressed compactly in perl.
Re: (beginner question) downsample data points using chained maps
> I want to break the points into windows and compute averages for each
> window.
>
> And (this is important), I am looking for a compact perlish way to do
Why is this *important*? A prototype that does what you want, even
in a less than elegant way, would be worth more at this point.
> it, ideally using the more "functional programming" parts of the
> language, as opposed to the direct, nested for loops.
>
> I am new to Perl, so I would appreciate feedback on the approach I
> have taken below.
You're going about this the wrong way. Start out with something
pedestrian and simple. When it *works*, you can transform it into
tight code, if you want.
> #!/usr/bin/perl
No warnings, no strict. Add them, and the variable declarations that
are then necessary.
> $w = shift; # this is the block size
> die "must specify window size" if !$w;
What is this sort for? It sorts by the number of items in each line,
which is 2 every time by your own specification.
> print "@$_\n" for map {
> ($s0,$s1)=(0,0);
> for(@$_) {
> $s0 += @$_[0];
> $s1 += @$_[1]
> };
> [$s0/$w, $s1/$w]
If $w doesn't divide the number of lines, you may end up with a
final window of fewer than $w points. It would be better to
divide by the actual number of points instead of $w.
> } map {
> [@x[$_*$w .. ($_+1)*$w-1]]
This is fragile if $w doesn't divide the number of records. splice()
would be a better tool.
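To illustrate the fragility, here is a minimal sketch with made-up data (not the original poster's file). An array slice that runs past the end pads the result with undef, while splice() returns only the elements that exist:

```perl
use strict;
use warnings;

my @records = (1 .. 10);   # ten made-up records
my $w       = 4;           # window size that does not divide 10

# Slicing the last window reaches past the end and pads with undef.
my @by_slice = @records[8 .. 11];

# splice() returns only the elements that exist (and removes them).
my @rest = @records;
splice @rest, 0, 8;                   # drop the first two full windows
my @by_splice = splice @rest, 0, $w;  # final, short window

my $undefined = grep { !defined $_ } @by_slice;
printf "slice:  %d elements, %d undefined\n", scalar @by_slice, $undefined;
printf "splice: %d elements\n", scalar @by_splice;
# prints:
#   slice:  4 elements, 2 undefined
#   splice: 2 elements
```

Averaging over the slice would treat the two undef entries as zeroes, which is why dividing by a fixed $w gives a wrong mean for the last window.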
> } (0..$#x/$w);
When I run this it prints out a series of pairs of zeroes for me
(the right number of pairs, but zero). I'm not going to debug it.
> # In pseudocode, this is what it does:
> for(i = 0; i < n; i+= width) {
> (x,y) = (0,0);
> for(int j = 0; j < width; j++) {
> x += vec[i+j][0];
> y += vec[i+j][1];
> }
> push results, [x/width, y/width];
> }
It would have been better to make this a working Perl solution first.
> Can anyone suggest an even more compact or idiomatically-correct way
> to do this?
Here is how I would do it:
my @data = <DATA>;    # one "x y" line per element
my @res;
while ( @data ) {
    my @chunk = splice @data, 0, $w;    # next window; may be short at the end
    my ( $x_mean, $y_mean);             # running sums until the division below
    for ( @chunk ) {
        my ( $x, $y) = split;
        $x_mean += $x;
        $y_mean += $y;
    }
    push @res, [ $x_mean/@chunk, $y_mean/@chunk];
}
print "@$_\n" for @res;
Now, if you want, you can rework some of the loops as map()s:
my @data = <DATA>;
print "@$_\n" for map {
    my ( $x, $y);
    for ( @$_ ) {
        $x += $_->[ 0];
        $y += $_->[ 1];
    }
    [ $x/@$_, $y/@$_];
} map [ map [ split], splice( @data, 0, $w)], 0 .. $#data/$w;
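The same chunking idea extends to points with any number of columns. What follows is only a sketch under stated assumptions: window_means is a hypothetical helper name, the sample points are made up, and sum comes from the core List::Util module.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# window_means($w, @points): hypothetical helper, not from the thread.
# Each point is an array ref of numbers. Returns one array ref of
# column means per window of up to $w points.
sub window_means {
    my ($w, @points) = @_;
    die "window size must be a positive integer\n"
        unless defined $w && $w =~ /^[1-9]\d*$/;

    my @means;
    # The list assignment is false once splice() returns nothing.
    while (my @chunk = splice @points, 0, $w) {
        my $last_col = $#{ $chunk[0] };   # assumes all points have equal width
        push @means, [
            map {
                my $col = $_;
                sum(map { $_->[$col] } @chunk) / @chunk;
            } 0 .. $last_col
        ];
    }
    return @means;
}

# Five made-up 2-D points, window size 2: the last window has one point.
my @sample = ([1, 10], [3, 30], [5, 50], [7, 70], [9, 90]);
print "@$_\n" for window_means(2, @sample);
# prints:
#   2 20
#   6 60
#   9 90
```

Dividing by @chunk rather than $w keeps the mean of a short final window correct.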
Anno
Re: (beginner question) downsample data points using chained maps
> > I have a vector of numeric points specified as 2 cols in a text file.
> > I want to break the points into windows and compute averages for each
> > window.
> >
> > And (this is important), I am looking for a compact perlish way to do
>
> Why is this *important*? A prototype that does what you want, even
> in a less than elegant way, would be worth more at this point.
>
Because this is just a toy problem I chose to understand the power of
the language for working with this kind of data (lists of
n-dimensional numbers). I am especially interested in Perl's support
for anonymous functions because I have found mapping of anonymous
functions to be a really convenient way to interact with this kind of
data in other languages.
Ultimately, what I am interested in is the type of transformations
that I can compactly express on the command line (using perl -e)
instead of from within a specific math environment.
I hate the context shifts that come from
1. c++ program produces data
2. start math software, load data, transform, plot, transform, plot, ...
3. modify c++ program and goto 1
I am hoping to move to something more like:
c++ program | perl -e 'some transformation' | plot
for at least some fraction of my work.
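For illustration, such a pipeline stage could be sketched as a small streaming filter. This is only a sketch under assumptions: downsample is a hypothetical name, the script file name in the comment is made up, and the input is assumed to be whitespace-separated numeric columns, one point per line.

```perl
use strict;
use warnings;

# downsample($fh, $w): hypothetical filter stage, not from the thread.
# Reads whitespace-separated numeric columns from $fh and returns one
# line of column means per window of $w input lines.
sub downsample {
    my ($fh, $w) = @_;
    my (@sum, @out);
    my $n = 0;
    my $flush = sub {
        push @out, join ' ', map { $_ / $n } @sum;
        @sum = ();
        $n   = 0;
    };
    while (my $line = <$fh>) {
        my @cols = split ' ', $line;
        next unless @cols;               # skip blank lines
        $sum[$_] += $cols[$_] for 0 .. $#cols;
        $flush->() if ++$n == $w;
    }
    $flush->() if $n;                    # short final window
    return @out;
}

# As a pipeline stage this might be invoked as (file name is made up):
#   ./producer | perl downsample.pl 3 | plot
if (@ARGV) {
    my $w = shift @ARGV;
    print "$_\n" for downsample(\*STDIN, $w);
}
```

Because it keeps only running sums, the filter never holds more than one window's totals in memory, which suits long streams from the producing program.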
Thanks, this is great.
-- Oscar
