# (beginner question) downsample data points using chained maps

I have a vector of numeric points specified as 2 cols in a text file.
I want to break the points into windows and compute averages for each
window.

And (this is important), I am looking for a compact perlish way to do
it, ideally using the more "functional programming" parts of the
language, as opposed to the direct, nested for loops.

I am new to Perl, so I would appreciate feedback on the approach I
have taken below.

    #!/usr/bin/perl
    $w = shift; # this is the block size
    die "must specify window size" if !$w;
    print "@$_\n" for map {
        ($s0,$s1)=(0,0);
        for(@$_) {
            $s0 += @$_[0];
            $s1 += @$_[1]
        };
        [$s0/$w, $s1/$w]
    } map {
        [@x[$_*$w .. ($_+1)*$w-1]]
    } (0..$#x/$w);

    # In pseudocode, this is what it does:
    for(i = 0; i < n; i+= width) {
        (x,y) = (0,0);
        for(int j = 0; j < width; j++) {
            x += vec[i+j][0];
            y += vec[i+j][1];
        }
        push results, [x/width, y/width];
    }

Can anyone suggest an even more compact or idiomatically-correct way
to do this?

P.S.,
I am evaluating using Perl to do some of the data manipulation and
analysis that I currently perform in Mathematica.  In Mathematica,
lisp-ish mapping of anonymous functions is really common (and
convenient), so I am trying to decide if similar transformations can
be expressed compactly in perl.

## Re: (beginner question) downsample data points using chained maps

> I have a vector of numeric points specified as 2 cols in a text file.
> I want to break the points into windows and compute averages for each
> window.
>
> And (this is important), I am looking for a compact perlish way to do

Why is this *important*?  A prototype that does what you want, even
in a less than elegant way, would be worth more at this point.

> it, ideally using the more "functional programming" parts of the
> language, as opposed to the direct, nested for loops.
>
> I am new to Perl, so I would appreciate feedback on the approach I
> have taken below.

Start with something pedestrian and simple.  When it *works*, you can
transform it into tight code, if you want.

> #!/usr/bin/perl

No warnings, no strict.  Add them, and the variable declarations that
are then necessary.

> $w = shift; # this is the block size
> die "must specify window size" if !$w;

What is this sort for?  It sorts by the number of items in each line,
which is 2 every time by your own specification.

> print "@$_\n" for map {
>  ($s0,$s1)=(0,0);
>  for(@$_) {
>   $s0 += @$_[0];
>   $s1 += @$_[1]
>  };
>  [$s0/$w, $s1/$w]

If $w doesn't divide the number of lines, you may end up with a
final window of fewer than $w points.  It would be better to
divide by the actual number of points instead of $w.

> } map {
>  [@x[$_*$w .. ($_+1)*$w-1]]

This is fragile if $w doesn't divide the number of records.  splice()
would be a better tool.
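
To make the difference concrete, here is a minimal sketch (not code from the thread; the sub names `windows_by_slice` and `windows_by_splice` are made up for the illustration). It assumes a plain list of five values and a window size of 2. An array slice that runs past the end pads the last window with undef, while splice() returns only the elements that exist.

```perl
use strict;
use warnings;

# Cut the list into windows of $w items using index arithmetic and slices.
# A slice that reaches past the end of @x yields undef for the missing items.
sub windows_by_slice {
    my ($w, @x) = @_;
    return map { [ @x[ $_ * $w .. ($_ + 1) * $w - 1 ] ] } 0 .. $#x / $w;
}

# Same windows, but splice() removes at most $w items that actually exist,
# so a short final window stays short.
sub windows_by_splice {
    my ($w, @x) = @_;
    my @out;
    push @out, [ splice @x, 0, $w ] while @x;
    return @out;
}

my @slice  = windows_by_slice( 2, 1 .. 5 );
my @splice = windows_by_splice( 2, 1 .. 5 );
printf "last window: %d items via slice, %d via splice\n",
    scalar @{ $slice[-1] }, scalar @{ $splice[-1] };
```

With the slice version, the padded undef values count as zeroes in a sum (and trigger warnings), so the last average comes out too small unless the window size divides the number of records.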

> } (0..$#x/$w);

When I run this it prints out a series of pairs of zeroes for me
(the right number of pairs, but zero).  I'm not going to debug it.

> # In pseudocode, this is what it does:
> for(i = 0; i < n; i+= width) {
>  (x,y) = (0,0);
>  for(int j = 0; j < width; j++) {
>    x += vec[i+j][0];
>    y += vec[i+j][1];
>  }
> push results, [x/width, y/width];
> }

It would have been better to make this a working Perl solution first.

> Can anyone suggest an even more compact or idiomatically-correct way
> to do this?

Here is how I would do it:

    my @data = <DATA>;
    my @res;
    while ( @data ) {
        my @chunk = splice @data, 0, $w;
        my ( $x_mean, $y_mean);
        for ( @chunk ) {
            my ( $x, $y) = split;
            $x_mean += $x;
            $y_mean += $y;
        }
        push @res, [ $x_mean/@chunk, $y_mean/@chunk];
    }
    print "@$_\n" for @res;

Now, if you want, you can rework some of the loops as map()s:

    my @data = <DATA>;
    print "@$_\n" for map {
        my ( $x, $y);
        for ( @$_ ) {
            $x += $_->[ 0];
            $y += $_->[ 1];
        }
        [ $x/@$_, $y/@$_];
    } map [ map [ split], splice( @data, 0, $w)], 0 .. $#data/$w;

Anno
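
For readers after the more "functional" flavour the question asks about, the inner accumulation loop can also be written with `sum` from the core List::Util module. This is only a sketch under a few assumptions, not code from the posts above: the helper name `window_means` is hypothetical, the points are assumed to be already parsed into `[x, y]` pairs, and the window size is assumed to be a positive integer.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: average each column over windows of $w points.
# $points is a reference to an array of [x, y] pairs; it is not modified.
sub window_means {
    my ($w, $points) = @_;
    my @rest = @$points;    # work on a copy, because splice is destructive
    my @means;
    while ( my @chunk = splice @rest, 0, $w ) {
        # For column 0 and column 1: sum that column, divide by the
        # actual number of points in this window.
        push @means,
            [ map { my $col = $_; sum( map { $_->[$col] } @chunk ) / @chunk } 0, 1 ];
    }
    return @means;
}

print "@$_\n" for window_means( 2, [ [ 1, 10 ], [ 3, 20 ], [ 5, 30 ] ] );
```

Dividing by `@chunk` rather than by the window size keeps the last average correct when the final window is short.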

## Re: (beginner question) downsample data points using chained maps

anno4000@lublin.zrz.tu-berlin.de (Anno Siegel) wrote in message
> > I have a vector of numeric points specified as 2 cols in a text file.
> > I want to break the points into windows and compute averages for each
> > window.
> >
> > And (this is important), I am looking for a compact perlish way to do
>
> Why is this *important*?  A prototype that does what you want, even
> in a less than elegant way, would be worth more at this point.
>
Because this is just a toy problem I chose to understand the power of
the language for working with this kind of data (lists of
n-dimensional numbers).  I am especially interested in Perl's support
for anonymous functions because I have found mapping of anonymous
functions to be a really convenient way to interact with this kind of
data in other languages.

Ultimately, what I am interested in is the type of transformations
that I can compactly express on the command line (using perl -e)
instead of from within a specific math environment.

I hate the context shifts that come from
1.  c++ program produces data
2.  start math software, load data, transform,plot,transform,plot,...
3.  modify c++ program and goto 1

I am hoping to move to something more like:
c++ program | perl -e 'some transformation' | plot
for at least some fraction of my work.
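
As a rough sketch of what the middle stage of such a pipeline could look like, assuming two whitespace-separated columns on standard input and the window size as the first command-line argument (the sub name `downsample` and the script name used below are hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical filter: turn lines of "x y" into one "mean_x mean_y" string
# per window of $w input lines. The last window may be shorter.
sub downsample {
    my ($w, @lines) = @_;
    my @out;
    while ( my @chunk = splice @lines, 0, $w ) {
        my ($sx, $sy) = (0, 0);
        for (@chunk) {
            my ($x, $y) = split;    # split $_ on whitespace
            $sx += $x;
            $sy += $y;
        }
        push @out, join ' ', $sx / @chunk, $sy / @chunk;
    }
    return @out;
}

# Run as a filter only when a window size is given on the command line.
if (@ARGV) {
    my $w = shift;
    die "window size must be a positive integer\n" unless $w =~ /^[1-9]\d*$/;
    print "$_\n" for downsample( $w, <STDIN> );
}
```

Saved as, say, `downsample.pl`, it would sit in the pipeline as `perl downsample.pl 10` between the program that produces the data and the plotting tool. It reads all input before printing, which is fine for files that fit in memory.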

> > it, ideally using the more "functional programming" parts of the
> > language, as opposed to the direct, nested for loops.
> >
> > I am new to Perl, so I would appreciate feedback on the approach I
> > have taken below.
>
> Start with something pedestrian and simple.  When it *works*, you can
> transform it into tight code, if you want.
>
> > #!/usr/bin/perl
>
> No warnings, no strict.  Add them, and the variable declarations that
> are then necessary.
>
> > $w = shift; # this is the block size
> > die "must specify window size" if !$w;
>
> What is this sort for?  It sorts by the number of items in each line,
> which is 2 every time by your own specification.
>
> > print "@$_\n" for map {
> >  ($s0,$s1)=(0,0);
> >  for(@$_) {
> >   $s0 += @$_[0];
> >   $s1 += @$_[1]
> >  };
> >  [$s0/$w, $s1/$w]
>
> If $w doesn't divide the number of lines, you may end up with a
> final window of fewer than $w points.  It would be better to
> divide by the actual number of points instead of $w.
>
> > } map {
> >  [@x[$_*$w .. ($_+1)*$w-1]]
>
> This is fragile if $w doesn't divide the number of records.  splice()
> would be a better tool.
>
> > } (0..$#x/$w);
>
> When I run this it prints out a series of pairs of zeroes for me
> (the right number of pairs, but zero).  I'm not going to debug it.
>
> > # In pseudocode, this is what it does:
> > for(i = 0; i < n; i+= width) {
> >  (x,y) = (0,0);
> >  for(int j = 0; j < width; j++) {
> >    x += vec[i+j][0];
> >    y += vec[i+j][1];
> >  }
> > push results, [x/width, y/width];
> > }
>
> It would have been better to make this a working Perl solution first.
>
> > Can anyone suggest an even more compact or idiomatically-correct way
> > to do this?
>
> Here is how I would do it:
>
>     my @data = <DATA>;
>     my @res;
>     while ( @data ) {
>         my @chunk = splice @data, 0, $w;
>         my ( $x_mean, $y_mean);
>         for ( @chunk ) {
>             my ( $x, $y) = split;
>             $x_mean += $x;
>             $y_mean += $y;
>         }
>         push @res, [ $x_mean/@chunk, $y_mean/@chunk];
>     }
>     print "@$_\n" for @res;
>
> Now, if you want, you can rework some of the loops as map()s:
>
>     my @data = <DATA>;
>     print "@$_\n" for map {
>         my ( $x, $y);
>         for ( @$_ ) {
>             $x += $_->[ 0];
>             $y += $_->[ 1];
>         }
>         [ $x/@$_, $y/@$_];
>     } map [ map [ split], splice( @data, 0, $w)], 0 .. $#data/$w;
>
> Anno

Thanks, this is great.

-- Oscar