# Help with understanding references to subroutines


Hi folks,

I'm doing a short online tutorial on perl to try and get my skill level up
from novice to whatever the next level might be!!

So far it's all been relatively straightforward, except that I've got to
a lesson on references to subroutines.

I can cope, after a fashion, with references to arrays, hashes, and even
scalars, but I can't bend my mind around references to subroutines.

This is the entire tutorial page:

========================== >8 ==========================

Reference to a subroutine

Up to now we've devoted our efforts to understanding and making use of
references to scalar, array and hash variables. In this section we show
that you can have references to subroutines as well. With this, we can
pass subroutines around as variables and arguments to other subroutines.
The syntax is slightly different, but hopefully the next exercise will
make it all clear.

Just going through the motions:

1. Two ways of creating a reference to a scalar:

a) Take a reference to a named scalar

my $scalar = 'Foo';
my $rs_foo = \$scalar;

b) Directly create a reference to an anonymous scalar

my $rs_foo = \'Foo';

2. Two ways of creating a reference to an array:

a) Take a reference to a named array

my @arr = (1,2,3);
my $r_arr = \@arr;

b) Directly create a reference to an anonymous array

my $r_arr = [1,2,3];

3. Two ways of creating a reference to a hash:

a) Take a reference to a named hash

my %hash = ( foo => 1, bar => 2 );
my $r_hash = \%hash;

b) Directly create a reference to an anonymous hash

my $r_hash = { foo => 1, bar => 2 };

Finally,

4. Two ways to create a reference to a subroutine:

a) Take a reference to a named subroutine

sub foo { return 'bar'; }
my $r_sub = \&foo;

b) Directly create a reference to an anonymous subroutine

my $r_sub = sub { return 'bar'; };

In either case to call it, just type

&$r_sub();

Notes:

For a more in-depth discussion, see IP Ch7
If you progress to using Perl for web development, you will use
subroutine references extensively for implementing the website controller
using frameworks such as Catalyst or Dancer.

Exercise

Save the code below as call_subroutine_reference and, where it says ###
Insert code here ###, get the subroutine reference from the arguments, run
it and store the returned value in $retval

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dump 'pp';
use feature 'say';

my $r_foo = sub { return "foo"; };
my $r_bar = sub { return {this => 1, that => 2}; };

sub call_sub {

### Insert code here ###
# $retval is what the subroutine returns

return $retval;
}

say 'r_foo returns '.pp(call_sub($r_foo));
say 'r_bar returns '.pp(call_sub($r_bar));

The output should be:

r_foo returns "foo"
r_bar returns { that => 2, this => 1 }

========================== 8< ==========================

From this I created the subroutine code as:

sub call_sub {
    my $data = shift;
    my $retval;
    if ($data == $r_foo)
    {
        $retval = &$r_foo;
    }
    else
    {
        $retval = &$r_bar;
    }

    return $retval;
}

When I run it, I get exactly what I want to get, but it seems to me to be
an excessively awkward way to decide which of two subroutines you want to
call.

What is the purpose of the subroutine reference?
Why does call_sub even need to exist?
I cannot see the purpose of passing a reference to a subroutine around,
why could the code not simply say:

say 'r_foo returns '. &$r_foo;
say 'r_bar returns '. &$r_bar;

After all, if you are calling call_sub($r_foo) then you must /know/ you
are calling r_foo.

Even, why have them as references to anonymous subroutines in the first
place?

I'm obviously missing something fairly fundamental here, so any help to
relieve the pressure on my poor suffering brain would be most appreciated.

Any idea what IP chapter 7 might refer to? I used to work for the UKIPO,
so to me IP (without 'address') means Intellectual Property.

I was going to ask why Catalyst or Dancer would /require/ the use of
references to subroutines, but I think my brain already hurts enough as it
is.

Dave

--
Dave Stratford - ZFCB
http://daves.orpheusweb.co.uk/

## Re: Help with understanding references to subroutines

[...]

Calling a subroutine via & has the two 'special effects' of bypassing
any prototype checking and reusing the current @_ for its
arguments. This is mostly useful for 'call-forwarding'. Assuming that
$r_sub contains a sub reference, it can also be invoked as

$r_sub->(arg0, arg1, ...)

[...]

[...]

Contrived example:

-------------
my %ops = (
    '+' => sub { $_[0] + $_[1] },
    '-' => sub { $_[0] - $_[1] },
    '*' => sub { $_[0] * $_[1] },
    '/' => sub { $_[0] / $_[1] },
    '^' => sub { $_[0] ** $_[1] });

sub calc
{
    my ($op, $a, $b) = @_;
    printf("The result is %f\n", $op->($a, $b));
}

my $in;

while (1) {
    print("Enter a term: ");

    $in = <STDIN>;
    $in // last;    # stop at end of input

    $in =~ /^\s*(\d+)\s*([-+^*\/])\s*(\d+)\s*$/ or do {
        print STDERR ("\tWTF??\n\n");
        next;
    };

    # look up the sub for the operator captured in $2
    calc($ops{$2}, $1, $3);
}

print("\n");
-------------

This reads terms of the form

<integer> <operator> <integer>

(operators are +, -, *, / and ^) from stdin, computes the result and
prints it.

## Re: Help with understanding references to subroutines

[...]

Additional remark: Calling a function like this is usually not a good
idea because the next successful regex-match will change the values of
$1, $2 and $3, thus possibly changing the arguments to the function as
an unintended side effect (in case the called subroutine accesses its
arguments via @_ instead of copying them into my-variables).
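
A minimal sketch of the aliasing problem described in this remark. The sub names `fragile` and `robust` are made up for illustration; the point is only that `$_[0]` can be an alias to the caller's `$1`, so a later match inside the sub changes what it reads.

```perl
use strict;
use warnings;

# Hypothetical helper that reads its argument through @_ *after*
# running a regex match of its own.
sub fragile {
    'xyz' =~ /(y)/;      # a successful match resets $1 ...
    return $_[0];        # ... and $_[0] may be an alias to the caller's $1
}

# Same body, but the argument is copied into a lexical first.
sub robust {
    my ($arg) = @_;      # copy taken before any further matching
    'xyz' =~ /(y)/;
    return $arg;
}

'abc' =~ /(b)/ or die 'no match';
my $via_alias = fragile($1);      # passes $1 itself (aliased)
'abc' =~ /(b)/ or die 'no match';
my $via_copy  = fragile("$1");    # passes a copy of the captured text
my $copied_in = robust($1);

print "alias: $via_alias, copy: $via_copy, robust: $copied_in\n";
```

Either fix works: copy the captures at the call site (`"$1"`), or copy `@_` into my-variables at the top of the sub.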

## Re: Help with understanding references to subroutines

Yes, that's awkward, and missing the point.

Let's simplify it a bit:

Here $data == $r_foo, so we can replace this line with

$retval = &$data;

And here $data == $r_bar, so we can replace this line with

$retval = &$data;

So, now both branches are identical and we can remove the if:

sub call_sub {
    my $data = shift;
    my $retval;
    $retval = &$data;
    return $retval;
}

Looks a lot simpler, doesn't it? And also more general: call_sub() can
now call ANY subroutine passed to it, not just $r_foo and $r_bar.
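
To make the "ANY subroutine" point concrete, here is one possible shape for the exercise, using the arrow call form. `$r_add` and the argument forwarding are additions for illustration and are not part of the tutorial.

```perl
use strict;
use warnings;

my $r_foo = sub { return "foo"; };
my $r_bar = sub { return { this => 1, that => 2 }; };

# Call whatever code reference is passed in, forwarding any extra arguments.
sub call_sub {
    my ($code, @args) = @_;
    return $code->(@args);
}

# A third sub that call_sub() has never "heard of" works just as well.
my $r_add = sub { return $_[0] + $_[1]; };

print call_sub($r_foo), "\n";
print call_sub($r_bar)->{that}, "\n";
print call_sub($r_add, 2, 3), "\n";
```

Nothing inside call_sub() names `$r_foo`, `$r_bar` or `$r_add`; the decision about which sub to run is made entirely by the caller.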

I'm not sure if I understand the question. The purpose of subroutine
references is to refer to subroutines, so that they can be stored in
variables, passed as arguments to other functions, etc.

To show you how to pass subroutine references around and how to call
them. It's example code. It doesn't do anything useful.

You could do that. You could also use named subs and call those. But
then there wouldn't be any references involved and it wouldn't be much
use as an example.

Yes, but only where you call call_sub(). The function call_sub() itself
doesn't know anything about $r_foo. It just knows that it gets a
subroutine reference which it is supposed to call.

You are probably missing it because the example is so useless. In a real
program call_sub() would do something interesting, not just call a sub
and return the result.

For example, in the File::Find module, the find function recursively
searches a directory tree and calls a function you supply for every
directory entry it finds. Or the builtin sort calls the comparison function
for each pair of elements it wants to compare (it's just a block, not a sub,
but that's just syntactic sugar).

Being able to use a function reference in these cases is very useful:
You don't have to rewrite directory walking code just because you want
to do something different to the files it finds - just pass in a
different "wanted" function. You don't have to rewrite sort just because
you want to sort by different criteria - just pass in a different
comparison function.
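
A small sketch of the "pass in a different comparison function" idea. `sort_with` is a hypothetical helper, not a core function; it only wraps the builtin sort so the comparison can be handed over as an ordinary code reference.

```perl
use strict;
use warnings;

# Hypothetical helper: sort a list using a caller-supplied comparison sub.
# The comparison sub receives the two items as ordinary arguments.
sub sort_with {
    my ($cmp, @items) = @_;
    return sort { $cmp->($a, $b) } @items;
}

my @words = qw(pear fig banana kiwi);

my @by_alpha  = sort_with(sub { $_[0] cmp $_[1] }, @words);
my @by_length = sort_with(
    sub { length($_[0]) <=> length($_[1]) or $_[0] cmp $_[1] },
    @words,
);

print "@by_alpha\n";   # banana fig kiwi pear
print "@by_length\n";  # fig kiwi pear banana
```

The sorting code is written once; only the small comparison sub changes between the two calls.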

Probably "Intermediate Perl": http://shop.oreilly.com/product/9780596102067.do I guess.

Because they need to call the functions you provide. It's very similar
to what File::Find or sort do: They provide a framework, but you need to
fill in the details.
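
As a rough illustration of "framework plus details you fill in", here is a toy router. It is not the API of Catalyst or Dancer; `register_route` and `handle_request` are invented names that only mimic the general idea of registering handlers as code references.

```perl
use strict;
use warnings;

# Toy "framework": NOT Catalyst's or Dancer's real API, just the same idea.
my %routes;

# The application registers a handler (a code ref) for a path.
sub register_route {
    my ($path, $handler) = @_;
    $routes{$path} = $handler;
}

# The framework later decides which handler to call for a request.
sub handle_request {
    my ($path, %params) = @_;
    my $handler = $routes{$path}
        or return "404 Not Found: $path";
    return $handler->(%params);
}

register_route('/hello' => sub { my %p = @_; return "Hello, $p{name}!" });
register_route('/time'  => sub { return 'It is ' . localtime() });

print handle_request('/hello', name => 'Dave'), "\n";
print handle_request('/nope'), "\n";
```

The framework side never needs to be edited when a new page is added; the application just registers another sub.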

hp

--
_  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) |                    | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel

## Re: Help with understanding references to subroutines

I know this is taken from the tutorial mentioned, but we (TINW) should
really be encouraging people to call subrefs like this

$retval = $data->();

As Rainer pointed out, the &-form for calling subs can have
side-effects; in the particular case of calling through a reference,
prototypes don't apply, but if you write the call as

$retval = &$data;

rather than

$retval = &$data();

$data will inherit the current @_, which is obviously a Bad Thing. This
mistake is impossible to make with the $data->() form of call (and it
matches nicely with the $ref->{} forms of array/hash dereference).
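
The difference between the three call forms can be checked with a short script. `$count_args` and `caller_sub` are names made up for this sketch; the callee simply reports how many arguments it received.

```perl
use strict;
use warnings;

my $count_args = sub { return scalar @_; };

sub caller_sub {
    my %seen;
    $seen{amp_bare}   = &$count_args;      # no parens: callee sees the caller's @_
    $seen{amp_parens} = &$count_args();    # explicit empty argument list
    $seen{arrow}      = $count_args->();   # explicit empty argument list
    return \%seen;
}

my $seen = caller_sub('a', 'b', 'c');
printf "%-10s saw %d argument(s)\n", $_, $seen->{$_} for sort keys %$seen;
```

Only the bare `&$count_args;` form sees the three arguments that were passed to caller_sub(); the other two see none.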

[...]

The concept of higher-order programming (that subs can be treated as
data values) is not a trivial one. It's not entirely surprising the OP
is having trouble with it. The most convenient example for explaining
the point of subrefs is using them for callbacks, like the File::Find
example you give below.

[It's not, actually; sort blocks are rather more complicated than a
simple sub ref.]

Ben

## Re: Help with understanding references to subroutines

[...]

I beg to differ here: Accidentally writing code with this property is
very likely a 'Bad Thing', however, the feature itself is quite useful
for building subroutine processing pipelines where each individual sub
performs 'some subtask of something' which possibly involves modifying
@_ and then 'passes the buck' to another subroutine with an &-call.
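
A deliberately tiny sketch of such a pipeline, assuming two made-up stages (`add_timestamp` and `format_record`). The first stage changes @_ and then forwards it with an &-call without parentheses.

```perl
use strict;
use warnings;

# Hypothetical two-stage pipeline: each stage adjusts @_ and hands it on.
sub add_timestamp {
    unshift @_, 'ts=42';      # fixed value so the output is predictable
    return &format_record;    # &-call without parens: forwards the current @_
}

sub format_record {
    return join ' ', @_;
}

print add_timestamp('user=dave', 'action=login'), "\n";
```

Here the forwarding is intentional, which is the case being defended; the same syntax written by accident is the bug described earlier in the thread.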

## Re: Help with understanding references to subroutines

[...]

In that contrived example, yes.

You are right, for simple examples references to functions are rarely
all that useful. However they begin to make a lot of sense as soon as
you are starting to talk about higher order functions, i.e. functions
that take functions as arguments. They are extremely powerful although
many developers never understand them. Some examples (free-style
notation):

filter (aka grep): takes a list of items and a custom function which
returns true or false for each item of the list; returns the list of
items for which the custom function yields true.

map: takes a list of items and a custom function; this function is
applied to each item in the list

reduce: takes a list of items L and custom function OP; then this
operation is applied between all items of the list until only a single
element is left
L[0] OP L[1] OP L[2] OP L[3] ..... L[n]
If OP happens be + then you get the sum of all elements, if it is * then
you get the product of all elements, if it is a custom function
returning the larger of both arguments then you get the maximum value of
the list, etc, etc.

sort: takes a list of items and a custom function which for any two
items decides if the left item or the right item is larger or if both
are equal.

The beauty of these HOFs is that they don't care about the type of item.
They are generic and will work on lists of numbers, lists of strings,
lists of personnel records, lists of whatever, because the handling of
the actual item is left to the custom function which is user-supplied as
an argument. And therefore I don't need to implement 5 dozen sort
functions only because I need to sort lists of 5 dozen different item
types.
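
In Perl these higher-order functions look roughly as follows. grep, map and sort are builtins; reduce comes from the core module List::Util. The sample data is invented.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

my @numbers = (3, 8, 1, 6);

my @even    = grep { $_ % 2 == 0 } @numbers;            # filter
my @doubled = map  { $_ * 2 } @numbers;                 # map
my $sum     = reduce { $a + $b } @numbers;              # reduce with +
my $max     = reduce { $a > $b ? $a : $b } @numbers;    # reduce with "larger of two"
my @sorted  = sort { $a <=> $b } @numbers;              # sort with a comparison

# The same functions work unchanged on a different item type (hash refs here).
my @people = ({ name => 'Ann', age => 41 }, { name => 'Bob', age => 29 });
my $oldest = reduce { $a->{age} > $b->{age} ? $a : $b } @people;
my @by_age = sort { $a->{age} <=> $b->{age} } @people;

print "even: @even, doubled: @doubled, sum: $sum, max: $max, sorted: @sorted\n";
print "oldest: $oldest->{name}, youngest: $by_age[0]{name}\n";
```

Switching from numbers to records needed no change to grep, map, reduce or sort, only to the small blocks passed to them.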

jue

## Re: Help with understanding references to subroutines

<snip>

Thanks for all the help and advice guys. I'm still not 100% sure I
understand it enough, but at least I do know that there is a reason.

Dave

## Re: Help with understanding references to subroutines

I can also provide you with a simple, real example although I can't
quote the code since it belongs to my employer: I'm presently working on
a mod_perl-based 'web application' (in the sense that it is supposed to
work in response to a HTTP GET request) which is supposed to create
PDF-documents containing QR-code images supposed to be used for
interacting with another web application. Creating these images requires
connecting to a database server and querying various pieces of
information from it, possibly a lot of pieces (tens of thousands). Since
this will be an autonomously operating piece of software, it needs to be
able to deal with database connections suddenly breaking down, ie,
because the DBMS was restarted as part of a system update. The way this
works is as follows:

There's a low-level database interface layer which executes queries on
behalf of the application. In case of database errors, these are
classified as either irrecoverable or transient and an exception is
thrown (via die) containing information about the error and its type.

Then, there's a mid-level 'execution' layer which takes a reference to a
subroutine as argument. This subroutine is supposed to perform the
actual application work. All of its state information is to be stored in
my-variables created by it and any other subroutines it might need to
call. The executor invokes this subroutine in an eval block and catches
exceptions thrown by the database interface layer. In case the error is
considered to be recoverable, it kills the database connection (if it
still exists), destroys the (DBI) database handle object, waits for a
random, short time and starts over.

The 'application subroutines' (there's actually more than one)
themselves just perform their work without any concern for problems
which could occur while interacting with the database.

The total size of this is 132 lines of code. The fact that some
CPAN-codeblob with presumably more than 100 times this size exists which
hopefully(!) provides this as an also-feature among the toaster,
lawnmower, hovercraft, big game trap, mail reader, howitzer and
pencil-sharpener also rolled into it is not an excuse for using that.

Oh, do shut up.

Ben

## Re: Help with understanding references to subroutines

Again, an argument in favour of "Whatever J. Random Bored Guy
uploaded to J. Random Software Rubbish Dump" must be used because it was
uploaded, regardless of any non-desirable properties it might have,
would be most welcome.

Oh, do shut up.

Ben

## Re: Help with understanding references to subroutines

http://en.wikipedia.org/wiki/Sturgeon%27s_Law

There are

- some CPAN modules I use because they provide functionality
orthogonal to 'the core of the application' which is also
needed but I don't desire to deal with it in more detail
except if there's a critical bug in the module

- some CPAN modules I use because they saved me a significant
amount of work, even though they're earmarked for replacement
because of known deficiencies should time permit

- some CPAN modules I use because I consider them 'generally
well-designed and useful' for the purpose at hand, possibly
extended with features I happen to need the module author(s)
were less interested in

- some CPAN modules I wouldn't touch with a ten-foot barge
pole: generalized event loops, kitchen sink abstractions
solving wildly differing problems, OO systems of any pedigree
(if I thought the Perl OO system was seriously unusable, I
wouldn't be using Perl) and generally (or mostly) everything
which is supposed to solve 'programming problems' instead of
'real problems'

YMMV.

## Re: Help with understanding references to subroutines

RW> There are

people who understand what "Do shut up" means and (self-evidently)
people who do not.

We are all well-aware of your hobbyhorse.  Kindly stop riding it in
public.

Charlton

--
Charlton Wilbur
cwilbur@chromatico.net

## Re: Help with understanding references to subroutines

<snip code description>

That all sounds genuinely fascinating, especially the bit about creating
QR codes. Granted, it's only a fairly high-level description, but are
function refs actually necessary? At some point you
are having to decide which function to call, so why do you then need refs?
Couldn't you just call the function?

Sorry if this sounds a bit basic, but having programmed in a large number
of different, occasionally esoteric, languages for just over 30 years, I
can't think of an instance where this has even been possible before. I
have written a couple of largish cgi scripts in perl, and not needed to
use them.

Dave

## Re: Help with understanding references to subroutines

That's like saying "Why do you need refs/pointers/subscripts?
Couldn't you just set the variable itself?"  That's practically like

if ($i == 1) {
    $a1 = $x;
} elsif ($i == 2) {
    $a2 = $x;
} elsif ($i == 3) {
    $a3 = $x;
...

and not realizing that there's a problem with using a value that's not
hardcoded (what if you implement $a1 through $a100 and then need
$a101?), and professing not to see any use for

$a[$i] = $x;

Haven't you ever used Perl's sort, map, or grep?  I'm told they are
not quite sub references, but they look enough like them to be a
useful mental model for me.  How about %SIG?  You give its entries sub
references.
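
For instance, a warning handler is installed by storing a code reference in %SIG. This sketch uses the `__WARN__` hook because it can be run without sending real signals; the warning texts are invented.

```perl
use strict;
use warnings;

my @caught;

{
    # Install an anonymous sub as the handler for warnings, for this block only.
    local $SIG{__WARN__} = sub { push @caught, $_[0] };
    warn "disk almost full\n";
    warn "retrying connection\n";
}

print "caught ", scalar(@caught), " warning(s):\n", @caught;
```

Perl itself decides when the handler runs; the program only says which sub to run.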

What languages have you used?

In C, they're function pointers.  In standard libraries, they are used in

- atexit: register a function to be called at program exit (like
  Perl's $SIG, I think)
- bsearch, qsort: search and sort don't know the datatypes of what
  they're sorting, so you pass in a comparison function
- signal: register a signal handler function (%SIG)

In Fortran, they are EXTERNAL arguments.  I used them in math -- for
example, you have a Runge-Kutta solver but you need to pass it a
function that you're trying to integrate.

In Java, I think they tend to use interfaces, so I think you create a
class that implements the interface, and in that class have a method
with the standard name that does what you want.  You're not passing a
pointer to a method -- but you are passing a pointer to an object that
happens to call the particular method you want to run.

--
Tim McDaniel, tmcd@panix.com

## Re: Help with understanding references to subroutines

,----
| Then, there's a mid-level 'execution' layer which takes a reference to a
| subroutine as argument. This subroutine is supposed to perform the
| actual application work. All of its state information is to be stored in
| my-variables created by it and any other subroutines it might need to
| call. The executor invokes this subroutine in an eval block and catches
| exceptions thrown by the database interface layer. In case the error is
| considered to be recoverable, it kills the database connection (if it
| still exists), destroys the (DBI) database handle object, waits for a
| random, short time and starts over.
`----

A contrived mockup of that could look like this:

----------
sub operation() {
    die("Shit happened\n") if rand(10) <= 7;
    return rand(12);
}

sub execute
{
    my $sub = $_[0];
    my $rc;

    {
        eval {
            $rc = $sub->();
        };

        $@ and do {
            print STDERR ("\t** $@");
            redo;
        };
    }

    return $rc;
}

sub random_add
{
    return operation() + 12;
}

sub random_sub
{
    return operation() - 3;
}

printf("random sub returned %d\n", execute(\&random_sub));
----------

The operations I'm actually dealing with involve communication with an
external program (database server) which might be running on a different
computer located anywhere else in the world. There's an error recovery
algorithm jointly implemented by 'operation' (which signals the error)
and 'execute' (which retries the failed application operation in case an
error was signalled) which has to be wrapped around a number of
different 'application operations', here represented as 'random_add' and
'random_sub'. The main application logic decides which subroutine to
call but the actual call itself is performed by an intermediate
subroutine.

## Re: Help with understanding references to subroutines

On 2014-01-09 16:32, Rainer Weikusat wrote:

Alternative:

eval {
    $rc = $sub->();
    1;  #success
}
or do {
    my $eval_error = $@ || 'Zombie Error';
    warn "\t** ", $eval_error, "\n";
    redo;
};
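
Put into a complete, runnable shape, the idiom might look like the sketch below. The retry limit, the name `execute_with_retry` and the `$flaky` stand-in are assumptions added here so the example terminates predictably; they are not part of the code under discussion.

```perl
use strict;
use warnings;

# Hypothetical executor: run $sub, retrying up to $max_tries times if it dies.
sub execute_with_retry {
    my ($sub, $max_tries) = @_;
    my $last_error;

    for my $try (1 .. $max_tries) {
        my $rc;
        eval {
            $rc = $sub->();
            1;    # eval returns true only if $sub did not die
        } and return $rc;

        $last_error = $@ || 'unknown error';
        warn "\t** attempt $try failed: $last_error";
    }

    die "giving up after $max_tries attempts: $last_error";
}

# Stand-in for a flaky operation: fails twice, then succeeds.
my $calls = 0;
my $flaky = sub {
    die "connection lost\n" if ++$calls < 3;
    return 'payload';
};

print execute_with_retry($flaky, 5), "\n";
```

The executor knows nothing about what `$flaky` does; any code reference that may die can be wrapped the same way.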

--
Ruud

## Re: Help with understanding references to subroutines

Better alternative:

($rc, $err) = $sub->();
if ($err) {
    print STDERR ("A catastrophe occurred!\n");
    .
    .
    .
}

and implement run-of-the-mill C-style error checking and handling via
'magic return values' if you want that. Which I don't.

## Re: Help with understanding references to subroutines

DS> Thanks for all the help and advice guys. I'm still not 100% sure I
DS> understand it enough, but at least I do know that there is a reason.

it is much simpler than you realize. it is just a mindset change that
you need to make.

it is just that code can be treated like data (lisp is all about
that). so you can put code into data structures, pass code as arguments,
etc. a very common use of code refs is a dispatch table. it is a hash of
keys but the values are code references. this way you can choose which
sub to call based on some key. i am sure there are plenty of perl
examples all over usenet and beyond.
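
A minimal dispatch table along these lines, with made-up command names and a hypothetical `run_command` wrapper:

```perl
use strict;
use warnings;

# Hypothetical command names and handlers, purely for illustration.
my %dispatch = (
    hello => sub { return "Hello, $_[0]!" },
    shout => sub { return uc $_[0] },
    count => sub { return length $_[0] },
);

sub run_command {
    my ($name, $arg) = @_;
    my $handler = $dispatch{$name}
        or return "unknown command '$name'";
    return $handler->($arg);
}

print run_command($_, 'perl'), "\n" for qw(hello shout count bogus);
```

Adding a command means adding one hash entry; run_command() itself stays untouched.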

so a sub reference is just a scalar value that refers to a sub and which
can then be called. it can be created from a named sub or by an
anonymous sub (just like with named or anon hashes and arrays).

an example of a sub as an argument would be choosing different ways to
output something. the sub could be printing to a file, logging, etc. the
main sub is passed its args and a sub to control output. when it has
output it calls that sub and passes it stuff to output.
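
The output example could be sketched like this. `report` and its `$emit` parameter are illustrative names; the two anonymous subs stand for "print it" and "collect it for a log".

```perl
use strict;
use warnings;

# Hypothetical report generator: it decides *what* to output,
# the caller-supplied $emit sub decides *where* it goes.
sub report {
    my ($emit, @items) = @_;
    $emit->("item: $_") for @items;
    $emit->('total: ' . scalar @items);
    return;
}

# Sink 1: print to STDOUT.
report(sub { print "$_[0]\n" }, qw(apple pear));

# Sink 2: collect lines in memory, e.g. for a log or a test.
my @log;
report(sub { push @log, $_[0] }, qw(apple pear));
print scalar(@log), " lines captured\n";
```

Both sinks take one string and return nothing useful, so report() can call either without caring which one it was given.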

this is also called polymorphism. each of the subs you can pass in must
(should) have the same api. that way the main sub doesn't have to do
anything special but use the single section of code to call the sub
ref. same with the above dispatch table. all of the subs in it should
take the same args and return the same type of value (if they return
something).

so think of sub refs as an elegant and efficient way to choose something
at a distance. instead of a long ugly if/then/else block you just need
to choose the correct sub (from the dispatch table or via passing it in
as an arg). like most software tricks it isn't NEEDED but very useful
and popular. but then again, most software things are just sugar over
assembler too. :)

uri