Efficiency questions: combined ifs and looping 4 times

Do you have a question? Post it now! No Registration Necessary.  Now with pictures!

•  Subject
• Author
• Posted on
I have a PHP Script which takes approximately 12 hours
to run on my Win XP Pro machine, spending the vast
majority of its time in the function below, to compute some
combinatorial properties, so I would like to squeeze
every bit of efficiency from it.

Respondents should probably know why ++\$var (pre
increment and decrement) is more efficient than \$var++
(post increment and decrement), though I've read that
the interpreter does the conversion automatically in
cases where the variable is isolated.

I have three specific questions:
(1)  The outer loop's only purpose is to loop 4 times.
What is the most efficient way to write it (I mean the
for statement)?

(2)  Is it more efficient to combine the two if statements
and omit the continue?

(3)  Does it make a difference in calling utilize whether I do:
\$aUtilRes = utilize(\$newNums, \$aUtilCopy=\$aUtil);
or
\$aUtilCopy = \$aUtil;
\$aUtilRes = utilize(\$newNums, \$aUtilCopy);

function utilize(\$strBoxes, &\$aRes = array()) {
// \$aRes is an array of arrays, with the following property assumed:
// If \$aRes[\$idx] = array(...), then \$idx = array_sum(\$aRes[\$idx])
//   Ie. \$idx is the sum of the entries of the array it indexes
// And each subarray contains from 1 to 4 entries.
// \$strBoxes is a space separated list of integers
// This function augments \$aRes by adding zero,one,two,three,
//   and four of each number in \$strBoxes to the subarrays of
//   \$aRes, subject to each subarray being limited to four entries.
//   In addition, if there are two ways to form a given sum, then
//   the way with fewer addends is used.

\$pos = strrpos(" \$strBoxes", " ");
\$box = 0 + substr(\$strBoxes,\$pos);    // Last num in \$strBoxes
// Everything but the last number:
\$strRest = substr(\$strBoxes,0,--\$pos<0 ? 0 : \$pos);

if (\$strRest) utilize(\$strRest, \$aRes);
for (\$i=1;\$i<=4;++\$i) {
foreach (\$aRes as \$amt => \$aBox) {
if ((\$size=sizeof(\$aBox))>=4) continue;
if (!\$aRes[\$next=\$amt+\$box] ||
sizeof(\$aRes[\$next])>\$size) {
\$aRes[\$next] = \$aRes[\$amt];
\$aRes[\$next][] = \$box; } } }

\$aRes[\$prod=\$box] = array(\$box);
\$aRes[\$prod+=\$box] = array(\$box, \$box);
if (!\$aRes[\$prod+=\$box] || sizeof(\$aRes[\$prod])>3)
\$aRes[\$prod] = array(\$box, \$box, \$box);
if (!\$aRes[\$prod+=\$box])
\$aRes[\$prod] = array(\$box, \$box, \$box, \$box);

return \$aRes;
}

Note: \$strBoxes is an ordered list of integers starting with 1
The function computes all possible sums of the numbers in
\$strBoxes using no more than four numbers (a number may
be repeated in the sum).  The function bogs down when
\$strBoxes holds 7 integers.

Cacheing is already in effect via \$aUtilCopy being passed in,
but the speedup is not great because the problem does not
decompose nicely so far as I can see.  utilize() is itself
wrapped within a few loops, so it gets called (I think) millions
of times.

Thanks,
Csaba Gabor from Vienna

Re: Efficiency questions: combined ifs and looping 4 times

Csaba Gabor wrote:

If it's only going to loop 4 times, it doesn't really matter.  The
difference will be in microseconds.

Perhaps, perhaps not.  All you can do is try it.

Same as above (although I wouldn't expect it to make much difference).

Personally, I wouldn't even try something like this in PHP.  I'd go to a
compiled language such as C.

As far as your speed issue - I suspect the slow speed is the string
parsing of \$strBoxes.  I would think this would be much faster as an
array.  However, what you really need to do is profile your script and
see where the performance bottleneck is.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Re: Efficiency questions: combined ifs and looping 4 times

Csaba Gabor wrote:

Efficiency does not depend on your coding style. It depends hugely on the
algorithm choosen.

Well, if you want to attack that problem from a brute-force perspective,
you'll have an algorithm of O(n^4) complexity, which is pretty bad.

And not only that, but you're calling sizeof(), which has a cost of O(n),
growing the complexity up to O(n^5). Ouch.

Oooohhh! Recursive calls! sizeof() and strpos() inside them! I won't bother
calculating the numbers, but I guess you're in the lines of O(n^10) now.

Stop worring about caching and saving memory by using one less variable.
That is premature optimization, and you shouldn't be doing it. Grab a pen
and a piece of paper and RETHINK YOUR ALGORITHM. See if you can use a tree
for storing the data (transversing it completely will have an horrendous
complexity, though).

Cheers,
--
----------------------------------
Iván Sánchez Ortega -ivan-algarroba-sanchezortega-punto-es-

You will pioneer the first Martian colony.

Re: Efficiency questions: combined ifs and looping 4 times

The kinds of optimizations you're talking about here are
microoptimizations, they'll buy you microseconds if anything at all.

Choice of algorithm always has a far bigger impact in performance than
any one implementation detail, unless you inadvertently pick an
implementation detail that happens to be very expensive. Other than
using sizeof () within a loop there's nothing that obviously jumps
out.  Use sizeof () before the loop, store the result in a var and use

The simple fact is, however, that the problem you've chosen to solve
is a computationally expensive one, the algorithm you've chosen to use
to solve said problem is brute-force, and you're doing it in an
interperated language which means you have the interperater's overhead
to take account of as well.

There's probably a better solution to this problem out there, try
googling for it. One thing I'd suggest right off is using a balanced
binary tree instead of an array.  Trees tend to be much faster than
sequential data structures for tasks like this, but implementing the
algorithm will be harder and require some thought.

And PHP is quite simply the wrong language for this job.  PHP was
intended to spit out web pages, not do heavy duty mathematics.  A
compiled language like C is bound to be faster simply because there's
no interperater overhead, at the very least you should be using Java
which is also going to be faster in this application than PHP.

Re: Efficiency questions: combined ifs and looping 4 times

...

the algorithm effects dominate, and any variations
in specific coding have not led to a measurable difference.

A note about sizeof().  sizeof()=3Dcount() is not
an inefficient function - the reason not to include it
in a loop is because of the overhead that it incurs as
a function call.  Nevertheless, I recoded everything
so that the subarrays always toted around their size
(as their 0th element) and did not find any significant
time difference.

Not likely, but I did try (though no doubt there are
faster implementations).  In fact there is a reference
to what I was working on here:
http://www.research.att.com/~njas/sequences/A001214
Note that only 10 terms of the sequence are listed:
4, 10, 26, 44, 70, 108, 162, 228, 310, 422
the final term being added in 2006.  That PHP only
bogs down at the 8th term (228) for me I thinks
speaks well of PHP.

It's clear that a compiled language will be faster
than an interpreted one, but PHP does pretty well
for itself, and development overhead is generally
far less.

Iv=E1n, your order of complexity analysis is off.
utilize() computes all possible 4-sums of a given
set, S, of n numbers.  The maximum sum
can be 4*max(S).  The algorithm works by first
taking one number, call it j=3Dmin(S), and
generating all 4-sums with that number (which
would be j, 2j, 3j, 4j).  At each subsequent
step, it plops in the next number, k, and tries to
generate all 4-sums (from all the previous 4-sums)
which it would do by adding k, 2k, and 3k to each
of the existing sums.  On any pass, this means at
most (3/4)max(S) entries to analyze.  Since there
are n passes, this means that the complexity is
O(n*max(S)).  The recursion is just the means to
the end and does not, of itself, represent
sizeof(...) does not affect the order of
complexity; at most it would be a constant factor.

The reason this bogs down is that it is getting
called repeatedly.  In essence it's being called
with all sorts of variations of 8 numbers.

I don't see that this problem is amenable to the
form that you are suggesting.  In order for it to
be a workable idea, there has to be a way to combine
the subtree results, and these types of problems
(integer sums) are notorious for their intrasigence.

Yes, if I wanted to go further with this, then I'd
be obliged to switch.  As it was, it was a diversion.
However, I did find a 25% speedup by realizing that
I was looping once too oft in the outer (now inner)
loop.  I recoded it to be:

function utilize(\$strBoxes, \$aRes = array()) {
// See thread start for doc:
8bbdd5c/

\$pos = strrpos(" \$strBoxes", " ");
// Last number in \$strBoxes,
// 0+ to avoid repeated coersion
\$box = 0+substr(\$strBoxes,\$pos);
// Everything but the last number:
\$strRest = substr(\$strBoxes,0,--\$pos<0 ? 0 : \$pos);

if (\$strRest) \$aRes = utilize(\$strRest, \$aRes);
\$aRes[\$box] = array(\$box);
foreach (\$aRes as \$amt => \$aBox)
for (\$i=3D0,\$size = sizeof(\$aBox);++\$i<=3D4-\$size;) {
if (!\$aRes[\$amt+=3D\$aBox[]=3D\$box] ||
sizeof(\$aRes[\$amt])>\$size+\$i)
\$aRes[\$amt] = \$aBox;
else break;  }

return \$aRes;  }

Re: Efficiency questions: combined ifs and looping 4 times

Csaba  Gabor wrote:

It *is*. It has a complexity of O(n) which adds to the overall complexity,
either you want it or not.

I don't understand what a "4-sum" is.

Are you trying to solve the postage stamp problem? The coin problem? Any
problem with a known name?

Can you provide a description of what is needed, some sample input and some
sample output?

Then let us look at the problem *as a whole*.

--
----------------------------------
Iván Sánchez Ortega -ivan-algarroba-sanchezortega-punto-es-

Organización: nada funciona, pero todo el mundo sabe porque.

Re: Efficiency questions: combined ifs and looping 4 times

On Apr 16, 6:49 pm, Iv=E1n S=E1nchez Ortega <ivansanchez-...@rroba-
escomposlinux.-.punto.-.org> wrote:

Can you substantiate that claim?  I found the original
sizeof() remarks by you and Gordon interesting enough
that I did some testing to determine whether or not
sizeof must iterate through the entire array or not.

Following the tests outlined at http://php.net/count
I did a run through and noted the times.  Then I
doubled the size of the array and ALL three of the
loops, including
for (\$i=3D0;\$i<sizeof(\$array);++\$i)
took twice as long.  That means that the time spent
in the sizeof() function is O(1) and not O(n).  To
be specific, sizeof(\$array) seems to not have to
loop through the array to count the number of items
in it (If it were the case that PHP were cacheing
the size information, then one would also expect
more than a doubling in time).

Over all iterations, the time added from the entire
loop is O(n), but the same is also true if one
stores the size since a comparison must be done
in any case.  Only the multiplicative constant is
different.

Yes.  And this cost is frequently significant.

By 4-sum I meant the sum of at most any 4 numbers
from S (with repeats allowed).

Yes, I was generating the values for the postage stamp
problem through size 8, with 4 stamps.

My outer function takes one number, the size of
the postage stamp problem.
Ie.
\$aBest = findBestStamps(8);
function findBestStamps(\$n) {
// finds a set, S, of \$n positive integers
// such that the first hole for all
// 4-sums of S is maximal
// A hole is a number that has no 4-sum in S