# How much space is used for hash elements? For array elements?

#### Do you have a question? Post it now! No Registration Necessary.  Now with pictures!

•  Subject
• Author
• Posted on
I am aggregating unique integers using a hash.  All I'm interested in
at the end is the number of unique elements.  It's along the lines of
this (not actual code, just expressing the concept):
my %timeshash = ();
for (...) {
... calculating \$time ...
\$timeshash = undef;
}
my \$count = scalar(keys(%timeshash));

Therefore, the value stored in the hash doesn't matter.
There are a large number of indices (\$time).  Is there any reason of
time or space to prefer storing undef versus 0 versus "" versus
anything else?

Similarly, what if I were dealing with an array, like if \$time were
not sparse?

--
Tim McDaniel, tmcd@panix.com

## Re: How much space is used for hash elements? For array elements?

On 22/10/2015 08:43, Tim McDaniel wrote:

my \$count=0;
while ( &something ) { ++\$count }

## Re: How much space is used for hash elements? For array elements?

And this implements the OP's requirement of
"the number of unique elements"
exactly how?

jue

## Re: How much space is used for hash elements? For array elements?

On 22/10/2015 18:45, J�rgen Exner wrote:

I do not tell you

## Re: How much space is used for hash elements? For array elements?

tmcd@panix.com (Tim McDaniel) writes:

Using undef means 'scalar with no allocated body', cf

[rw@doppelsaurus]~#perl -MDevel::Peek -e '\$times = undef; Dump(\%times)'
SV = IV(0x606bb0) at 0x606bc0
REFCNT = 1
FLAGS = (TEMP,ROK)
RV = 0x623270
SV = PVHV(0x60bb00) at 0x623270
REFCNT = 2
FLAGS = (SHAREKEYS)
ARRAY = 0x61bc00  (0:7, 1:1)
hash quality = 100.0%
KEYS = 1
FILL = 1
MAX = 7
RITER = -1
EITER = 0x0
Elt "h" HASH = 0x7003dfea
SV = NULL(0x0) at 0x606998
REFCNT = 1
FLAGS = ()

vs

[rw@doppelsaurus]~#perl -MDevel::Peek -e '\$times = 1; Dump(\%times)'
SV = IV(0x606bb0) at 0x606bc0
REFCNT = 1
FLAGS = (TEMP,ROK)
RV = 0x623270
SV = PVHV(0x60bb00) at 0x623270
REFCNT = 2
FLAGS = (SHAREKEYS)
ARRAY = 0x61bc00  (0:7, 1:1)
hash quality = 100.0%
KEYS = 1
FILL = 1
MAX = 7
RITER = -1
EITER = 0x0
Elt "h" HASH = 0x7003dfea
SV = IV(0x606988) at 0x606998
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 1

"It depends". With the following test program,

------
use Benchmark;
use Devel::Peek;

use constant N =>    100;
use constant RANGE =>    1000;

our \$dump;
my @ns = map { int(rand() * RANGE); } 1 .. N;

sub hash_unique
{
my %h;

\$h = undef for @ns;
Dump(\%h) if \$dump;

return scalar(keys(%h));
}

sub array_unique
{
my (@a, \$n);

\$a[\$_] ||= (++\$n, 1) for @ns;
Dump(\@a) if \$dump;

return \$n;
}

{
local \$dump = 1;

print(hash_unique(), "\n");
print(array_unique(), "\n");
}

timethese(-3,
{
hu => \&hash_unique,
au => \&array_unique });
-----

the final size of the hash depended on N and the final size of the array
on RANGE. Further (at least for me), using an array was signficanly
faster if the difference between the orders of magnitude of N and RANGE
was 'one or less' (NB: not the absolute difference) and significantly
slower otherwise.

## Re: How much space is used for hash elements? For array elements?

Thank you.  To make sure I understand: undef is a null pointer, but
other values are pointers to separate values, and therefore undef is
the most storage-efficient value?

I think you're then discuss how sparse it would have to be for a hash
table to be more efficient.  I was unclear.  I am wondering whether
there were any differences in how hash values and array values are
stored -- I don't know why there would be, but I didn't know anything

--
Tim McDaniel, tmcd@panix.com

## Re: How much space is used for hash elements? For array elements?

tmcd@panix.com (Tim McDaniel) writes:

As far as I know this (with 'knowing' including a fairly superficial
look at the perl code, a scalar ('SV') is composed of a head and a body
with the head used to store type-independent metainformation and the
body whatever the value happens to be. And a so-called 'undefined
scalar' has no body, hence, the answer should be 'yes'.

Well, hash values are stored in hashes and array values in
arrays. Unless OPENBSD_SECURUTY_MODEL is enabled when compiling. Then,
it's the other way round to thwart common exploits (plans to fully
randomize this such that anything will randomly be stored in anything
else are well underway).

.
.
.

A hash utilizes an array of linked lists and the size of the array
will be smallest power of two not smaller than the number of keys in the
hash. Additional space will obviously be needed for the key lists
themselves. The size of an array populated by assigning values to random
elements is more difficult to predict. A freshly created array with a
single used element will be large enough to hold the used
element. Whenever the array needs to be grown, the new size will be the
smallest power of two capable of providing an array with the new largest
index. The usuable size of an array appears to be 3 less than this power
of two.

## Re: How much space is used for hash elements? For array elements?

On 22/10/2015 08:43, Tim McDaniel wrote:

https://en.wikipedia.org/wiki/Hash_table