**posted on**

- Jean-François Lacrampe

May 2, 2005, 6:55 pm

Hello,

I've got two random number/statistics questions I'd like you to

review. My first question is not directly related to PHP, but will be

implemented in PHP, as explained in my second question, so let's go:

I want to generate 10000 strings of x characters, with one chance (or

less) on a million that you can guess them by just randomly typing

them. So I need to know the value of x.

I wrote the following equation :

36^x/10000 = 1000000

*<=> 36^x = 10000 * 1000000*

*<=> 36^x = 1010*

*<=> x = ln(1010)/ln(36)*

*<=> x = 23.025850929940456840179914546844/3.5835189384561100016249547167614*

*<=> x = 6.4254860446923437997173954827712*

So, a 7-character string would be good enough.

So my first question is: is my reasoning OK? Knowing my math

abilities, I doubt it very much! ;)

The second question I have is related to PHP's rand() function. I've

read many times that rand() is not random enough, especially when

generating long lists of this kind. Would you use something that's

more powerful than rand(), are there stronger random functions, within

PEAR for instance, or anything?

Thanks,

JFLac

## Re: Random strings of character and some stats

On 2 May 2005 13:55:18 -0700, jflacrampe@gmail.com (Jean-François Lacrampe)
wrote:

I'm not 100% clear on the "them" in the sentence; are you saying you want less

than 1/1000000 chance of guessing ONE of the 10000 strings, or 1/1000000 chance

of guessing the ENTIRE SET of 10000 strings?

Depends on the interpretation above. Not sure I get how the 10000 is involved

here though.

10000 * 1000000 = 1010 ? Is that supposed to be 10^10 ?

Apparently so :-)

If you want at worst 1/1000000 chance of guessing any string, isn't the number

of strings irrelevant if they're random?

i.e. it's just

36^x > 1000000

*=> x > ln(1000000)/ln(36)*

*=> x > 3.855*

So minimum number of chars = 4.

(36^3 = 46656, 36^4 = 1679616)

The odds of guessing ALL the strings surely head well out of the 1 in 1000000

range for 10000 strings very quickly...

mt_rand() uses the Mersenne Twister pseudorandom algorithm, which is typically

better (and as a bonus it's faster too).

If you want to get really serious you'll need to base it on some sort of truly

physical phenomenon, e.g. with RNG hardware, which is often based on random

thermal fluctuations.
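For readers on a newer PHP than the one discussed in this thread, a middle ground between mt_rand() and dedicated hardware is the operating system's secure random source. The following is a minimal sketch, assuming PHP 7.0 or later (where random_int() is available); the function name secure_random_string() and the alphabet passed to it are illustrative and not from the thread.

```php
<?php
// Sketch only: random_int() exists in PHP 7.0 and later, so it was not
// available when this thread was written.
// secure_random_string() is a hypothetical helper name.
function secure_random_string(int $length, string $alphabet): string
{
    $maxIndex = strlen($alphabet) - 1;
    $result = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() draws uniformly from [0, $maxIndex] using the
        // operating system's cryptographically secure source.
        $result .= $alphabet[random_int(0, $maxIndex)];
    }
    return $result;
}

echo secure_random_string(7, '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'), "\n";
```

The call shape mirrors mt_rand($min, $max), so swapping one for the other in an existing generator should need little more than a rename.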

## Re: Random strings of character and some stats

Andy Hassall wrote:

I meant the odds of guessing any of the 10000 strings, of course! :-)

The odds of guessing the entire set must be really, really low!

Well, I wrote the equation in another editor who was sooo happy to show

me it was able to display the 10^10 graphically. Too bad it forgot to

copy/paste it back to me with the circumflex.

Well, keeping in mind my very 'intuitive' and weak knowledge of math,

I'd guess that the more strings you put in the list, the more chances

you have to guess one (any) of them. If for instance I had a list long

enough to contain all the possible combinations, the odds would be 1/1,

right? If you divide the list by two, the odds are 1/2. And so on.

So the number of items in the list seems to matter: that's how I came up

with the 10000 * 1000000 thing (by doing lots of intermediate and

stupid steps on a sheet of paper).

I'm not sure at all that I put the 10000 where I should have in the

equation, though, hence my initial question.
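The intuition described here can be written out as a short derivation. It is only a sketch under stated assumptions: the N strings are distinct, each is drawn uniformly from a 36-character alphabet, and the attacker makes a single guess. The symbols N (list size) and p (target probability) are introduced here for illustration.

```latex
\[
P(\text{one guess hits any of the } N \text{ strings}) = \frac{N}{36^{x}} \le p
\quad\Longleftrightarrow\quad
x \ge \frac{\ln(N/p)}{\ln 36}
\]
\[
N = 10^{4},\; p = 10^{-6}:\qquad
x \ge \frac{\ln 10^{10}}{\ln 36} \approx \frac{23.026}{3.584} \approx 6.43
\;\Longrightarrow\; x = 7
\]
```

Under these assumptions the 10000 sits on the same side as the 1000000, which matches the equation in the first post.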

Now, I'm talking about things I don't understand (math) in a language

that isn't my native language and I reckon that I'm a bit awkward at

explaining my thoughts. :-)

I could also use a webcam on a lava lamp and produce my results using

the webcam info, but I guess I don't need that randomness. I just

wanted to know what was my best bet with what PHP can give me, with

minimal hassle. ;-)

Thanks for your answers,

JFLac

## Re: Random strings of character and some stats

wrote:

Ah, yes of course. OK, I agree with your maths, looks right.
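The agreed reasoning can also be checked numerically. Below is a minimal sketch, assuming PHP 7 or later on a 64-bit build (for the type declarations and integer range); min_string_length() is a hypothetical helper name, not something from the thread.

```php
<?php
// Hypothetical helper, not from the thread: smallest length x such that
// $listSize / $alphabetSize^x <= 1 / $oddsAgainst.
function min_string_length(int $alphabetSize, int $listSize, int $oddsAgainst): int
{
    $length = 1;
    $space = $alphabetSize; // number of possible strings of the current length
    while ($space < $listSize * $oddsAgainst) {
        $space *= $alphabetSize;
        $length++;
    }
    return $length;
}

// 10000 strings, at most a one-in-a-million chance of guessing any of them.
echo min_string_length(36, 10000, 1000000), "\n";
```

Passing a list size of 1 instead of 10000 reproduces the single-string case worked out earlier in the thread, where 4 characters are enough.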

## Re: Random strings of character and some stats

Jean-Francois Lacrampe wrote:

OK, one in a million chance of successfully guessing 10,000 strings

equals 0.9986 chance of successfully guessing a single string:

0.9986 ^ 10000 = 8.23412E-07 ~ 1E-06 (one in a million)

In other words, even if you are virtually certain to get a single

string right, it's still virtually impossible to get 10,000 of them

right. So a one-character string will suffice. In fact, even a

one-bit value (0 or 1) would be an overkill. :)

Check out mt_rand():

http://www.php.net/mt_rand

Cheers,

NC

## Re: Random strings of character and some stats

NC wrote:

As I said in another branch of the thread, I wasn't clear enough: I

meant 'one chance (or less) on a million that you can guess _any_ of

them'.

Anyway... Here's the code I wrote to generate my 10000 strings, just in

case it's useful to somebody browsing the archives, someday. The

random function is pretty much the same as the one you see on every php

tutorial, but it uses mt_rand() instead of rand() as many of you have

advised me, and I wrote a (very inefficient) dupe checker.

Optimizations and ideas are welcome, but that's just for the fun of it:

I'll generate these strings just once, so it doesn't matter if it takes

one full minute, it will only be run once. :-)

<?php

set_time_limit(600); // We need this because of the time-consuming

// in_array()

$values = array();

function random_string() {

$allowed_chars = "0123456789AZERTYUIOPQSDFGHJKLMWXCVBN";

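// No mt_srand() call here: PHP has seeded mt_rand() automatically since

// 4.2.0, and reseeding from microtime() on every call would allow at

// most 1,000,000 different seeds, hence at most 1,000,000 strings.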

$string = 'A'; // I put a control char at the start of my

// password just in case I want to generate

// a second, third,... series in the future.

for($i = 0 ;$i <= 6; $i++) {

$position = mt_rand(0, 35); // index into the 36 allowed characters

$temp = substr($allowed_chars, $position, 1);

$string .= $temp;

}

return $string;

}

// The odds are low, but it's possible that the same string

// is generated twice. So I check that each new string isn't

// already in the array of strings found so far. This

// slows down the script, which is not a problem in my

// case, but I welcome optimization ideas.

// If a dupe is detected, I just decrement $i, which forces

// the loop to loop once more for this value of $i.

for ($i = 1 ; $i <= 10000; $i++) {

$string = random_string(); // generate first, then test for a duplicate

if (!in_array($string, $values)) {

$values[$i] = $string;

} else {

$i--;

}

}

echo '<pre>';

print_r ($values);

echo '</pre>';

?>

JFLac
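Since the post asks for optimization ideas, one possible approach is to let PHP's array keys do the duplicate check instead of in_array(). The following is a sketch rather than a drop-in replacement: it assumes PHP 7 or later, and the names random_string_of() and unique_random_strings() are hypothetical. It also omits the leading control character used above.

```php
<?php
// Hypothetical variant of the generator above; names are illustrative.
function random_string_of(int $length, string $alphabet): string
{
    $maxIndex = strlen($alphabet) - 1;
    $result = '';
    for ($i = 0; $i < $length; $i++) {
        $result .= $alphabet[mt_rand(0, $maxIndex)];
    }
    return $result;
}

function unique_random_strings(int $count, int $length, string $alphabet): array
{
    $seen = array();
    // Using each string as an array key makes the duplicate check a hash
    // lookup instead of a linear in_array() scan. The string is also kept
    // as the value, because PHP turns all-digit keys such as "1234567"
    // into integers.
    while (count($seen) < $count) {
        $candidate = random_string_of($length, $alphabet);
        $seen[$candidate] = $candidate;
    }
    return array_values($seen);
}

$values = unique_random_strings(10000, 7, '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ');
echo count($values), "\n";
```

A duplicate simply overwrites its own key, so the loop keeps going until the requested number of distinct strings exists. The loop never ends if more strings are requested than the alphabet and length can provide.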

