# Finding two strings with the same crc32

#### Do you have a question? Post it now! No Registration Necessary.  Now with pictures!

•  Subject
• Author
• Posted on
Just for fun I would like to find two strings that have the same crc32. The
probably of doing this for a set of for example one million strings is more
than 99.9999999999999999999999999999999999999999999999997%.

To calculate the chance of \$n strings leading to all different crc32 is done
by this code: (assuming randomly distributed crc32 values)

use strict;
my \$tmp=1;

my \$n=1000000;
my \$ant=2**32;
for (0..\$n) {
\$tmp*=(\$ant-\$_)/\$ant;
print "\$_\t\$tmp\n" unless (\$_%10000);
}

If I try to find two string that give the same crc32 however I don't
succeed. I use the code below to spot strings that have the same first 28
bits (to save memory I don't store all 32 bits). The code don't find any
however even after checking 6 million strings!

Why?

Sorry for being a bit off topic.

use strict;
use Digest::CRC qw(crc32);
my \$string='aaaaa';
\$|=1;
my \$v;

for (0..11881375) {
unless (\$_%100000) {
print "\$_\t",length(\$v),"\n";
}

my \$tmp=crc32(\$string)>>4;
if (vec(\$v,\$tmp,1)) {
print "\$string\n";
}
vec(\$v,\$tmp,1)=1;
\$string++;
}
/jN

## Re: Finding two strings with the same crc32

Jonas Nilsson wrote:

> If I try to find two string that give the same crc32 however I don't
> succeed. I use the code below to spot strings that have the same first 28
> bits (to save memory I don't store all 32 bits). The code don't find any
> however even after checking 6 million strings!
>
> Why?

It's possible to find crc32 collisions without too much effort.  Since
there are 2**32 possible CRC values, on average you'll need to search
only O(2**16) strings to find a collision.

I wrote a tiny Perl program that found the following collision in 91903
trails:

\$string1 = "aaaaa0.462031558722291"
\$string2 = "aaaaa0.0585754039730588"

These strings share a common CRC value of 1938556049.

If you're interested, here's the Perl program I used:

use strict;
use warnings;
use Digest::CRC qw(crc32);

my \$c;
my \$n;
my \$r;
my %h;
srand;
for (\$n = 0;; \$n++) {
\$r = "aaaaa" . rand();
\$c = crc32(\$r);
if (\$h) {
print "collision: \$h and \$r\n";
print "\$n trials\n";
last;
}
\$h = \$r;
}

Regards, Mark

## Re: Finding two strings with the same crc32

> Jonas Nilsson wrote:
>
> > If I try to find two string that give the same crc32 however I don't
> > succeed. I use the code below to spot strings that have the same first
28
> > bits (to save memory I don't store all 32 bits). The code don't find any
> > however even after checking 6 million strings!
> >
> > Why?
>
>
> It's possible to find crc32 collisions without too much effort.  Since
> there are 2**32 possible CRC values, on average you'll need to search
> only O(2**16) strings to find a collision.
>
> I wrote a tiny Perl program that found the following collision in 91903
> trails:
>
> \$string1 = "aaaaa0.462031558722291"
> \$string2 = "aaaaa0.0585754039730588"
>

By random searching instead of systematic I quickly found a lot of matches
(with crc32 below):
/jN

oxueekz
pyqptgs
1122772949

nktteme
qjpatal
1867444077

nuktjcj
bimxkml
2973957111

atimtna
nfxqcvx
4014160497

tgnvsia
kfjcbeh
550598113

kmswcnn
hcdguxq
4090013392

vswkuqk
yafwbir
2261340929

yezrovl
vwknxnu
2954574600

cujnfpg
phhwvrh
3289759799

xucsdhf
gtgfudo
4164393361

gqplbmq
hcapuuh
656642059

igeougy
ytpfssi
4118505601

driaytu
hnomxzs
2212230278

iketcbk
jerdutt
2479104683

rpppuko
mqtedgf
1986672624

kvbesjh
twfpbfa
2892019439

evwzaos
zwsopcz
3029719102

ugtegak
jfppvmb
3715719613

iknybbk
jeyittt
398238032

itqfybq
jzfvotn
3806847855

ywouxys
ukiyywu
3626703900

zqlzlzj
vmjvmtl
1949335477

aitjoui
qzaciay
3794947975

vxhmxnl
yjyqovu
1882453027

zpshpoa
ubbtgwx
311908063

exhzdhp
jjyfspi
3165437067

xsdmynd
knftilk
2149740729

tohzrgd
xsnvsib
3862892194

sgkldej
lfoyuic
809259157

izzndbw
fhkrszn
2534674167

kjkkgzg
xwirwxh
4108936481

yvwjyuh
jkusiwg
1821358308

fxtdmbs
vkamkvc
4104486534

ftqxzdh
ezfhlrw
2484994613

hvaledh
xetecpx
554199908

yeissqw
jxkjcsx
4294947840

sdpoqcw
cwefwwg
2627502466

nepvour
rjcshod
110262302

cvivwhc
pkkogjl
136192574

bygmhsd
qdetxqk
1195331540

obwpuxb
smdurbt
2466482257

iwuhnto
eksdozi
2396159347

cjegpcp
ovckqmv
861582261

dpozeev
hlivdkp
430397453

tianznw
gtcwjlx
3110265897

xcsjggh
hpfcasx
36665361

xzxuysm
kgzliqb
287371322

chihdwk
otodeym
1558575868

tnkxzms
koomkaz
3085259689

kceezbm
tbapknd
4198087903

wlwfvnq
kcdcqtg
304773531

oewnsln
cyqbrbh
492142473

tobzbve
knfoszl
2490869493

zxytuar
yvndcwm
2991426191

zbrzphz
1591549120

zgffxfz
yiqvnpe
288303136

twuzdyi
xksvewo
596722119

rfsgnet
bufnhqd
756968307

euuhciw
jgdttqn
3764572437

whprspf
kgcwtjp
3048824742

ymlbinm
flhwxbd
308304037

dupnrrz
tfegtfj
1644000365

axmicjj
reopshe
146039524

djqgnyv
tydnhmf
745237997

nwvmesa
rxehbiw
2678562110

gdsfrzk
hvbzebr
3101837067

zehmtce
edlxeol
3451105439

bzqdqlf
atftgzy
1671489439

silhbdt
3505195570

bzrxqyr
atehgom
329271041

rngqazj
mocdpvc
492217567

gdnftyq
xejseux
4065378867

flodegc
ymkqtkj
352398593

ejidvrz
jxxxajc
810447580

dexsuqr
tvmzseb
3499196161

vyzlbpd
fjoeddt
3537376586

yrhzlkq
unnvmew
4017118138

lxmoaip
syizpey
2579702319

zpcpays
eqgepuz
354399221

azizsuh
rgkccwg
3441578396

qzpwdlz
bgrntnu
1152814553

uwbrthp
zesncpi
3104178606

anhjdzw
rsjstxx
736480241

hhfncss
gzwrtkj
2341875164

jdjqdpe
vkytcjs
2096604753

mceivsy
qlvlqio
2468755540

mykpeip
rxoetey
1960566923

wqjcbhb
hpnvsdk
3406243371

wnflwjz
kauippl
2678321277

jxeiyfg
yegpidh
4185286856

phwijrf
cuupzpi
1692899291

uzhiqpm
fgjparb
803662884

## Re: Finding two strings with the same crc32

Jonas Nilsson wrote:

> By random searching instead of systematic I quickly found a lot of matches
> (with crc32 below):
> /jN
>
> oxueekz
> pyqptgs
> 1122772949

A good illustration of the "birthday paradox".

But, can you come up with an efficient Perl program that finds an input
string whose CRC value is 1756683253?

Hint: I'd recommend writing it in C!  ;)

## Re: Finding two strings with the same crc32

>
> But, can you come up with an efficient Perl program that finds an input
> string whose CRC value is 1756683253?
>
> Hint: I'd recommend writing it in C!  ;)

Shall I write a Perl-program in C?

I'm currently running this:

use strict;
use Digest::CRC qw(crc32 crc16);
my \$string='aaaaaaa';
\$|=1;

while (1) {
if (crc32(\$string)==1756683253) {
print \$string;
exit;
}
\$string++;
}
Which finishes after a maximum time of 44 hours. Just wait and see.
(Or should I start from 'zzzzzzz' working down?)
/jN

## Re: Finding two strings with the same crc32

On Tue, 9 Nov 2004 13:27:43 +0100, "Jonas Nilsson"

>> Hint: I'd recommend writing it in C!  ;)
>
>Shall I write a Perl-program in C?

No. But if "shall" (allegedly wittily) means "can", then the answer
is: yes!

Michele
--
->(map substr
((\$a||=join'',map--\$|x\$_,(unpack'w',unpack'u','G^<R<Y]*YB='
..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,\$_,
256),7,249);s/[^\w,]/ /g;\$ \=/^J/?\$/:"\r";print,redo}#JAPH,

## Re: Finding two strings with the same crc32

> >
> > But, can you come up with an efficient Perl program that finds an input
> > string whose CRC value is 1756683253?
> >
> > Hint: I'd recommend writing it in C!  ;)
>
> Shall I write a Perl-program in C?
>
> I'm currently running this:
>
<snip code>

I've run through 2.6 billion lowercase letter combination without a hit. Is
it so that using only a subset of bytes (like 'a'-'z' == 97 - 122) I will
only have access to a subset of the CRC32 values? I thougt not. If there
still is no hit after the weekend I will start to think otherwise...

/jN

<Revised code - able to restart from last position>

use strict;
use integer;
use Digest::CRC qw(crc32);
my \$string='a';
\$|=1;
if (open(FILE,"<crc1756683253.log")) {
\$string=<FILE>;
chomp(\$string);
close(FILE);
}

my \$t0=time();
while (1) {
for (1..500000) {
if (crc32(\$string)==1756683253) {
my \$timediff=(time()-\$t0);
print "After \$timediff seconds I found this: \$string\n";
if (open(FILE,">>crc1756683253")) {
print FILE "After \$timediff seconds I found this: \$string\n";
close(FILE);
}
}
\$string++;
}
open(FILE,">crc1756683253.log") or die;
print FILE "\$string\n";
my \$number;
my \$mult=1;
my \$ex=26;
no integer;
for (reverse split //,\$string) {
\$number+=(ord(\$_)-96)*\$mult;
\$mult*=\$ex;
}
\$number--;
print FILE "Now I checked \$number strings!\n";
close FILE;
use integer;
}

## Re: Finding two strings with the same crc32

I've got a hit...

C:\Documents and Settings\jonni>perl -MDigest::CRC
print Digest::CRC::crc32('mddxzib');
^D
1756683253

Excited? Naa...
/jN

> > >
> > > But, can you come up with an efficient Perl program that finds an
input
> > > string whose CRC value is 1756683253?
> > >
> > > Hint: I'd recommend writing it in C!  ;)
> >
> > Shall I write a Perl-program in C?
> >
> > I'm currently running this:
> >
> <snip code>
>
> I've run through 2.6 billion lowercase letter combination without a hit.
Is
> it so that using only a subset of bytes (like 'a'-'z' == 97 - 122) I will
> only have access to a subset of the CRC32 values? I thougt not. If there
> still is no hit after the weekend I will start to think otherwise...
>
> /jN
>
> <Revised code - able to restart from last position>
>
> use strict;
> use integer;
> use Digest::CRC qw(crc32);
> my \$string='a';
> \$|=1;
> if (open(FILE,"<crc1756683253.log")) {
>  \$string=<FILE>;
>  chomp(\$string);
>  close(FILE);
> }
>
> my \$t0=time();
> while (1) {
>  for (1..500000) {
>   if (crc32(\$string)==1756683253) {
>    my \$timediff=(time()-\$t0);
>    print "After \$timediff seconds I found this: \$string\n";
>    if (open(FILE,">>crc1756683253")) {
>     print FILE "After \$timediff seconds I found this: \$string\n";
>     close(FILE);
>    }
>   }
>   \$string++;
>  }
>  open(FILE,">crc1756683253.log") or die;
>  print FILE "\$string\n";
>  my \$number;
>  my \$mult=1;
>  my \$ex=26;
>  no integer;
>  for (reverse split //,\$string) {
>   \$number+=(ord(\$_)-96)*\$mult;
>   \$mult*=\$ex;
>  }
>  \$number--;
>  print FILE "Now I checked \$number strings!\n";
>  close FILE;
>  use integer;
> }
>
>

## Re: Finding two strings with the same crc32

> Just for fun I would like to find two strings that have the same crc32.
The
> probably of doing this for a set of for example one million strings is
more
> than 99.9999999999999999999999999999999999999999999999997%.

I found out that the differences can't be close (within the size of the
checksum that is ).

> If I try to find two string that give the same crc32 however I don't
> succeed. I use the code below to spot strings that have the same first 28
> bits (to save memory I don't store all 32 bits). The code don't find any
> however even after checking 6 million strings!
>

Modifying my program a bit gave me some other matching strings:

examples

Maria has nine red beds.
Steven has fifteen white tables.
Both have the crc32 "248210933"

Joe has fourteen magenta things.
Lars has thirteen black balls.
Both have the crc32 "93832682"

/jN

This is interesting. If I may follow up (though I realize it's getting OT; I

What about MD5? I am of the opinion that it is infeaseable to find two
strings that share the same MD5 digest (without taking a hundred million
(billion? trillion?) years or so to do it). Would anyone in the group
disagree?

On Thu, 11 Nov 2004 00:00:53 -0800,
> This is interesting. If I may follow up (though I realize it's getting OT; I
>
> What about MD5? I am of the opinion that it is infeaseable to find two
> strings that share the same MD5 digest (without taking a hundred million
> (billion? trillion?) years or so to do it). Would anyone in the group
> disagree?

Yes.

http://www.tcs.hut.fi/~mjos/md5 /

--
Sam Holden