# Why unhashing is not possible?

#### Do you have a question? Post it now! No Registration Necessary.  Now with pictures!

•  Subject
• Author
• Posted on
I've always led the belief that if something can be created/built,
then it can be "uncreated/unbuilt".  In technology this is sometimes
referred to as reverse-engineering. I understand too how hashing can
be used within programs and fast database lookups but I failed
miserably discussing the subject of hashes with someone who could not
understand when I told them that you cannot reverse the process... I
know you can crudely hack it (using John the Ripper for example) but
you cannot "uncompress it" so to speak.

So... my question here to you good people is... why can't a hash be...
well... unhashed?

Surely if the hash is unique, we know the process that got us the hash
we have only to reverse the process... we can't... so... but why not?

## Re: Why unhashing is not possible?

Randell_D wrote:

Because a hash is lossy, and some are even cryptographically secure?

A hash is not unique.

## Re: Why unhashing is not possible?

On Tue, 25 Dec 2007 08:04:44 -0800 (PST), Randell_D wrote:

It is not an encrypted content but the result of a mathematical computation.

This example is nothing like a hash, but think about taking 11, 6 digit
numbers, multiply each with each other, keep only the last 5 digits of
the result.

That is the example hash value. Take that number, and recreate the
original 11 numbers.

## Re: Why unhashing is not possible?

In article

How could the hash possibly be guaranteed to be unique?  Input texts are
usually many kilobytes or megabytes long, while the hash code is
typicaly in the 10's of bytes.  Consider a really simple hash function
for a decimal number: take the last last 20 digits, and then every other
digit of this.  If you start with a 100-digit number, 10^90 inputs map
to the same hash.

What the hash code designers try to do is make it very hard to find
another input that will hash to a given code.  Even harder is finding
another input that's MEANINGFUL.  So if someone is distributing a
computer program, and sends the hash code separately, it would be
virtually impossible for someone to modify the program in such a way
that it will still have the same hash and also run.  Or if the original
input were natural language text, even if someone could find a
replacement with the same hash, it will almost certainly be total
gibberish.

Of course, my toy example above doesn't have this attribute.  But hash
functions used in crytography do.

--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***

## Re: Why unhashing is not possible?

Barry Margolin wrote:

For a limited set of inputs, this is very easy.

This is only true for cryptographic hashes.

## Re: Why unhashing is not possible?

Yes. then it is not a hash. It may be an encryption, or a translation.

That was what he said, in the parts you erased.

## Re: Why unhashing is not possible?

http://en.wikipedia.org/wiki/Perfect_hash_function

http://burtleburtle.net/bob/hash/perfect.html
"Minimal perfect hashing"

I've never seen Perfect Hashing referred to as an encryption or
translation, only ever as a "hash function".

## Re: Why unhashing is not possible?

roberson@hushmail.com (Walter Roberson) wrote:

These are not the kind of hashing that the OP is talking about.  He's
non-reversible.  In this case, an important reason for the
non-reversibility is that they're many-to-one.

The word "hash" is used in a number of different contexts in computer
science, you have to be careful not to confuse them.

--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***

## Re: Why unhashing is not possible?

Barry Margolin wrote:

cryptographic hashes.

Which is true for only the set of inputs, not arbitrary subsets of inputs.

Nonsense, it's always the same: A hash is a function A^* -> B^m for fixed
alphabets A and B and a fixed integer m.

## Re: Why unhashing is not possible?

But the types of hashes and their properties are very dependent on WHY
you're hashing.  The algorithms you use to implement a symbol table
(which is where perfect hashing becomes useful) are completely different
from those used for cryptography (where it's important that it be
difficult to generate the same hash) or password encryption (where
irreversibility is critical).

--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***

## Re: Why unhashing is not possible?

Types of hashes and their properties exist apart from why
you are hashing. Why you are hashing governs your choice of which
type of hash (and thus which properties) you would rationally
select. Your choice of whether to use an axe or a shovel to dig
a hole in the ground does not change the properties of axes and
shovels, but sometimes the axe would be the better choice (e.g.,
ice axes in ice, or digging axes when dealing with a forest fire);
sometimes a shovel would be better.

The point is not relevant to Bill Unruh's statement that a "hash"
which is unique is "not a hash", which is what this subthread is about.
Your paragraph, quoted above, implicity agrees that "perfect hashing"
is a type of hashing; as perfect hashing hashes to a unique result
(over input it was designed for), Bill's statement appears to
be incorrect in context (which the original poster did not restrict
to cryptographic hashes.)

Sebastian's definition of what a "hash" is appears to me to be valid.
It's just that most things which are technically hashes are not
very useful for specific purposes (just like most things that
are "compression" functions aren't very useful for much.)

## Re: Why unhashing is not possible?

roberson@hushmail.com (Walter Roberson) writes:

And sometimes I can use a shovel as an axe and an axe as a shovel. But this
does not mean that we should take the meaning of both to be so broad that
there is no distinction between them ( eg, an axe is something made of
metal with a handle-- purpose comes into the definition of axe).

I will admit that my statement was technically wrong. However, in the
context that the OP seemed to be asking his question, it was I believe not.
The OP wanted to have some explanation for a friend as to why hashes were
hard to invert, since you knew the process which produced the hash. Now,
the typical  time that hashes are hard to invert ( find a preimage ) is if
they are cryptographic hashes. Clearly the hash which takes the first bit
of the input as the output is NOT hard to invert, and thus the OPs question
makes no sense for that hash. However, I will admit that it is a hash.
The OP was clearly confused over the question he was asking, and as such
one had to try to understand what he really meant. I may have misunderstood
the context, but I do not think so. I do believe that he WAS asking about
cryptographic hashes. That is the context in which I answered the question.
Also, a hash which is unique and invertible to me is an encryption, but I
agree that in some contexts it could be a hash and in some an enctyption. I
for example would not call the messages sent using PGP a hash, even though
they are a mapping from the space of input function to the space of output
functions of length m where m=10^80. On the other hand, I could imagine
using the encryption of a 10 character input as a 10 character hash in some
contexts.

Thus this message, under the definition of any function from input to output
being a hash, could be considered a hash, but anyone who used that
definition would in most context be considered an idiot.

No, it is overbroad. It certainly captures an aspect of a hash, but is so
broad that it does not distinguish a has from anything else. Ie, it is so
broad that the word loses all meaning. So it is true, but meaningless.

And use/purpose should be part of the meaning of hash to make the meaning
useful. Words should be specific, so that when they are used, the person
listening actually gets some information transfered to them.

## Re: Why unhashing is not possible?

"Perfect hashing" usually is only part of a hashing implementation.
First step usually is trunctating or simple hashing. Your source
<http://en.wikipedia.org/wiki/Perfect_hash_function describes it as "A
Perfect hash function of a set S is a hash function which maps different
/keys/ (elements) in S to different numbers" for that reason.

Or, from <http://en.wikipedia.org/wiki/Hash_function
| Typical hash functions have an infinite domain, such as byte strings
| of arbitrary length, and a finite range, such as bit sequences of some
| fixed length.

And, from the same source:
| Hash functions that are one-to-one are also called permutations.

Maybe, we should call a perfect hash function a permutation, and not a
typical hash function.

Yours,
VB.
--
The file name of an indirect node file is the string "iNode" immediately
followed by the link reference converted to decimal text, with no leading
zeroes. For example, an indirect node file with link reference 123 would
have the name "iNode123". - HFS Plus Volume Format, MacOS X

## Re: Why unhashing is not possible?

Perfect hash functions are seldom one-to-one. A one-to-one function
has the same domain and range (i.e., the numeric range of the
inputs is the same as the numeric range of the outputs), whereas
perfect hash functions typically take multiple octets of input
(strings) and produce an integer output representable in a small
number of octets (log 2 of the number of inputs, not log 2 of the
size needed to represent the inputs.)

## Re: Why unhashing is not possible?

Unruh wrote:

A hash is a function of A^* -> B^m for a fixed value of m. Nothing else.

A very interesting example would be: Take the first 12 bytes of the input
(pad with zero if necessary), concatenate it with the SHA1 checksum of the
input. This hash has a length of 256 bit, is cryptographically
collision-resistant, yet leaks information and for any message up to 12 byte
is trivially invertible.

You were the first to talk about cryptographic hashes. The OP just talked
about hashes, and hashes serve a wide variety of purposes other than
cryptography.

## Re: Why unhashing is not possible?

It can't, or it will not be a hash function any more.

Yours,
VB.
--
The file name of an indirect node file is the string "iNode" immediately
followed by the link reference converted to decimal text, with no leading
zeroes. For example, an indirect node file with link reference 123 would
have the name "iNode123". - HFS Plus Volume Format, MacOS X

## Re: Why unhashing is not possible?

No, a hash is NOT unique. Many many many many inputs give the same hash.
That is the essense of a hash.
And at each step in the hash, information is thrown away. Ie, in running
the hash in reverse one would have to keep guessing at the thrown away
material.

Secondly to find even one of those may be very difficult. Yes, If I try
2^128 inputs there is good possiblity I will find the one giving me the
hash I have, but 2^128 is a very large number and I cannot try that many.

For an encryption, which is one to one, there are 2^60 or 2^128 ( depending
on the key length) by which that output could have been generated from the
input ( the process depends on the key). Thus I do NOT know how the output
was generated and thus cannot reverse it.

## Re: Why unhashing is not possible?

http://en.wikipedia.org/wiki/Hash_function

A hash function [1] is a reproducible method of turning some kind
of data into a (relatively) small number that may serve as a
digital "fingerprint" of the data.

There is no requirement there that the output cannot be unique
if the input is limited.

For example, if you were to write a compiler or interpreter for
a language with a fixed set of keywords, then the strings that
were the keywords could be hashed to produce a table offset
used for dealing with that particular keyword -- instead of doing
a string search or B-Tree or 2-3 Tree traversal to determine
the entry.

## Re: Why unhashing is not possible?

Unruh wrote:

Unless you reach its maxixum output length (which is typically very short in
comparison to the input), any good cryptographic hash does its best in
preserving as much information as possible.

Are you implying a cryptographic hash here? The OP didn't. Sometimes we just
want hashes that are only good at not randomly leading to collision, and CRC
is a perfect hash in this sense - yet is cryptographically insecure.

Nonsense again. You just have to know the key and for a small set of inputs
you can perfectly reverse it.

## Re: Why unhashing is not possible?

No, it does its best at not preserving any information as possible if it is
a cryptographic hash. You want it to a reproducible mapping from stings to
finite length random numbers.

Yes, he did. He implied that he was talking about a situation in which is
was very hard or impossible to reverse the hash. Or did you not happen to