Hi,

I want to evaluate the strength of Japanese passwords.

For Western character sets (English, German, French etc.), NIST
published a good document on how to assess the entropy of typical
passwords entered by more or less security-aware users: 'NIST Special
Publication 800-63, Electronic Authentication Guideline'. Depending on
whether it is a "weak" password (lower-case only, high probability to
be found in dictionary) or a "strong" password (mix of upper/lower
case, numeric and special characters, low probability to be found in
dictionary) and on its length, it lets me look up in a table the
effective strength (entropy) of such a password.

I'm wondering now how I could evaluate the entropy of passwords made
up of Japanese symbols, and whether for instance special characters
play the same role in Japanese as in Western languages.

I'm grateful for any references to resources in this direction,
preferably code samples or results that make it easy for me to code
it.

Thanks,

Michael

## Re: entropy of Japanese passwords

First of all, that complicated character sets make passwords stronger is
a myth. They just make them harder to remember and to type. If your
password is sufficiently random and has a reasonable length, then even
using only lower case letters is perfectly reasonable.

If you have an alphabet of c different characters and choose n of them
randomly, then there are c^n possible passwords. To get the strength in
bits b, solve for b:

2^b = c^n
b = n * log c / log 2

Secure passwords should have a strength of about 80 bits or more, so you
can use the same equation to answer questions like: If I have an
alphabet of c characters and I want 80 bits of strength, what minimum
length n does my password need to have?

c^n >= 2^80
n >= 80 * log 2 / log c

Example: I have 26 characters, so for 80 bits I need at least 18
characters. In many situations, like online authentication, 60 bits may
be enough. In that case you need only 13 characters.

Example: I have 500 characters, so for 80 bits I need 9 characters. For
60 bits I need only 7 characters.

Now just apply this to the size of the Japanese character set.
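
The two equations above can be turned into a small helper. This is only a sketch of the arithmetic, valid under the same assumption as the post (every character chosen independently and uniformly at random); the function names are made up for illustration.

```python
import math

def strength_bits(alphabet_size: int, length: int) -> float:
    """Strength b of a uniformly random password: b = n * log c / log 2."""
    return length * math.log2(alphabet_size)

def min_length(alphabet_size: int, target_bits: float) -> int:
    """Smallest n such that c^n >= 2^target_bits."""
    return math.ceil(target_bits / math.log2(alphabet_size))

# Reproduces the two examples from the post (26 and 500 characters).
for c in (26, 500):
    print(c, "chars:", min_length(c, 80), "for 80 bits,",
          min_length(c, 60), "for 60 bits")
```

For Japanese, one could plug in for instance 46 (basic hiragana) or 2136 (the jōyō kanji list) as `alphabet_size`. Those sizes are illustrative choices, not something the original question specifies, and the result still only holds for randomly generated passwords.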

Greets,
Ertugrul.

## Re: entropy of Japanese passwords

The strength of a password is defined as the logarithm of the expected
number of tries an attacker needs to make before guessing it.  This is
obviously a somewhat fuzzy notion, given that the search space is
effectively infinite and that we haven't specified what search
strategy the attacker is employing.

For randomly generated passwords, one can obtain a lower bound on
their strength by assuming that the attacker knows the generation
algorithm and is trying all the passwords it can generate in order
from most to least likely (or in random order, if the probabilities
are equal).  But for user-selected passwords, one can generally only
obtain upper bounds, and even those only by taking a given search
strategy as a baseline and assuming that any competent attacker would
do at least as well.

One can do a little better by considering a whole class (or several
classes) of search strategies, and taking the minimum strength across
all of them: this can at least yield a _conditional_ lower bound,
valid against attackers using that particular class of search
strategies, and if the class(es) of strategies considered are
sufficiently diverse, one could even hope that they might serve as
reasonable estimates of a password's strength against real-life
attackers.

Anyway, so much for theory, now let's get into the practice.  Here,
it's worth looking at the methods employed by existing, published
password-guessing programs.

Obviously, any password found on a list of common passwords has no
more strength against an attacker using the list than the logarithm of
its position on the list (i.e. nearly zero for any moderately sized
list).  Similarly, a password found in a dictionary is no stronger
against an attacker using the dictionary than the logarithm of the
number of words in that dictionary.  I assume you can probably find
some Japanese dictionaries, and possibly also Japanese password lists.

Also, since users are known to frequently make small modifications to
weak passwords in an attempt to strengthen them, attackers are also
likely to try such modifications (and there are programs to do so
automatically).  If you can find a list of common modifications, you
can estimate the strength of such a modified password by adding the
logarithm of the position of the modification on the list to the
strength of the unmodified password.  (A normalization scheme, such as
dropping all non-letters and lowercasing all letters, can be useful
for identifying the base word, but of course needs to be picked with
some care to match the modifications being considered.)
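
A minimal sketch of the list-plus-modification estimate described above. The word list and the modification list are tiny hypothetical placeholders (a real estimate needs real, frequency-ordered lists), and instead of normalizing the password the sketch simply applies each modification to each word, which is what an attacking program would do.

```python
import math

# Hypothetical data, ordered from most to least common.
COMMON_WORDS = ["password", "sakura", "nihon", "tokyo"]
MODIFICATIONS = [
    ("unchanged", lambda w: w),
    ("append 1", lambda w: w + "1"),
    ("capitalize", lambda w: w.capitalize()),
    ("append 123", lambda w: w + "123"),
]

def list_strength_bits(password: str):
    """log2(word rank) + log2(modification rank), or None if not covered."""
    best = None
    for word_rank, word in enumerate(COMMON_WORDS, start=1):
        for mod_rank, (_name, modify) in enumerate(MODIFICATIONS, start=1):
            if modify(word) == password:
                bits = math.log2(word_rank) + math.log2(mod_rank)
                if best is None or bits < best:
                    best = bits
    return best

print(list_strength_bits("sakura1"))   # second word, second modification
print(list_strength_bits("Tokyo"))
print(list_strength_bits("x9!kq"))     # not covered by this list
```

A return value of `None` only means that this particular list and these modifications do not cover the password; it says nothing about its strength against other strategies.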

Finally, a third method commonly employed by password-guessing
programs is to generate candidate passwords according to some
statistical model (such as a Markovian n-gram model), based either on
actual observed passwords or simply on general text in the target
language, and trying them out either in order of descending likelihood
or simply at random.  While not as efficient as simple password lists,
such models have the advantage of being able to generate an unlimited
number of candidate passwords in an order which, if the model is well
built and based on a large and realistic enough data set, may approach
the optimal search order to within a low-order factor.

The nice thing about such models is that it's also generally easy to
calculate the strength of any given password against them.  For
example, a simple n-th order Markovian word generator simply takes as
its input the frequencies of consecutive n-character sequences in a
corpus of text, calculates from them the conditional probabilities of
each character given the n-1 preceding ones, and uses these to
generate random words having the same n-character frequency
distribution.  Conversely, to estimate the strength of a given
password against such a model, one simply sums the negative
logarithms of the conditional probabilities of each character in the
password (plus the probability of stopping at the end) according to
the same model.
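
As a rough illustration of the Markov estimate just described, here is a sketch that trains character n-gram counts on a word list and sums the negative base-2 logarithms for a given password. The start/end marker characters, the add-one smoothing and the `alphabet_size` parameter are assumptions added here so that unseen characters do not get probability zero; they are not part of the description above.

```python
import math
from collections import Counter, defaultdict

START, END = "\x02", "\x03"  # hypothetical marker characters

def train(words, order=2):
    """For each context of order-1 characters, count the characters that follow."""
    counts = defaultdict(Counter)
    for word in words:
        padded = START * (order - 1) + word + END
        for i in range(order - 1, len(padded)):
            counts[padded[i - order + 1:i]][padded[i]] += 1
    return counts

def markov_strength_bits(password, counts, order=2, alphabet_size=100):
    """Sum of -log2 P(char | previous order-1 chars), including the end marker."""
    padded = START * (order - 1) + password + END
    bits = 0.0
    for i in range(order - 1, len(padded)):
        seen = counts.get(padded[i - order + 1:i], Counter())
        # Add-one smoothing over an assumed alphabet size.
        p = (seen[padded[i]] + 1) / (sum(seen.values()) + alphabet_size)
        bits -= math.log2(p)
    return bits

model = train(["aba", "abab", "baba"], order=2)
print(markov_strength_bits("ab", model))
print(markov_strength_bits("zz", model))
```

With a corpus this small the numbers mean nothing; the point is only that a password resembling the training data scores fewer bits than one that does not.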

So, I would say what you need is a list of Japanese words (or, better
yet, actual passwords chosen by Japanese users) by frequency.  Then
you can calculate the strength of each word in the list from its
position in it, while for the rest you can use the list (weighted by
frequency) to construct, say, a 2- or 3-character Markov model and use
it to estimate the strength of passwords that are not actually on the
list.  If you also know some common modification tricks used for
Japanese passwords, you can also test for those.  Apply a sensible
safety margin, and you should have about as good an estimate of the
strength of a password as can be reasonably obtained.

--
Ilmari Karonen

## Re: entropy of Japanese passwords

I do not have code samples for you. So first of all, yes, because the
Japanese character set is larger than the 26-letter Western alphabet,
a Japanese password as long as an English password is more secure.
However, do not be fooled by the idea that you can just calculate 2^b
= s^n where b is the entropy, s is the size of the character set and n
is the number of characters. That equation only works if your password
is a random selection of characters from the character set (unlikely).
In practice, a 'u' almost always follows a 'q', and consonants and
vowels are more likely than not to alternate. Your password is likely
to group letters and numbers separately, and a bunch of other such
things.

So, if you are going to use something like a passphrase/word
which uses real words, what I recommend you do is the following: Take a
very long text (something along the lines of the entire works of
Shakespeare) and apply the following algorithm:

I0 = How many characters can start a string? (In English just about
any, but I assume Japanese is different.)
I1 = Given the first character of a string, what is the expected value
of the number of possibilities for the second character?
I2 = Given the first and second character, what is the expected value
of the number of possibilities for the third character?
And so on until you reach enough characters that you are satisfied.

Then, to compute the entropy of an n-character password, solve I0 * I1
* ... * I(n-1) = 2^b for b.

Then, please make the results available for the rest of the world
since it should hold for just about anyone...
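
One possible reading of this procedure, sketched in code. It assumes the "long text" has already been split into words, and it interprets I_k as the number of distinct characters seen after each k-character word prefix, averaged over prefixes weighted by how often they occur. That weighting is an assumption; the post does not pin it down.

```python
import math
from collections import Counter, defaultdict

def branching_factors(words, max_len):
    """Return [I0, I1, ...] estimated from the given words."""
    factors = []
    for k in range(max_len):
        followers = defaultdict(set)   # prefix -> distinct next characters
        prefix_freq = Counter()        # prefix -> number of occurrences
        for w in words:
            if len(w) > k:
                followers[w[:k]].add(w[k])
                prefix_freq[w[:k]] += 1
        total = sum(prefix_freq.values())
        if total == 0:                 # no word is long enough
            break
        factors.append(
            sum(n * len(followers[p]) for p, n in prefix_freq.items()) / total
        )
    return factors

def entropy_bits(factors, n):
    """Solve I0 * I1 * ... * I(n-1) = 2^b for b."""
    return sum(math.log2(f) for f in factors[:n])

factors = branching_factors(["ab", "ac", "ba"], max_len=2)
print(factors, entropy_bits(factors, 2))
```

For longer prefixes the corpus gets sparse quickly, so the later factors will tend to be underestimated unless the text is very large.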

Also, do not listen to anyone who gives you a magic number such as 80
bits as being secure. There is no such magic number. What is secure
depends on the level of technology and resources that your opponent
will throw at you. 10 bits will defeat your 9-year-old son, but 128
bits is likely to fall to the supercomputers at the NSA, especially if
you need the information secure for 100 years.

## Re: entropy of Japanese passwords

You may just as well reply to my post directly and tell me that I'm
talking nonsense.  Yes, your post implies that in a subtle and very
impolite manner.

It's true that passwords need to be random for the equation to hold, but
well, if they're not random, then there is something wrong anyway.  The
OP talked about "more or less security-aware people", i.e. those who
don't necessarily follow a 'q' by a 'u'.  Further, I stated that
randomness assumption before drawing conclusions.

Finally, 80 bits will beat almost any realistic attack.  Given one
million machines trying one billion passwords a second each, you still
need on average 19.15 years to break an 80-bit password.  This is well
beyond the abilities of any modern agency/institute/company.  If you
still want more, pick 90 bits (19614 years) or even 100 bits (20 million
years).  Until we arrive at quantum computing, there is nothing wrong
with that.
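
The figures above can be checked with a few lines. The sketch assumes, as the post does, one million machines at one billion guesses per second each, and that on average half the search space has to be tried; the 365.25-day year is an assumption chosen here.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length

def average_years_to_break(bits, machines=10**6, guesses_per_second=10**9):
    """Expected time to find a uniformly random password of the given strength."""
    expected_guesses = 2 ** bits / 2   # on average half the space is searched
    return expected_guesses / (machines * guesses_per_second) / SECONDS_PER_YEAR

for bits in (80, 90, 100):
    print(bits, "bits:", average_years_to_break(bits), "years")
```

Each extra 10 bits multiplies the time by 1024, which is where the jump from about 19 years to about 19614 years to about 20 million years comes from.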

Greets,
Ertugrul.

--
nightmare = unsafePerformIO (getWrongWife >>= sex)
http://blog.ertes.de/

## Re: entropy of Japanese passwords

I can't speak for Armence, but _I_ don't think you were talking
nonsense; however, I do think you answered a somewhat different
question than what the OP was asking.

You answered the question "How long does a
(randomly generated) password need to be to be secure?"  At least my
interpretation of the original question, however, was closer to "How
can I tell if a password picked by someone else is secure or not?"

(The answer, of course, is that you can't tell for certain.  However,
by imagining that you're an attacker trying to guess the password,
using the same ad hoc tricks a real attacker would be likely to use,
and estimating how long it would take you, you can at least give a
rough estimate.)

If your users are security-aware enough to generate and remember
passwords whose characters are independently and uniformly
distributed, then you can probably trust them to make them long enough
too.  Sure, it can do no harm to check that the password isn't so
short as to be insecure even if it were so generated, but I would say
that in reality a password is overwhelmingly more likely to fail due
to low entropy per character than due to not enough characters.

(Humans are notoriously bad at generating unbiased and uncorrelated
randomness, anyway; you really need computer or mechanical assistance
for that.  Also, I know I personally can't easily memorize random
letter sequences long enough to be secure as passwords; I can and do
use random *word* sequences as passphrases, but those still have only
about two bits of entropy per letter.)

This all depends on how long it takes to test one password, anyway.
On a modern desktop PC, using proper key strengthening can easily slow
down password guessing attacks by a factor of up to 2^20.

--
Ilmari Karonen

## Re: entropy of Japanese passwords

Why don't servers automatically impose delays between password
attempts? It takes at least 2-3 s for a user to receive the notification
that his password is wrong, read it, click on the right input field and
retype the password. This doesn't work for cracking a protected file but
it does for every client/server communication.

My AS400 server used to block an account after 3 bad tries, and
only the admin could reactivate the account. A Range Rover we had would
have you wait 5 s for a 2nd attempt, 1 min for the 3rd, 5 min for the
4th and 30 min for every further try (I don't know if it could get any
longer).
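
The escalating schedule from the car example could be sketched like this. The numbers are taken from the description above; the function names are hypothetical, and a real server would also need to store the failure count per account.

```python
# Seconds to wait before attempts 1, 2, 3 and 4.
DELAYS_SECONDS = [0, 5, 60, 5 * 60]
# Every attempt after the 4th waits 30 minutes.
MAX_DELAY_SECONDS = 30 * 60

def delay_before_attempt(attempt: int) -> int:
    """Delay imposed before the given attempt number (1-based)."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    if attempt <= len(DELAYS_SECONDS):
        return DELAYS_SECONDS[attempt - 1]
    return MAX_DELAY_SECONDS

def total_wait(attempts: int) -> int:
    """Total enforced waiting time for the first `attempts` tries."""
    return sum(delay_before_attempt(a) for a in range(1, attempts + 1))

print(total_wait(5), "seconds of waiting for 5 attempts")
```

Under this schedule an online attacker gets at most 48 further guesses per day against one account, so even a password with little entropy takes a long time to guess this way.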

Delay between attempts is the best way I know of seriously

--
Benoît

With smokers, it's hard to stop. With slackers,
on the other hand, it's hard to keep going.

## Re: entropy of Japanese passwords

My aim was to address the
question of the OP, not discredit you. Actually, if you look at my
post, a comparatively small portion of it has to do with you. I did
mention some of what you said because I felt that those were
inaccurate or at the very least misleading though far from
nonsensical. Of course the character set does not inherently make the
password more or less secure. But that is missing the important fact
that, ceteris paribus, a larger character set carries more entropy per
character, with the implications that has for security. The rest of your post
makes it obvious that you know this fact, but the beginning of your
post would have confused someone who has little/no knowledge of
information theory.

Your assumption of randomness was indeed mentioned. Without contesting
that, I would like to submit the proposition that most people (even
those that are security conscious) will not have truly random
passwords, and will most likely choose a word or phrase simply because
it is easier to remember. In other words, you are providing the most
optimistic evaluation of the entropy of the password possible. That is
perhaps not particularly useful to a real world application.

Yes. Under the attack model you are proposing, you may be right.
However, the attack model you are presenting is likely to be wrong. If
the passwords are stored in a hash file and there are many users, then
a birthday attack may succeed. If the password is used to encrypt
files on a file system, it is likely to be subject to known plaintext
or even chosen plaintext attacks. If the password is used to generate
an RSA key, as an example, 80 bits can be factored within months. The
bottom line is that there is no magic number that makes you secure.
You have to look at the specifics and derive an attack model.

## Re: entropy of Japanese passwords

I mostly agree with what you've posted in this thread, but I'll have
to take issue with this paragraph.  The kinds of attacks you seem to
describe are only possible if the design of the system is deeply
flawed; and in that case, it is the flawed design that enables the
attack, not the length (or entropy) of the password.

In particular, hashed passwords should always be salted, and the salt
should be long enough to make two users getting the same salt
unlikely.  (Also, the user ID can be combined with the salt, thereby
guaranteeing uniqueness of the salt within a given system.)

Also, a password should never, ever be used as a raw cryptographic
key: it should always be hashed and expanded to the desired length.

As it happens, there's an existing standard that applies well to both
of these situations.  It's called PBKDF2.
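
A minimal sketch of salted PBKDF2 using Python's standard library. The hash function (SHA-256), the 16-byte salt, the iteration count and the 32-byte output length are illustrative assumptions, not recommendations from this thread; the password is encoded as UTF-8 so that Japanese characters work as well.

```python
import hashlib
import os

def new_salt(n_bytes: int = 16) -> bytes:
    """Fresh random salt, to be stored next to the derived value."""
    return os.urandom(n_bytes)

def derive_key(password: str, salt: bytes,
               iterations: int = 2**20, length: int = 32) -> bytes:
    """Hash and stretch a password into `length` bytes with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=length
    )
```

Usage would look like `salt = new_salt()` followed by `key = derive_key("ひみつ", salt)`. The same password with a different salt gives an unrelated key, and the iteration count makes every single guess correspondingly more expensive for an attacker.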

--
Ilmari Karonen

## Re: entropy of Japanese passwords

Armence wrote:

Well yes, it's optimistic in that it states the randomness requirement,
which is probably not fulfilled in the real world.  That makes it exact
under optimistic assumptions, not necessarily 'optimistic'.

If the randomness requirement is not fulfilled to the extent of making
the calculations as simple as solving small equations, then the security
of your system gets unpredictable.  This is something to avoid.  That
means if you have to rely on secure passwords and your users aren't
generating proper ones, do it yourself.

BTW, I apologize for my rough tone.

Not if it's done properly.  That's why passwords are hashed with a salt.

I'm assuming that such attacks are possible.  My calculations are done
strictly under that assumption (disregarding weaknesses in the cipher).

Now that is really far fetched.  If a password is used as entropy for an
RSA key, then there is something wrong with the implementation.  An RSA
key must be random.

All in all, I believe that the point, where 80 bits will become less
than 'generally secure for almost all people' is where quantum computing
will be feasible.

Yes, I fully agree with that one.

Greets,
Ertugrul.

## Re: entropy of Japanese passwords

You are right, you are exact under optimistic assumptions and not just
optimistic.
I think that while you are right that if your users can't be trusted
to generate good passwords you should generate them, that is not always
possible. Going to the boss and telling him he has to memorize a
random character sequence in order to use his computer is unlikely to
get you approval for the scheme. Also, random sequences might end up
being weaker as people unable to memorize them write them on sticky
notes and post them at their desk.

No worries, I probably could have been a bit more courteous.

Maybe I'm a pessimist, but I believe that deeply flawed
implementations are the rule and not the exception.
Also, I think all ciphers are most likely inherently flawed in some
as-yet-undiscovered way and that if you need your data safe for a long
time, you should assume that significant cryptanalytic breakthroughs
will reduce the safety of your data considerably. That's why I would
say 80 bits could be far from sufficient depending on your
application.

I guess that was most of my point when it comes to the whole 80 bit
thing.

Have a nice day...