# The "Blum Mental Hash" Is A Lousy Idea

I’ve seen the “Blum Mental Hash” make the rounds several times over the past decade. Although it’s described as a “secure hash function”, it’s really more like a (weird variant of a) digital signature: given a secret key and a character string of length `n`

, it produces a digit string of length `n`

. The claim is that you can compute such output in your head (assuming you have a particularly well-trained head—merely remembering the very large key is quite a challenge). The primary application envisioned is that you can coin a distinct password for each web site by hashing/signing the name or url of that site.

Every time I’ve seen this I’ve been highly skeptical, but didn’t bother digging into it.

John D Cook recently credulously repeated the proposal, and for once I actually took the time to think it through.

In short, I’m astonished anyone ever seriously proposed this scheme and its application, let alone a Turing award winner. More amazing is that it is *still* making the rounds, despite plenty of people pointing out that it’s not “secure” in any reasonable sense. It took me only a few minutes to write some code to crack it based on a few *extremely* straightforward chosen cleartext attacks. Upon further analysis, you can exploit the information leaked to reconstruct an entire key from just a few dozen encoded characters!

## The key and algorithm

The key consists of two functions: `f(Char) -> Digit`

maps a character to a digit, and `g(Digit) -> Digit`

maps a digit to another digit. The original proposal suggests that `g`

be a permutation (i.e. a bijection), but that’s not terribly important. Based on this key, the algorithm for hashing/signing a character string `s`

into an equal-length digit string `t`

is simple:

- The first digit of
`t`

is`g((f(a) + f(b)) % 10)`

where`a`

is the first character of`s`

and`b`

is the last character of`s`

- Every subsequent character of
`t`

is given by`g((d + f(c)) % 10)`

where`d`

is the previous digit in`t`

and`c`

is the character in the corresponding position of`s`

.

A simple Python implementation:

```
def blum(f, g, s):
out = [0] * len(s)
out[0] = g[(f[s[0]] + f[s[-1]]) % 10]
for i in range(1, len(s)):
out[i] = g[(out[i-1] + f[s[i]]) % 10]
return out
```

## Deriving `g`

from partial `f`

If you happen to know `f(c)`

for some character `c`

appearing in an input string, and the corresponding digit in the output string is `d`

preceded by some other digit `e`

, then you can derive `g((e + f(c)) % 10) == d`

. The crucial point here is that you can derive a *distinct* part of `g`

for *every* distinct value of `e`

: if `c`

happens to be repeated many times in the cleartext, you may be able to derive *all* of `g`

from the single value `f(c)`

. There are only ten possible values of `f(c)`

, so there are only ten possible values for whatever part of `g`

you can derive.

## Deriving `f`

from (partial) `g`

Suppose that the output string contains the digit sequence `ab`

, that you have used the above technique to determine that `g(d) == b`

, and that the character corresponding to this `b`

in the input string is `c`

. Then you can derive that `f(c) == (d - a) % 10`

. You can thus derive a *distinct* part of `f`

for every distinct `c`

whose image `b`

you have derived in (the inverse of) `g`

. As above, there are only ten possibilities for `d`

, so there are only ten possible values for whatever part of `f`

you can derive.

## Deriving `f`

when first and last characters match

If you have an input string whose first and last characters are the same character `c`

, the first character of the output for that string is some digit `d`

, and you have used the above techniques to determine the digit `e`

that `g`

maps to `d`

, then you know that `(2 * f(c)) % 10 == e`

. There are only two possible values for `f(c)`

that meet this condition. In practice, the prior attacks are more than enough to derive the entire key in most cases; it’s merely worth noting that there is information leakage all over the place in this “secure hash”.

## A naive cracker

It’s clear from the above that significant information about the key is leaked from *any* input/output pair. If you can influence the input at all, you can easily manipulate it to produce *lots* of information. And if you can just craft the input from scratch, extracting the entire key becomes trivial. Below is a naive cracker that uses the first two of the above techniques to derive a key based on completely random cleartext.

```
def extendg(f, g, cleartext, hashed):
g = { k: v for k, v in g.items() }
for c, fc in f.items():
for i in range(1, len(cleartext)):
if cleartext[i] == c:
e, d = hashed[i - 1], hashed[i]
g[(e + fc) % 10] = d
return g
def extendf(f, g, cleartext, hashed):
f = { k: v for k, v in f.items() }
for d, b in g.items():
for i in range(1, len(hashed)):
if hashed[i] == b:
a, c = hashed[i - 1], cleartext[i]
f[c] = (d - a) % 10
return f
def derive(ps, cleartext, hashed):
out = []
for f, g in ps:
while True:
g = extendg(f, g, cleartext, hashed)
nf = extendf(f, g, cleartext, hashed)
if f == nf: break
f = nf
if len(f) < 26 or len(g) < 10 or blum(f, g, cleartext) == hashed:
out.append((f, g))
return out
def crack(kf, kg, chunk_len=50):
letters = 'abcdefghijklmnopqrstuvwxyz'
ct = ''.join(random.choice(letters) for _ in range(chunk_len))
c = collections.Counter(ct).most_common(1)[0][0]
possibilities = [ ({ c: i }, {}) for i in range(10) ]
while (len(possibilities) != 1 or
len(possibilities[0][0]) != 26 or
len(possibilities[0][1]) != 10):
possibilities = derive(possibilities, ct, blum(kf, kg, ct))
ct = ''.join(random.choice(letters) for _ in range(chunk_len))
return possibilities[0]
```

On average, the full key can be extracted from fewer than 150 random characters encoded! That is **extraordinarily** poor “cryptography”! (A full implementation used to compute this statistic is available here.)

## What should non-cryptographers know about cryptography?

I have no expertise in cryptography, but I have some experience working with “formal systems” in general. That experience begins with one very important lesson: **Don’t ask “Why shouldn’t system X have property Y?”; ask “Why SHOULD system X have property Y?”**

The former question is a favorite tactic of obstructionist time-wasters. “I’m going to believe any arbitrary crap I like until you produce a rigorous proof that I’m wrong!” Rigorous proofs take orders of magnitude more work than just making shit up.

So I’ve got to wonder: why on earth would anyone think the “Blum Mental Hash” was “secure”? I’ve found absolutely no explanation of any theory. In cryptography in particular, *even if* you have a solid theory that explains how you’re propagating lots of entropy and no information is being leaked etc etc, and *even if* your formalization and proof of that theory is rock solid, there is *still* a high chance that there are factors beyond your formalization or details of the implementation that provide attack vectors. Anyone who describes *any* algorithm as “secure” without deep vetting by a *very* large community of experts immediately disqualifies themself not only from “expertise” in cryptography, but also from non-specialist competence in employing it.

When you roll your own cryptography, you look like an idiot.