The “Blum Mental Hash” Is A Lousy Idea

I’ve seen the “Blum Mental Hash” make the rounds several times over the past decade. Although it’s described as a “secure hash function”, it’s really more like a (weird variant of a) digital signature: given a secret key and a character string of length n, it produces a digit string of length n. The claim is that you can compute such output in your head (assuming you have a particularly well-trained head—merely remembering the very large key is quite a challenge). The primary application envisioned is that you can coin a distinct password for each web site by hashing/signing the name or url of that site.

Every time I’ve seen this I’ve been highly skeptical, but didn’t bother digging into it.

John D Cook recently credulously repeated the proposal, and for once I actually took the time to think it through.

In short, I’m astonished anyone ever seriously proposed this scheme and its application, let alone a Turing award winner. More amazing is that it is still making the rounds, despite plenty of people pointing out that it’s not “secure” in any reasonable sense. It took me only a few minutes to write some code to crack it based on a few extremely straightforward chosen cleartext attacks. Upon further analysis, you can exploit the information leaked to reconstruct an entire key from just a few dozen encoded characters!

The key and algorithm

The key consists of two functions: f(Char) -> Digit maps a character to a digit, and g(Digit) -> Digit maps a digit to another digit. The original proposal suggests that g be a permutation (i.e. a bijection), but that’s not terribly important. Based on this key, the algorithm for hashing/signing a character string s into an equal-length digit string t is simple:

The first digit of t is g((f(a) + f(b)) % 10) where a is the first character of s and b is the last character of s
Every subsequent character of t is given by g((d + f(c)) % 10) where d is the previous digit in t and c is the character in the corresponding position of s.

A simple Python implementation:

def blum(f, g, s):
  out = [0] * len(s)
  out[0] = g[(f[s[0]] + f[s[-1]]) % 10]
  for i in range(1, len(s)):
    out[i] = g[(out[i-1] + f[s[i]]) % 10]
  return out

Deriving `g` from partial `f`

If you happen to know f(c) for some character c appearing in an input string, and the corresponding digit in the output string is d preceded by some other digit e, then you can derive g((e + f(c)) % 10) == d. The crucial point here is that you can derive a distinct part of g for every distinct value of e: if c happens to be repeated many times in the cleartext, you may be able to derive all of g from the single value f(c). There are only ten possible values of f(c), so there are only ten possible values for whatever part of g you can derive.

Deriving `f` from (partial) `g`

Suppose that the output string contains the digit sequence ab, that you have used the above technique to determine that g(d) == b, and that the character corresponding to this b in the input string is c. Then you can derive that f(c) == (d - a) % 10. You can thus derive a distinct part of f for every distinct c whose image b you have derived in (the inverse of) g. As above, there are only ten possibilities for d, so there are only ten possible values for whatever part of f you can derive.

Deriving `f` when first and last characters match

If you have an input string whose first and last characters are the same character c, the first character of the output for that string is some digit d, and you have used the above techniques to determine the digit e that g maps to d, then you know that (2 * f(c)) % 10 == e. There are only two possible values for f(c) that meet this condition. In practice, the prior attacks are more than enough to derive the entire key in most cases; it’s merely worth noting that there is information leakage all over the place in this “secure hash”.

A naive cracker

It’s clear from the above that significant information about the key is leaked from any input/output pair. If you can influence the input at all, you can easily manipulate it to produce lots of information. And if you can just craft the input from scratch, extracting the entire key becomes trivial. Below is a naive cracker that uses the first two of the above techniques to derive a key based on completely random cleartext.

def extendg(f, g, cleartext, hashed):
  g = { k: v for k, v in g.items() }
  for c, fc in f.items():
    for i in range(1, len(cleartext)):
      if cleartext[i] == c:
        e, d = hashed[i - 1], hashed[i]
        g[(e + fc) % 10] = d
  return g

def extendf(f, g, cleartext, hashed):
  f = { k: v for k, v in f.items() }
  for d, b in g.items():
    for i in range(1, len(hashed)):
      if hashed[i] == b:
        a, c = hashed[i - 1], cleartext[i]
        f[c] = (d - a) % 10
  return f

def derive(ps, cleartext, hashed):
  out = []
  for f, g in ps:
    while True:
      g = extendg(f, g, cleartext, hashed)
      nf = extendf(f, g, cleartext, hashed)
      if f == nf: break
      f = nf
    if len(f) < 26 or len(g) < 10 or blum(f, g, cleartext) == hashed:
      out.append((f, g))
  return out

def crack(kf, kg, chunk_len=50):
  letters = 'abcdefghijklmnopqrstuvwxyz'
  ct = ''.join(random.choice(letters) for _ in range(chunk_len))
  c = collections.Counter(ct).most_common(1)[0][0]
  possibilities = [ ({ c: i }, {}) for i in range(10) ]
  while (len(possibilities) != 1 or
         len(possibilities[0][0]) != 26 or 
         len(possibilities[0][1]) != 10):
    possibilities = derive(possibilities, ct, blum(kf, kg, ct))
    ct = ''.join(random.choice(letters) for _ in range(chunk_len))
  return possibilities[0]

On average, the full key can be extracted from fewer than 150 random characters encoded! That is extraordinarily poor “cryptography”! (A full implementation used to compute this statistic is available here.)

What should non-cryptographers know about cryptography?

I have no expertise in cryptography, but I have some experience working with “formal systems” in general. That experience begins with one very important lesson: Don’t ask “Why shouldn’t system X have property Y?”; ask “Why SHOULD system X have property Y?”

The former question is a favorite tactic of obstructionist time-wasters. “I’m going to believe any arbitrary crap I like until you produce a rigorous proof that I’m wrong!” Rigorous proofs take orders of magnitude more work than just making shit up.

So I’ve got to wonder: why on earth would anyone think the “Blum Mental Hash” was “secure”? I’ve found absolutely no explanation of any theory. In cryptography in particular, even if you have a solid theory that explains how you’re propagating lots of entropy and no information is being leaked etc etc, and even if your formalization and proof of that theory is rock solid, there is still a high chance that there are factors beyond your formalization or details of the implementation that provide attack vectors. Anyone who describes any algorithm as “secure” without deep vetting by a very large community of experts immediately disqualifies themself not only from “expertise” in cryptography, but also from non-specialist competence in employing it.

When you roll your own cryptography, you look like an idiot.

The key and algorithm

Deriving g from partial f

Deriving f from (partial) g

Deriving f when first and last characters match

A naive cracker

What should non-cryptographers know about cryptography?

Deriving `g` from partial `f`

Deriving `f` from (partial) `g`

Deriving `f` when first and last characters match