Thursday, January 31, 2013

The Reflexive Density Theorem

This proof solidifies some of the earlier observations of the reflexive density constant, a necessary-but-not-sufficient requirement in the evaluation of random number generator security.

Given T, an unbiased true random number generator, which can produce 1 of N unique integers; and S, a set consisting of N such sample outputs. Then the expected number E of unique integers in S, in the limit that N approaches infinity, is given by:

E = (1-1/e)N
E = RN

Here is the proof:

Construct a bitmap B, consisting of N 0s. Obtain S by performing N invocations of T. For each integer in S, set a bit in B corresponding to the sorted order of all possible outputs of T; that is, bit 0 would correspond to the least such integer, and bit (N-1) would correspond to the greatest. (In practice, the possible outputs would be 0 through N-1, where N is a power of 2. But that's not required here.)

Because T is unbiased, the probability of any given bit ending up as 1 is uniform throughout B. So consider one of these bits in isolation.

After the first invokation of T, the odds that this bit is 1 is just 1/N. Put another way, the odds that it's 0 is (1-1/N).

After the second invokation, the odds that it's still 0 are: the odds that it was still 0 after the first iteration, times another factor of (1-1/N). Thus the odds that it's still 0 are (1-1/N)^2.

We can continue this logic over all N invocations of T. The odds that this bit ends up as 0 is then ((1-1/N)^N). Thus, in the limit of infinite N, the odds R that this bit is 1 is just 1 minus this quantity:

R = lim(N->infinity,1-((1-1/N)^N))
R = 1-1/e
R ~ 0.6321205588285576784044762298385391325541888689682321

This last step was computed here from everything that WolframAlpha knows about math. Due to the symmetry of the problem, the "oneness" of any such bit applies to whole of B, so RN is indeed the expected number of 1s.

It turns out that the reflexive density constant shows up in lots of places. For example:

1. Neural network stability at equation 6.27.

2. The exponential cummulative distribution function.

3. The Poisson distribution, if we were to add up the 1s at each bit of B, instead of ORing them.

4. The Random Arrival Time Best-Choice Problem on p. 499.

5. Some other discussion about it which shows that I wasn't the first to prove the reflexive density theorem, although hopefully you find my proof to be intuitive.

Wednesday, January 30, 2013

Mathematica Source Code for LMD4, LMD5, and LMD6

It's been brought to my attention that the C source code provided in this blog post isn't exactly well commented. I apologise for that. I think the most instructive way to show what's going on, however, is to use Wolfram Mathematica, specifically the Mathematica Kernel, which is a command line interface for math nerds. I have used this wonderful piece of software to verify the functionality of these hashes. But in the interest of reproducible research, you should be able to do the same.

So let's do an example. I'll give you the exact code you need to cut and paste in, and the expected output. Here again is the algo which underlies all 3 hashes:

FOR (i=0 to j){
  x=x xor y
  p=A*(x xor L[N, i])+c
  x=p MOD 2^N
  q=B*(y xor L[N, i xor ((j+1)/2)])+d
  y=q MOD 2^N
p=p MOD 2^(2N)
z=p xor M

And here again is a summary of what stuff means, in order of appearance above:

j is the maximum index of the block, each item of which is N bits wide.

N is half the bit width of the hash: 128 for LMD4, 256 for LMD5, or 512 for LMD6.

X0, C0, Y0, and D0 are the N-bit hash seeds, which are intended to be sourced from a cryptographic pseudorandom algo, and which is not used for any other purpose, nor ever used again. In the C source code, they appear in memory in the aforementioned order at p_base on input to the hash. Note that we have 4N bits going into a hash which will ultimately be only 2N bits wide.

x, c, y, and d are also N bits wide. (x, c) and (y, d) function in a manner similar to (x, c) in a Marsaglia oscillator. (x is at the memory address immediately below c.)

p is the first oscillator state, 2N bits wide. It's more complicated than a Marsaglia oscillator because its design is a product of evolution, but it contains the components of one.

A is the first Marsaglia oscillator's multiplier. For example, this is (2^128-2^125-2^110-2^100) in the case of LMD4.

L is the array of (j+1) N-bit words whose hash to evaluate. (In all cases, L is 2^15 bits long. It's just a question of how wide we consider a word to be, and thus how many words L contains.) Note that L is indexed starting from 1 in Mathematica, but from 0 in the algo above.

q is the second oscillator state, 2N bits wide.

B is the second Marsaglia oscillator's multiplier.

M is a xor mask, essentially a cryptographically generated nonce, which will mask the hash. (Encrypting the hash with this mask would appear to be overkill, but it's dirt cheap compared to encrypting and taking the hash of the payload, so why not.)

z is the hash output (the digest) of L, which is 2N bits wide.

In this example, we'll set things up like this:

L={1, 2, 3... (j+1)}

(M is 0 because nothing exciting happens with it anyway.) So what we're doing is just taking the hash of a series of incrementing integers, using seeds that also happen to be incremental. If you want to do something more complicated, you can modify the code to do so. Just Google around for the primitives I'm using, for example "mathematica array" if you want to learn how Array[] works,

OK let's write all this in a way that you can paste into Mathematica. (They probably have a free trial if you don't own it.) Note that all these text strings are intended to be interpreted as a single line, so if your text editor inserts line breaks, then remove them before pasting (Ctrl+V) into the Mathematica Kernel.

After the fact, you can print out individual values, for example the data block, L. (Just type "L".)

LMD4 Input


LMD4 Output


LMD5 Input


LMD5 Output


LMD6 Input


LMD6 Output


The encryption engine is left explicitly unspecified because these hashes are designed to satisfy the requirements for high entropy, high bit flip sensitivity, and irreversibility -- nothing more than that. LMD6 is used in the Karacell encryption algo, for instance. Karacell is the main motivation for putting these examples on the Web. Now you have readily accessible reference code, so that you can try to find serious weaknesses in them. Good luck!