Leidich Message Digest: February 2013

Thursday, February 28, 2013

LMD7

LMD7, whose C source code is here, is a kilobit hash which operates on 32-kilobit blocks, and largely resembles LMD6, but it's about twice as fast. (I won't repeat all the discussion of how LMD6 was evolved, but it's worth a read if you want to understand LMD7, because this article will focus only the differences.)

[Sorry about the excessive whitespace which Blogspot injects at random just to be annoying.]

Here's the algo:

j=((2^15)/N)-1
x=X0
c=C0
y=Y0
d=D0
FOR (i=0 to (j-1),i+=2){
x=x xor y
c=c xor d
p=A*(x xor L[N, i])+(c xor L[N, i+1])
x=p MOD 2^N
c=p/(2^N)
q=B*(y xor L[N, i xor ((j+1)/2)])+(d xor L[N, (i+1) xor ((j+1)/2)])
y=q MOD 2^N
d=q/(2^N)
}
p=p+(y*(2^N))+d
p=p MOD 2^(2N)
z=p xor M

where:

j is the maximum index of the block, each item of which is N bits wide.

N is half the bit width of the hash, which is 512 because an LMD7 hash is 1024 bits wide.

X0, C0, Y0, and D0 are the N-bit hash seeds, which are intended to be implied by a cryptographic context, and which are neither used for any other purpose nor stored anywhere. In the C source code, they appear in memory in the aforementioned order at p_base on input to the hash. Note that we have 4N bits going into a hash which will ultimately be only 2N bits wide.

x, c, y, and d are also N bits wide. (x, c) and (y, d) function in a manner similar to (x, c) in a Marsaglia oscillator. (x is at the memory address immediately below c.)

p is the first oscillator state, 2N bits wide. It's more complicated than a Marsaglia oscillator because its design is a product of evolution, but it contains the components of one.

A is the first Marsaglia oscillator's multiplier which, in terms of the source code constant LMD7_D0, is (2^512-(LMD7_D0*(2^32))). LMD7_D0=0xD1AEF329.

L is the array of (j+1) N-bit words whose hash to evaluate. (In all cases, L is 2^15 bits long. It's just a question of how wide we consider a word to be, and thus how many words L contains.) Note that L is indexed from 1 through 64 in Mathematica, but from 0 through 63 in the algo above.

q is the second oscillator state, 2N bits wide.

B is the second Marsaglia oscillator's multiplier which, in terms of the source code constant LMD7_D1, is (2^512-(LMD7_D1*(2^32))). LMD7_D1=0xE5467E8F.

M is a xor mask, essentially a cryptographically generated nonce, which will mask the hash. (Encrypting the hash with this mask would appear to be overkill, but it's dirt cheap compared to encrypting and taking the hash of the payload, so why not.)

z is the hash (the digest) of L, which is 2N bits wide.

And here is a Mathematica example, using exactly the same seeds and hash as over here, which also appears in demo.c in the source code. Cut and paste into the Mathematica Kernel command prompt, without any carriage returns that this website might have injected. Then hit Enter to execute it.

A=(2^512-((2^32)*16^^D1AEF329));B=(2^512-((2^32)*16^^E5467E8F));n=512;t=2^n;u=2^(n+n);X0=1;C0=2;Y0=3;D0=4;M=0;j=((2^15)/n)-1;L=Array[#&,j+1];x=X0;c=C0;y=Y0;d=D0;For[i=0,i<j,i+=2,x=BitXor[x,y];c=BitXor[c,d];p=(A*BitXor[x,L[[i+1]]])+BitXor[c,L[[i+2]]];x=Mod[p,t];c=BitShiftRight[p,n];q=(B*BitXor[y,L[[BitXor[i,((j+1)/2)]+1]]])+BitXor[d,L[[BitXor[i,((j+1)/2)]+2]]];y=Mod[q,t];d=BitShiftRight[q,n]];p=Mod[p+BitShiftLeft[y,n]+d,u];z=BitXor[p,M];Print[BaseForm[z,16]];

The above code, when executed in the Mathematica Kernel, should produce:

23f02b7e5b0d92db672706d46d00b7e19bf25a746dffe8c4afcf7c4d71bdb4df8128af2d41a40605a2c158cb4ea33e776b7da1877f1053bf9fa2d5c40dd01c5a19b4aa63034626d7f6adff9fa871b5709878115ec2e4c5eb1caa4b86d8dea9b28caea6278db03a6b03c6bebaad49cf3279fb724a653225613d9f6422ce63a70e

Thus LMD7 involves 2 oscillators of large mutually prime period -- ((A*(2^511))-1) and ((B*(2^511))-1) -- starting half a block apart and rotating through all words in a block, with feedback moving from oscillator B to oscillator A after each step (but not in the other direction). Because a block is 2^15 bits long, and an LMD7 is 2^10 bits long, that means each oscillator iterates 32 times before the hash is issued.

Thus immediately after oscillator A processes word 0, oscillator B processes word 16; immediately after oscillator A processes word 31, oscillator B processes word 15. Thus in the weakest case, a given word is iterated through one of the oscillators only once, while being iterated 17 times through the other.

The main reason why LMD7 is about twice as fast as LMD6 is that the word width of the former is twice as long as the latter. In other words, while LMD6 only xors X with the plaintext, LMD7 xors both X and C. Normally, this would be unsafe because the high bit of a Marsaglia oscillator is biased. But in this case, the bias of each oscillator is 1 part in O(2^(512-64)), or O(10^134). Is that exploitable? Good luck.

One obvious requirement is that both oscillators come up to essentially maximum entropy in at most 17 iterations. That way, when one oscillator is at its weakest (i.e. a single digesting iteration remaining), the other is sufficiently strong to pick up the slack. To that end, LMD7_D0 and LMD7_D1 have been chosen so that the scintillating entropy profile of the oscillators ends up at 511 bit transitions after 17 iterations, in both cases. In other words, starting from 2 1024-bit random seeds which differ by only a bit, the xor of the 17-iterated versions of these seeds will itself contain 511 bit transitions -- trivially shy of maximum entropy. But there are 2 such oscillators, which after digesting the block, are combined by crossing high and low 512-bit parts, using carry-propagated addition, to yield a 1024-bit hash which is for all practical purposes unbiased. And by the way, this is where the factor of 2^32 comes in A and B: without this factor, the oscillator doesn't entropize fast enough. But with it, we can use simple 32-to-64-bit multiplies to rapidly iterate each oscillator.

In the interest of creating a compelling case for the security of LMD7, perhaps the most important consideration is demonstrating that setting a single 1 bit in a block consisting of all 0s produces a change to the hash of such complexity as to appear random. This "change" is what I call the
"xor compensator" of the 2 hashes, because when this value is xored to the hash of the poisoned block, it morphs into the hash of the legitimate block. Statistically speaking, this means that if we do 2^2N such tests, and thereby generate as many xor compensators, that we should expect to encounter (R*(2^2N)) unique xor compensators, where R is the reflexive density constant.

(By the way, xor is obviously not the only way one could compensate a hash. Addition, modulo multiplication, and an infinite number of other functions are candidates as well. But xor tends to be a good proxy for addition: if the xor compensator shows dangerous skews, then chances are that the addition compensator does so as well, and vice versa. Bias in other compensator spaces are an interesting open question, and I don't pretend to be proving anything in this regard.)

The only minor problem, of course, is that 2N=1024. This forces us into a regime of extrapolatory algogenesis, which is about the best we can do, absent a proof of complexity, which has thus far eluded all hash algos ever invented. We can always to a more thorough job in this regard, but 2 cases are explored below, one with N=8 and the other with N=9. In other words, we are scaling LMD7 down from a 1024-bit hash to a 16-or-18-bit hash, repsectively. The algo is otherwise unchanged, including the number of words per block, 64.

Mainly, we're looking for evidence that setting a single 1 in a block of 0s with 2^2N random seeds creates a xor compansator set which is compliant with R, as described above. Any more complicated change to any more complicated block might lead us to unjustifiably optimistic assumptions about the strength LMD7.

The Marsaglia A and B values in this case will be (t-46) and (t-52), where t=2^8, as you can see in the code below. (By the way, (A*(2^8)-1) and (A*(2^7)-1) are both prime as required, and similarly for B.) So here's the Mathematica code which will do this (remove any carriage returns injected by this blog website, then paste into the Mathematica Kernel command prompt). (I realize that the code looks horrible, which is because it's only intended for verification purposes; the C is much more elegant, I promise!)

n=8;t=2^n;A=(t-46);B=(t-52);u=2^(n+n);v=t-1;K=Array[#*0&,u];R=0;S=0;j=63;h=(j+1)/2;L=Array[#&,j+1];For[w=0,w<u,w++,X0=RandomInteger[v];C0=RandomInteger[v];Y0=RandomInteger[v];D0=RandomInteger[v];x=X0;c=C0;y=Y0;d=D0;For[i=0,i<=j,i+=2,x=BitXor[x,y];c=BitXor[c,d];p=(A*BitXor[x,L[[i+1]]])+BitXor[c,L[[i+2]]];x=Mod[p,t];c=BitShiftRight[p,n];q=(B*BitXor[y,L[[BitXor[i,h]+1]]])+BitXor[c,L[[BitXor[i,h]+2]]];y=Mod[q,t];d=BitShiftRight[q,n]];p=p+(y*t)+d;p=Mod[p,u];z0=p;F=RandomInteger[j]+1;G=RandomInteger[n-1];L[[F]]=BitShiftLeft[1,G];x=X0;c=C0;y=Y0;d=D0;For[i=0,i<=j,i+=2,x=BitXor[x,y];c=BitXor[c,d];p=(A*BitXor[x,L[[i+1]]])+BitXor[c,L[[i+2]]];x=Mod[p,t];c=BitShiftRight[p,n];q=(B*BitXor[y,L[[BitXor[i,h]+1]]])+BitXor[c,L[[BitXor[i,h]+2]]];y=Mod[q,t];d=BitShiftRight[q,n]];p=p+(y*t)+d;p=Mod[p,u];z1=p;L[[F]]=F;z=BitXor[z0,z1];If[K[[z]]==0,R++];K[[z]]++;If[K[[z]]>S,S=K[[z]]];If[BitAnd[w,16^^FFF]==0,Print["w=",BaseForm[w,16]]]];Print["R=",R/(2.^(n+n))];Print["R_ideal=",1-Exp[-1.]];Print["population_max=",S];Print["population_max_density_log2=",Log[2.,u/S]];

The output might take 10 minutes to generate. It looks like this (but might slightly differ from yours, on account of the RandomInteger[] function being nondeterministic):

w=0
16
.
.
.

w=f000
16

R=0.607376
R_ideal=0.632121
population_max=8
population_max_density_log2=13.

(The W's are just progress indicators, and the "16"s are just Mathematica's obnoxious way of telling us that the result is in hexadecimal.) The important part is at the bottom: This was a 16-bit hash (because A and B both have 8 bits), which as you can see from the W's, we've tested 2^16 times. On each test, we hashed a block of 0s against the same block with a randomly selected bit set to 1, starting from the same random seed in both cases. We then xor the resulting hashes. Finally, we set the bit corresponding to this xor-of-hashes (Z) in a table (K) of 2^16 xor compensator population counts, initially all 0s.

When the experiment is done, the fraction of nonzeros we expect to find in the table is R, or 0.632121... As you can see, this hash performed well, but not great, coming in at 0.61. That means that the xor compensator is modestly biased, given a single bit flip in the simplest possible plaintext configuration, which is a block of 0s. You could say that we're measuring the "ground state entropy" of the hash.

As to population_max_density_log2, that's the log2 of ((2^2N)/population_max), or log2(65536/8) in this case. This represents the log2 rarity of the most popular xor compensator. It's subject to a lot of noise, but it functions like a canary in a coal mine: if it's too small, then the most popular xor compensator can be easily guessed by examining test cases involving known seeds. For example, if it were 3, that would mean that 1 in 8 xor compensators were the same value. 13 is probably fine for a 16-bit hash.

In fact, what is the statistically expected value of the maximum population? Combinatorics is informative:

Let Q be the probability of one particular integer in the population map (K, in the code above) being incremented ("hit") T times, i.e. of the corresponding xor compensator occurring T times, assuming that the xor compensators are distributed randomly because the hash is "hard".

Let U=2^(2N), where 2N is the number of bits in the hash.

Then:

Q=((1/U)^T)*((1-(1/U))^(U-T))*(U nCr T)

because:

((1/U)^T) is the probability of being hit T times, at exactly the iteration numbers predicted.

((1-(1/U))^(U-T)) is the probability of not being hit (U-T) times, at exactly the iterations numbers predicted.

(U nCr T) is the number of possible combinations involving T hits over U attempts.

So then:

P=1-((1-Q)^U)

is the probability that at least one integer will be hit exactly T times.

because:

(1-Q) is the probability that a given integer is not hit exactly T times.

((1-Q)^U) is the probability that no integer will be hit exactly T times.

So (1-((1-Q)^U)) is the probability that at least one integer will be hit exactly T times.

If P=0.5, then T is the expected maximum population. Since T is discrete, we don't expect to hit P=0.5 exactly. That's OK, so long as T ends up in a state which is "reasonably probable". T=1000 with N=8, for example, would not be reasonably probable. That would be bad because it would suggest that the xor compensator could easily be guessed based on statistical analysis involving a random set of known seeds.

So in this case, we have T=8 and U=2^16. Find P with Mathematica:

t=8;n=8;u=2^(n+n);p=(1/u)^t*(1-(1/u))^(u-t)*Binomial[u,t];p=1-((1.-p)^u);Print["p=",p];Print["p_reciprocal_log2=",-Log[2.,p]];

which produces:

p=0.449961

p_reciprocal_log2=1.15213

Or if you like WolframAlpha, which I do because I'm a nerd:

http://www.wolframalpha.com/input/?i=1-%28%281-%28%28%281%2F65536%29%5E8%29*%28%281-%281%2F65536%29%29%5E%2865536-8%29%29*Binomial%5B65536%2C8%5D%29%29%5E65536%29

In plain English, the probability of discovering at least one population of 8 in the map is about 44% -- so our result is utterly unremarkable if we assume that the hash is hard. We can't prove the converse, but so far, so good.

p_reciprocal_log2 is just Log[2,1/p], which is just a logarithmic way of looking at p.

Let's kick it up a notch, as Emeril Lagasse would say. Try N=9, meaning an 18-bit hash, with A=(t-17) and B=(t-47), where t=2^9 and the usual primality requirements are satisfied:

n=9;t=2^n;A=(t-17);B=(t-47);u=2^(n+n);v=t-1;K=Array[#*0&,u];R=0;S=0;j=63;h=(j+1)/2;L=Array[#&,j+1];For[w=0,w<u,w++,X0=RandomInteger[v];C0=RandomInteger[v];Y0=RandomInteger[v];D0=RandomInteger[v];x=X0;c=C0;y=Y0;d=D0;For[i=0,i<=j,i+=2,x=BitXor[x,y];c=BitXor[c,d];p=(A*BitXor[x,L[[i+1]]])+BitXor[c,L[[i+2]]];x=Mod[p,t];c=BitShiftRight[p,n];q=(B*BitXor[y,L[[BitXor[i,h]+1]]])+BitXor[c,L[[BitXor[i,h]+2]]];y=Mod[q,t];d=BitShiftRight[q,n]];p=p+(y*t)+d;p=Mod[p,u];z0=p;F=RandomInteger[j]+1;G=RandomInteger[n-1];L[[F]]=BitShiftLeft[1,G];x=X0;c=C0;y=Y0;d=D0;For[i=0,i<=j,i+=2,x=BitXor[x,y];c=BitXor[c,d];p=(A*BitXor[x,L[[i+1]]])+BitXor[c,L[[i+2]]];x=Mod[p,t];c=BitShiftRight[p,n];q=(B*BitXor[y,L[[BitXor[i,h]+1]]])+BitXor[c,L[[BitXor[i,h]+2]]];y=Mod[q,t];d=BitShiftRight[q,n]];p=p+(y*t)+d;p=Mod[p,u];z1=p;L[[F]]=F;z=BitXor[z0,z1];If[K[[z]]==0,R++];K[[z]]++;If[K[[z]]>S,S=K[[z]]];If[BitAnd[w,16^^FFF]==0,Print["w=",BaseForm[w,16]]]];Print["R=",R/(2.^(n+n))];Print["R_ideal=",1-Exp[-1.]];Print["population_max=",S];Print["population_max_density_log2=",Log[2.,u/S]];

...and the results:

R=0.627625
R_ideal=0.632121
population_max=8
population_max_density_log2=15.

Thus R is closer to R_ideal. This isn't an accident. In part, it comes from the fact that we're using less biased oscillators, i.e. A and B values which are proportionally closer to 2^N in with N=9 than N=8. You can imagine how this would work out if the bias were 1 part in O(10^134), as with LMD7: R should be indistinct, in practice, from R_ideal.

And what about population_max_density_log2? This time, N=9 but still T=8:

t=8;n=9;u=2^(n+n);p=(1/u)^t*(1-(1/u))^(u-t)*Binomial[u,t];p=1-((1.-p)^u);Print["p=",p];Print["p_reciprocal_log2=",-Log[2.,p]];

which produces:

p=0.908519

p_reciprocal_log2=0.138411

Which means it's 90% likely that we'll find at least one population of 8 (i.e. T=8) in the map -- a tad high, when we're looking for something nearer 50%. But it turns out that T=9 has a 23% chance of the same. So T=8 is unsurprising.

What this all points to, thus far, is that the xor compensator appears to be randomly distributed, i.e. "mini LMD7" is hard. Extrapolation suggests, then, that LMD7 is also hard, as measured by the same tests.

But it behooves us to pound harder on the most weakly protected bit in the block.

The weakest bit in the block is probably bit ((2^14)-1), as measured on average over a wide variety of seeds. This bit is weakly protected because it's the last bit to be integrated into the B oscillator, and is thus subject to the least number of entropizing iterations after the fact. Additionally, while the state of the B oscillator feeds into the A oscillator (via xor at the top of the loop), the reverse isn't true, so B probably cannot entropize faster than A. We return to the N=8 case, with the difference that we'll only compare a block of 0s to the same block with bit ((2^14)-1) set ("L[[(j+1)/2]]=BitShiftLeft[1,n-1]"):

n=8;t=2^n;A=(t-46);B=(t-52);u=2^(n+n);v=t-1;K=Array[#*0&,u];R=0;S=0;j=63;h=(j+1)/2;L=Array[#*0&,j+1];For[w=0,w<u,w++,X0=RandomInteger[v];C0=RandomInteger[v];Y0=RandomInteger[v];D0=RandomInteger[v];x=X0;c=C0;y=Y0;d=D0;For[i=0,i<=j,i+=2,x=BitXor[x,y];c=BitXor[c,d];p=(A*BitXor[x,L[[i+1]]])+BitXor[c,L[[i+2]]];x=Mod[p,t];c=BitShiftRight[p,n];q=(B*BitXor[y,L[[BitXor[i,h]+1]]])+BitXor[c,L[[BitXor[i,h]+2]]];y=Mod[q,t];d=BitShiftRight[q,n]];p=p+(y*t)+d;p=Mod[p,u];z0=p;L[[(j+1)/2]]=BitShiftLeft[1,n-1];x=X0;c=C0;y=Y0;d=D0;For[i=0,i<=j,i+=2,x=BitXor[x,y];c=BitXor[c,d];p=(A*BitXor[x,L[[i+1]]])+BitXor[c,L[[i+2]]];x=Mod[p,t];c=BitShiftRight[p,n];q=(B*BitXor[y,L[[BitXor[i,h]+1]]])+BitXor[c,L[[BitXor[i,h]+2]]];y=Mod[q,t];d=BitShiftRight[q,n]];p=p+(y*t)+d;p=Mod[p,u];z1=p;L[[(j+1)/2]]=0;z=BitXor[z0,z1];If[K[[z]]==0,R++];K[[z]]++;If[K[[z]]>S,S=K[[z]]];If[BitAnd[w,16^^FFF]==0,Print["w=",BaseForm[w,16]]]];Print["R=",R/(2.^(n+n))];Print["R_ideal=",1-Exp[-1.]];Print["population_max=",S];Print["population_max_density_log2=",Log[2.,u/S]];

The results are somewhat disturbing:

R=0.521194
R_ideal=0.632121
population_max=11
population_max_density_log2=12.5406

R is badly skewed away from R_ideal. And population_max is also elevated, with a P value of 1 part in 6x10^(-4). So the xor compensators do not appear randomly distributed. On the other hand, R is still large enough to create intractability when N=512, and population_max is still small enough to be safe, in the sense that population_max_density_log2 exceeds N, which means that a executing a birthday attack on the hash is still easier than guessing the correct xor compensator, even if the correct answer happens to be the most popular one. In other words, the xor compensators have a density which is (exponentially) larger than the square root of the hash space -- even in this uberweak case.

So do things improve when N increases to 9? Let's try:

n=9;t=2^n;A=(t-17);B=(t-47);u=2^(n+n);v=t-1;K=Array[#*0&,u];R=0;S=0;j=63;h=(j+1)/2;L=Array[#*0&,j+1];For[w=0,w<u,w++,X0=RandomInteger[v];C0=RandomInteger[v];Y0=RandomInteger[v];D0=RandomInteger[v];x=X0;c=C0;y=Y0;d=D0;For[i=0,i<=j,i+=2,x=BitXor[x,y];c=BitXor[c,d];p=(A*BitXor[x,L[[i+1]]])+BitXor[c,L[[i+2]]];x=Mod[p,t];c=BitShiftRight[p,n];q=(B*BitXor[y,L[[BitXor[i,h]+1]]])+BitXor[c,L[[BitXor[i,h]+2]]];y=Mod[q,t];d=BitShiftRight[q,n]];p=p+(y*t)+d;p=Mod[p,u];z0=p;L[[(j+1)/2]]=BitShiftLeft[1,n-1];x=X0;c=C0;y=Y0;d=D0;For[i=0,i<=j,i+=2,x=BitXor[x,y];c=BitXor[c,d];p=(A*BitXor[x,L[[i+1]]])+BitXor[c,L[[i+2]]];x=Mod[p,t];c=BitShiftRight[p,n];q=(B*BitXor[y,L[[BitXor[i,h]+1]]])+BitXor[c,L[[BitXor[i,h]+2]]];y=Mod[q,t];d=BitShiftRight[q,n]];p=p+(y*t)+d;p=Mod[p,u];z1=p;L[[(j+1)/2]]=0;z=BitXor[z0,z1];If[K[[z]]==0,R++];K[[z]]++;If[K[[z]]>S,S=K[[z]]];If[BitAnd[w,16^^FFF]==0,Print["w=",BaseForm[w,16]]]];Print["R=",R/(2.^(n+n))];Print["R_ideal=",1-Exp[-1.]];Print["population_max=",S];Print["population_max_density_log2=",Log[2.,u/S]];

The results, averaged over 2^18 trials, are much more consistent with randomness:

R=0.627911
R_ideal=0.632121
population_max=8
population_max_density_log2=15.

T=8 at N=9 is, as discussed above, utterly unremarkable. So the statistics of the system are indeed starting to whiten (sic) out.

Results may vary, and it's probably true that, given enough computer power, we would observe scintillating entropy on the way to N=512, but it appears that the LMD7 xor compensator space is asymptotically random. In other words, LMD7 would appear to be hard -- even if left unencrypted (which is not recommended, out of an abundance of caution) -- to the extent that its seeds are opaque. But prove me wrong if you can.

Friday, February 22, 2013

Harder than Diehard: The Scintillating Entropy Test

With apologies to the late great George Marsaglia, whose Diehard Tests are a cornerstone in the quest for credible pseudorandomness...

The Second Law of Thermodynamics, which has more to do with math than physics, states that "the entropy of an isolated system never decreases", to quote Wikipedia. But this law applies to physical systems under the classical assumption of continuity among infinite particles, as distinct from quantization among finitely many particles. A Marsaglia oscillator is definitely the latter variety, its state space being constrained by the number of bits in its iterand.

We all know that such finite discrete systems much eventually fall into a limit cycle. Indeed, one of the beauties of Marsaglia oscillators is that their limit cycle lengths can be predicted exactly without actually running an entire cycle.

So what this means is that if we start with a value on the limit cycle, that we will eventually return to the same value, given enough time. Returning to the thermodynamic metaphor, it's as though you have an egg on the table. It then rolls off and splats on the floor. The next day, you find that the egg has suddenly cleaned itself up and reassembled itself on the table.

What would one expect to see on the way to the egg splatting? First, the speed of the egg would increase. Then, as it hit the floor, cracks would spread throughout the shell, migrating from the point of impact, wider and wider, until the shell loses structural integrity, and the yolk spews out all over the place. In other words, a monotonic increase in entropy from start to finish, straight in tune with the Second Law.

We expect to see the same in a Marsaglia oscillator. The trouble is, we know that, eventually, the system will return to its original seed, provided that the seed is indeed on a limit cycle (which is usually the case). How does this occur? Does the egg go from complete splat to happily sitting back on the table, instantaneously? Not exactly. Believe it or not, what we observe is entropy scintillation. Just like a star on the horizon, which rapidly dims and flares with changing atmospheric phenomena, the entropy comes, and goes, then comes back again. (I'm not referring to the fact that the scintillation of a star can be an entropy source, which is clearly true; I'm saying that the degree of entropy itself is scintillating.)

We can see this in practice. Here's one good way to measure entropy accrual in any N-bit integer oscillator:

1. Pick a random N-bit seed from a uniform distribution. Call it S0.

2. Copy S0 to S1.

3. Flip a bit of S1 at random, so S0 and S1 differ by a bit.

4. Iterate S0 and S1 separately, but after each stage, take their xor. (This is called the "xor compensator".) Count the number of 0-to-1 or 1-to-0 transitions, T, in the xor. (Consider the N bits to be arranged in a loop, so that the maximum value of T is N, not (N-1).) [EDIT: On second thought, it's probably better to think of the xor compensator as having bit (negative 1) as 0. That way, (all 1s) is considered to have 1 transition, which is more entropic than (all 0s), which has 0 transitions. (And moreover, the number of transitions can now be odd as well as even.) But this doesn't change the results much, apart from making them a little more accurate.]

So after step 3 and before we iterate at all, we expect to find 2 transitions in the xor, namely, one 0-to-1 and one 1-to-0, because the xor must contain a single 1 bit, corresponding to the bit that was flipped in #3. After each iteration of this oscillator, we should expect to find more and more transitions. If we run many trials from #1 onwards, then, after a certain number of iterations, the average number of transitions should be N/2. (This is easily demonstrated by substituting a true random number generator for the pseudorandom iterator.) This comes from the fact that a TRNG has a 50/50 chance of generating the next bit different from the current bit.

In a perfect TRNG, we hit 50/50 after the first iteration, because there is no iterator! With a pseudorandom iterator, however, we expect to have to "rev it up", like a car starting on a cold day, only approaching the N/2 threshold after so-many iterations. The natural implication is that, if we simply iterate a few times between producing numbers, instead of just once, that we can approach the N/2 limit as closely as desired.

Unfortunately, Marsaglia oscillators (and, I suspect, various other wannabe random iterators) don't behave like this. Yes, they do approach the N/2 limit in 10 or 20 iterations. But then they back off, while the egg puts itself (somewhat) back together again! What we have, most certainly, is scintillating entropy. This is a smoking gun for pseudorandomness.

Let's look at the data. I happen to be investigating a 512-bit Marsaglia oscillator with B=(2^256) and A=(B-0xE66FAD12). So perfect entropy means averaging 256 transitions in the xor of the iterands. (Too many transitions are just as bad as too few: 01010101... is not very entropic!) However, that's not what happens, as you can see below. Entropy maxes out after 10 iterations, then backs off, only to return 6 iterations later:

0: 2
1: 11.0839
2: 37.1358
3: 68.3862
4: 96.3759
5: 125.272
6: 154.67
7: 181.451
8: 205.462
9: 232.128
10: 255.925
11: 264.856
12: 265.29
13: 263.62
14: 259.082
15: 257.353
16: 256.811
17: 255.263
18: 257.525
19: 255.262
20: 249.468
21: 249.762
22: 253.123
23: 256.236
24: 258.244

(The "2" next to "0" (iterations) is just a sanity check. It's correct, meaning that we do indeed always begin with 2 seeds which differ by a single bit.) Now perhaps you're thinking "These numbers are all pretty close to 256 after the 10th iteration. The error is just noise." But you would be wrong: I seed the program with the Jytter TRNG every time, but I get (very nearly) the same numbers, with maximum entropy after iterations 10, 16, 19, and so on. Remember that I'm starting from random values. It's incredible that this pattern exists at all, let alone so many standard deviations out after millions of tests starting from different seeds. What you see above is the Marsaglia system genuinely reversing its entropy, to some extent. (Yes, it's true that my definition of entropy as bit transitions is by no means the ultimate definition, but I think you'll see some form of scintillation, which persists when averaged over many seeds, no matter how you choose to quantify entropy.)

But this should not be surprising, because after all, we know that the oscillator must eventually return to its point of origin, to the extent that such point is on the limit cycle. It would be more surprising if it did so as a singular event, as opposed to the culmination of entropy scintillation throughout the entire cycle.

A TRNG should hit the N/2 limit after a single iteration. Any oscillator which fails this test, remaining defiantly aloof from N/2 despite many trials, is a pseudorandom generator or a broken TRNG. The tantalizing hypothesis is that this remains true, regardless of how many "revving" iterations the generator is allowed. (A really smart PRNG would know exactly where its "entropic nodes" were, and produce numbers at only those points, for example, 10 and 16 and 19 above. But then we would then need to worry about correlations between iterations 10 and 16, etc., which is why this test is so diehard.)

I wonder if this has any security ramifications to existing infrastructure.

But more to the point, I wonder if this has physical equivalents. Perhaps physical systems can oscillate in and out of chaos, despite having essentially no net energy exchange with the environment. Meteorologists say that there are clouds in the sky because energy from the sun drives the processes which form them. But could clouds exist if the atmosphere were a closed system? Could we oscillate between periods of uniformally distributed gasses, and organized thunderstorms? If we view the entire atmosphere as a cellular automaton, of the sort described in A New Kind of Science, then I think the answer would be yes...

Friday, February 15, 2013

The Utility of Extrapolatory Algogenesis

As I sort of implied this post, I think there's something missing in the discussion of allegedly secure systems, whether related to cryptography or authentication: extrapolation. (This technique is also applicable to a wide variety of other algos having nothing to do with security.)

Let me explain what I mean by this. I think most cryptographic algogenesis occurs empirically, in other words, we change our evolving algo to do something that makes things look more random, then run some correlation analyses on "these" integers vs. "those" integers. Finally, when the Poisson distributions look good, it must be a winner!

The whole problem of course, is that we humans are idiots. We see entropy in everything we don't understand. There may be some very simple way to crack these systems, but we can't see the forest because the trees are blocking our view.

Wouldn't it be cool if we could see the entire forest and how it interacts with all of the trees, in some exhaustive sense? In other words, look at entire classes of attacks, all at once? For instance, when I change a single bit of a message, then, given every possible every hash seed, what is the set of possible xor compensations which will legitimize the hash? There are plenty of other, similar questions, but they all ask the same basic thing: what are all possible ways to do operation X, given all possible seeds or keys?

Unfortunately, even with a quantum computer on hand, you would be hard pressed to answer any such question in a useful way because the state space is so large as to defy gathering statistics. In other words, the space may be much narrower than you think, but for practical purposes, 2^100 might as well be 2^1000.

On the other hand, extrapolation is a very powerful tool. More often than not, it seems, the properties of large mathematical systems can be extrapolated from small ones, even when we don't fully understand them. This isn't always true, of course. The history of mathematics is peppered with extrapolations that failed in spectacular fashion. But such eventualities would appear to be the exception.

What I'm advocating, given that by definition cryptosystems are too vast to exhaustively analyze, is that we rely on the method of extrapolatory algogenesis to produce them. It means what it sounds like: extrapolate the properties of a big algo from a small one. Want to produce a 512-bit asymmetric key exchange algo? Fine. Then start with an 8-bit variety, and study every possible key pair. Then move to 9 bits, then 10, etc., until your cloud computing bill is threatening to bankrupt you. At each stage, ask key questions, such as the one above, which test the strength of the system in a variety of commonsense ways. The answer to each question reveals a certain property of the algo, for instance, what is the distribution of the various limit cycle lengths involved?

Next, create a spreadsheet, showing the various properties with an 8-bit seed or key, then 9 bits, etc. Is the complexity scaling exponentially with the number of bits, or more slowly than that? How noisy is the data? Can you use statistical methods, perhaps on the raw numbers or their logarithms, in order to extrapolate out to your 512-bit target? Once you do the extrapolation, what does it tell you? Is your algo "hard" enough?

There's no guarantee that this will work. The extrapolation might be invalid, or, more likely, you will fail to ask all the "right" questions. But what this process can do, is reveal some hidden weaknesses. You can then modify the algo, extrapolate the properties again, and iterate as such until a satisfactory algo is obtained.

Once you have this 512-bit masterpiece, then what? For one thing, you can fall back on the old methods of differential cryptanalysis -- mainly, looking for Poisson distributions where you expect to find them, of the quality that a perfect random number generator would produce. Compare "these" integers to "those" integers, ad nauseum.

The result can only be better than what the last step alone can produce. Can you improve your own favorite algo using this technique?

Landauer's Principle vs. Classical Computability

In the quest for a more secure hash or encryption algorithm, the question naturally arises, how many key bits are enough? Is there a key which is physically impossible to crack?

The short answer is "it depends on whether P=NP". Even then, some algos will be weaker than others, meaning that they would require longer keys to achieve comparable security.

There is, however, a physical limitation here which is informative, namely, the number of bits which we could possibly flip on the way to discovering a solution. Landauer's Principle tells us that the minimum amount of energy required to flip a bit (at least, in a classical computer) is:

kT ln 2

where "where k is the Boltzmann constant (approximately 1.38×10−23 J/K), T is the temperature of the circuit in kelvins, and ln 2 is the natural logarithm of 2 (approximately 0.69315)", according to Wikipedia.

But what about this "circuit temperature"? Considering that the average temperature of the universe is not less than that of the microwave background, or about 3 kelvin, we already have a lower limit on the energy expenditure required, which is about 3x10^(-23) joules. (You could use a cryostat to achieve a lower temperature, but it wouldn't help you, because you would just heat up the universe elsewhere.)

And just how much energy is available to expend? Definitely not more than the total energy of the universe, which WolframAlpha puts at 2x10^69 J. (This value includes the energy equivalent of the mass of the universe, via Einstein's famous equation, E=mc^2. Nonetheless, it's obviously subject to considerable error, and probably fails to account for the vaccuum energy, dark energy, and who knows else. Still, this is a logarithmic analysis, so we can afford to be somewhat cavalier.)

The ratio of these energies is then about 7x10^91. The log to base of 2 of this value is about 305.

What this means, then, is that the longest key need only be 305 bits, because we couldn't count to a larger number, given all the energy in the universe. This upper bound is only valid, of course, if we assume that the fastest way to crack a given algo is to try every key. (We would likely need try only half the keys, so the key should be 306 bits long. Whatever.)

The other question is whether Landauer's Principle pertains to quantum computers, and more importantly whether a useful quantum computer can be constructed. I suspect that the answers are no (because quantum computations are required to be reversible, whereas the above analysis pertains to the opposite, and fundamentally because the Heisenberg Uncertainty Principle allows quantum systems to "borrow" energy) and yes (because it's "only" an engineering problem). So perhaps this upper bound of 306 bits is just a thermodynamic fantasy.

At best, it appears from the literature that a quantum computer can square root the complexity of a given algo (and sometimes cannot help at all). If this concept relates to energy expenditure as well, then perhaps the equivalent limit for quantum computation would be 612 bits. But I have no idea...

Tuesday, February 12, 2013

The Birthday Attack Myth

[EDIT: Rewritten because the first version sucked.]

In cryptographic circles, it's often said that a hash is at best as strong as the square root of the size of its state space. In other words, assume that we have a hash consisting of 2N bits. Then the number of possible hashes is at most H=2^(2N). The difficulty of "cracking the hash" -- in other words, finding any 2 different messages with the same hash, known as a "hash collision" -- is then no harder than O(2^N). This is the fundamental theory behind a birthday attack. (This might allow someone to "prove" that bogus credentials were legitimate, for instance.)

The birthday attack myth is that this complexity estimate applies to the execution time (in processor-seconds), and not only the operation count. It does not, for the reasons that follow.

The first problem is one of practicality: the attack only predicts the creation rate of messages with equal hashes -- not the creation rate of messages with a specific hash, which would cost O(H), assuming an optimally hard hash. So this is not a useful tool for modifying existing authenticated messages whilst ensuring that the hash still works out. Granted, to the extent that (2^N) messages are available in the wild as targets for malicious modification, then generating an additional (2^N) fake messages would be moderately likely to result in one of them having the same hash as one of the originals, the exact probability being given in the article linked above.

The second problem goes to the heart of the complexity question: interprocessor communication. There are 2 extremes to consider, with other configurations lying somewhere in between:

At the serialized extreme, assume we have a single processor. The fastest way to solve the problem is probably to sort the list of (2^N) hashes (in O(2^N) time, with a radix sort). Then we can scan through the sorted hashes, looking for one which occurs twice, again in O(2^N) time.

At the (classically) parallelized extreme, assume we have a processor for each hash, and that each processor is connected to every other. Then the hashes could sort themselves by moving to their expected position in the sort, and adjusting dynamically. A given hash might find its proper place in the list in O(N) time (probably worse on account of network congestion). In any event, this is certainly much faster than the serial case, but the computational complexity is actually greater, because we're using many processors inefficiently in order to run faster than one processor used efficiently. It's O(N*2^N), at best.

Thus we can't expect to do better at minimizing the computational complexity than to stick with the serial case. But that's fine because, as the birthday attack predicts, it's O(2^N), right?

Well, not exactly. The operation count is indeed O(2^N). But what if the time that an operation takes is itself a function of N? Then the execution time must be disproportionately larger.

The problem is one of memory size. The larger the number of hash samples, the larger the memory, in a physical sense. Memory latency is a function of the time it takes for a signal to traverse a 3D memory, which is O((2^N)^(1/3)), assuming that all the samples are packed into the smallest possible space, which is approximately a sphere.

Thus the execution time is O(2^N) times O((2^N)^(1/3)), which is O((2^N)^(4/3)). (Parallelizing the system will reduce the "wall clock" execution time, but not the execution time measured in processor-seconds. Moreover, the same (2^N)^(1/3) factor applies to the parallel case as well, because hashes must fly a certain distance before arriving at their sorted locations.)

By the way, given that the rate of radiation of a sphere is proportional to its surface area rather than its volume, the maximum degree of parallelism would occur with all the processors packed tightly on the surface of a sphere. Any attempt at tighter packing (using the interior of the sphere) whilst running all processors in parallel could only be sustained at such lower rate of computation as to render the topology performance-equivalent to the surface case. (This thermodynamic point bears uncanny resemblance to the Holographic Principle's implication that the information content of a black hole is (merely) proportional to its surface area.) The upshot is that the interprocessor communication latency given above is underestimated, in the regime of maximum efficiency packing of processors executing at maximum sustainable speed.

As to quantum computers, I was originally hopeful that Grover's algo could reduce the execution time. On second thought, I don't think so. The problem is that the hashes must either be loaded into the quantum computer from classical media or they must be generated from candidate messages produced inside quantum superposition. But multiverses in a superposition can't communicate, so far as I know. So in each case, we would have to search for each hash via classical media. So there would be no complexity reduction in either case.

The point of all this is simply that the computational complexity of a birthday attack is exponentially less than its predicted execution time in processor-seconds, and that this would appear to hold true even in the quantum computing regime. Because the latter metric is what matters in the physical world, the implication is that hashes are actually stronger than supposed, independent of the question of weaknesses unrelated to birthday attacks.

Saturday, February 2, 2013

The Marsaglia Limit Cycle Theorem

Or something like that. This is really the "Marsaglia Limit Cycle Theorem for LMD2 and Later". I don't know what else to name it. Anyway here's the proof that you always end up in the bright region as opposed to the dark region when considering Marsaglia oscillators which have (2^(N-1))<=A<(2^N) with X and C both being N bits, implying an oscillator state of 2N bits. I could probably make it more succint, but I think you'll get the idea.

Again, we're talking about multiply-with-carry oscillators, which I prefer to call "Marsaglia oscillators".

Consider again S, the seed of cycle 2. The theorem says that S is the maximum integer which lives on the steady state part of the Marsaglia oscillator's cycle, in other words, the largest integer that you can produce more than once when starting from an arbitrary integer.

S=((A-1)*(2^N))+((2^N)-1)

First of all, it should be clear that any integer I on [0, S] will iterate to another integer on [0, S]. The reason is that the maximum X value in this range is ((2^N)-1). If you multiply that by A, you get ((A*(2^N))-A). But C is by definition at most (A-1), so you can never actually get the new C greater than (A-1). So if we start in the bright region, we stay there.

Now in the dark region, all the integers are on [S+1, (2^2N)-1]. If we iterate the maximum value on this interval, which happens to be the one which produces the maximum possible output, we get ((A*((2^N)-1))+((2^N)-1)), or ((A+1)*(2^N)-(A+1)). Thus the maximum new C value is just A. We're almost into the bright region.

Let's iterate again. If the maximum possible X value after the first iteration is ((2^N)-1), then AX is going to be (A*((2^N)-1)), or ((A*(2^N))-A). If the maximum new C value after the first iteration is A, then it looks like we can end up with (A*(2^N)), after adding it to AX. But we can't. Because when that first new C value is equal to A, then its associated X is ((2^N)-(A+1)). Since A is at least (2^(N-1)) based in all the LMDx oscillators after LMD1, X must be less than A. Which means that after the second iteration, the maximum new C is now (A-1). Thus we're in the bright region, after at most 2 iterations.

Friday, February 1, 2013

Information Loss in Marsaglia Oscillators

Now that you've seen the pretty pictures in the previous post, it should be intuitively clear how many bits of information are lost when one iterates a uniformally distributed random number through a Marsaglia oscillator an infinite number of times (and, I think, at most twice). In simple terms, we lose the dark region shown in the images. How many bits is that worth?

Consider S, the seed of cycle 2:

S=((A-1)*(2^N))+((2^N)-1)

Recall that S iterates back to itself. I strongly suspect that every integer greater than S is in the dark region. It seems intuitive if not obvious, and agrees with my observations, but I admit that I don't have a proof. [EDIT: Now I do.] This all looks to be quite simple, but perhaps we have another Riemann Hypothesis on our hands. (I doubt it, but arrogance makes fools of mathematicians.)

If you accept this, then it's easy to calculate the average information loss L, in bits, of starting with a randomly chosen seed (x, c) and then iterating the oscillator until it hits a limit cycle (including the trivial cycles 1 and 2):

L=log2((all possible seeds)/(all residual seeds))
L=log2(2^(2N))-log2(S+1)
L=2N-log2(A*(2^N))
L=N-log2(A)

To the extent that (2^(N-1)) <= A < (2^N), then we have 0 < L <= 1. Since L is independent of N, this means that we can render the loss negligible by setting N sufficiently large. All this seems to be a reasonable compromise in light of the apparent quality of randomness provided.

Visualizing Marsaglia Oscillators

In the question of assessing the pseudorandom quality of a Marsaglia multiply-with-carry oscillator with 2N bits of state, of the sort underlying LMDx after LMD1 [EDIT: As in, (2^(N-1))<=A<(2^N)], it occurred to me that pictures might be of some use.

To do this, I took various A values that are possible with N=9. (By "possible", I mean that the period (P=(A*(2^(N-1))-1)) is a Sophie Germain prime.) Then I created a map of the entire cycle space. It's pretty clear that only 4 cycles exist in such an oscillator:

1. 0 maps to itself.

2. ((A-1)*2^N)+((2^N)-1) maps to itself. This is a "nuisance cycle" because it's easily overlooked and threatens to shut down the oscillator if chosen as a seed.

3a. One of the big cycles has period P.

3b. There are some integers greater than or equal to (A*(2^N)) which eventually find their way onto cycle 3a, but never return to 3b due to limits on the maximum possible output value within 3a.

4a. The other big cycle has period P as well.

4b. Analagous to cycle 3b, there are points which degenerate into to cycle 4a as well.

The number of integers in 3b and 4b is empirically similar, but not equal. I can't explain the difference.

You can see all this in the images below, which have A values of 495, 465, 459, 402, 390, 369, 297, and 270, respectively. There is a bright area, which contains 3a and 4a, and a dark area, which contains 3b and 4b. The upper left and lower right corners of the bright area are white pixels indicating cycles 1 and 2. The rest is red and blue, for cycles 3 and 4.

These images show clearly why these oscillators suffer from bias in the high bit: once you jump from the dark to the bright region, you can't return. One way to deal with that is to engineer A so that the dark area has negligible size.

And I don't know about you, but I'm seeing squarish cells and somewhat nonrandom lower frequency oscillations in the images. Still, it looks very solid for such a simple function. (They are losslessly encoded 512x512 GIF images, so there are no JPEG artifacts here. Right-click and "open in new window" if your browser is scaling them in some annoying way, which mine appears to be doing.)