Monday, March 26, 2012

A Word on File Size

For those of you who wish to use LMDx for purposes other than detecting accidental errors on fixed-block-size storage devices...

As explained in previous posts, LMD2 and LMD3, which I published in the previous post, act on 32-bit values ("u32" in the code). This means that a file which is a multiple of 4 bytes in size will have the same LMD2 (and LMD3) as another file whose length is 1, 2, or 3 bytes longer, provided that those additional bytes are all 0s.

While this seems bad, it's not actually abnormal. After all, most other digest algos operate on bytes, even though a file may end on a bit boundary. If you want to include the exact size (to the nearest bit, or byte, or whatever), then simply ensure that you integrate it into the LMDx that you're using. For example, you could xor the low bits of the generator seeds (X0 being lower in bit position than C0) with the file's size. Or, you could simply ensure that its size is embedded in a header, which itself falls under LMDx protection. [EDIT: A temptingly simple approach is to set the first bit of the padding, and clear it beyond that bit. However, I think the aforementioned seed xor method is superior. Appending a 1 creates a case in which it's easy to generate the same hash maliciously, or even accidentally.]

Of course, if the file size is implicit based on other content, then you already have a way to know whether it has been accidentally truncated or expanded, independent of LMDx.

UPDATE: Likewise, LMD4, LMD5, and LMD6 operate on blocks of 2^12 bytes, so files which do not have this granularity may have the same hash due to similar 0 padding, with the same solution as for LMD2 above. (The hash of a null file is the hash of a block of 0s.)

Lastly, why does this font suck? 0 (zero) comes out like O (the letter "oh"). Ah, the mysteries of Blogger...

C Source Code for LMD2 through LMD8

It's available here (and more specifically, here) on Github, for Linux and Windows. See the file "COPYING" for information on licensing terms, which are very liberal but different than for (obsolete) LMD. See README.txt for instructions on building the demo.

LMD4 through LMD8 are very different creatures from LMD2. I will discuss them in a future post. (I'm the sort of person who prefers to get the code done first, then the documentation, and not the other way around.) And no, these new algos do not obsolete LMD2. LMD2 (or LMD3, if you prefer) are still the recommended EDCs for the rapid detection of accidental (as opposed to malicious) errors.

NOTES:

1. Do not call the C functions with pointers to overlapping memory regions.

2. A and B in the post linked above are known as LMDx_A0 and LMDx_A1 in the source code, respectively, where x=[4,5,6].

3. LMD6 comes in 2 flavors, one optimized for 32-bit systems, and the other for 64-bit.

Friday, March 16, 2012

LMD3 for 64-bit Pseudorandom Sequences

[EDIT: This post was revised on September 16, 2012 because the LMD3 code was right, but this documentation was wrong.]

LMD3 was introduced informally in the previous post. It is not indended to be used as an error detection algo, unless you're more interested in long nonzero sequences than maximimizing double-bit error detection coverage length. For the latter, LMD2 still serves its purpose.

Alternatively, LMD3 can supplement LMD2, if you're paranoid and want to use a 128-bit EDC. In such a regime, 64-bit pseudorandom numbers can be rapidly generated, by concatenating the "x" values of LMD2 and LMD3. (Remember, "c" is not considered part of the pseudorandom output; it's just present to prevent "x" from behaving too predictably.) A smooth transition between 32 and 64 bits can occur.

Let me explain. I happen to be working on an unrelated algo at the moment, which requires that 32-bit pseudorandom numbers be used. However, when the data volume reaches a critical point, 32 bits is no longer enough; I need 64 bits. The problem is that I need a relatively smooth transition between the two, such that the 32-bit fraction at iteration "n" is very close to the 64-bit fraction at iteration "n". Otherwise, the transition would be awkward to the user, in the sense that the program's behavior would be radically different.

So, for instance, if one of the pseudorandom values is 0x8D903EE7 in the first sequence, then a compatible value would be, say, 0x8D903EE79CA4B29B. If we look at these values as fractions on [0,1), then as you can see, the second fraction is within 1 part in 2^32 of the first, which constitutes a smooth transition within the limits of 32-bit precision.

One could do this using any 2 32-bit pseudorandom sequences. But we want the high and low parts to be uncorrelated, yet rapidly generated without the need to compute 128-bit products. Double use of LMD2_A (0xFE001000) helps in this regard.

LMD3 has the seeds that it has because it caters to the cases wherein one is more concerned with long nonzero sequences than double-bit coverage. But if used in the manner above, then it can also provide a smooth (and fast) transition between 32-bit and 64-bit sequences. Granted, because the multipier is still LMD2_A, the period of the resulting sequence is the same as for LMD2, regardless of whether LMD2 and LMD3 are combined. For practical applications, this probably doesn't matter. This could be avoided by using a different multiplier, for example, 0xF7FBFFFF, which is only slightly slower than LMD2_A.

Following is the definition of LMD3. Note that it behaves exactly like LMD2, except for the seeds:


ITERATE_NO_ZERO_CHECK means:

p = 0xFE001000 * x + c
x <- p MOD 2^32
c <- p / 2^32 (Shift p right by 32.)

(x0, c0) = (0x0, 0xDA6D32BA)

(That's not a typo. "x" should rightly start at 0 in order to maximize the number of subsequent nonzero terms.)

Thus:

(x1, c1) = (0xDA6D32BA, 0x0)
(x2, c2) = (0x5F2BA000, 0xD8B865FB)
(x3, c3) = (0x92B865FB, 0x5E6D4EB3)

Thus:

(LMD3 of null) = 0x38DA816D92B865FB


If what you really want is a pseudorandom sequence with a period roughly the square as long, then LMD3 will not help you. Instead, try combining LMD2 with a sequence using the multiplier 0xF7FBFFFF:


0xF7FBFFFF * x = (x * 2^32) - (x * 2^27) - (x * 2^18) - x


As required, (0xF7FBFFFF * 2^32 - 1) and (0xF7FBFFFF * 2^31 - 1) are both prime. The cycle length (in the absence of LMD2) is the latter, or 8,934,578,708,602,159,103, which is enough 32-bit "x" values to exceed the size of a 64-bit address space. Because LMD2 and this new sequence have prime and unequal cycle lengths, the cycle length of their concatenated "x" values is the product of their cycle lengths, or 81,763,217,765,900,274,931,684,699,996,617,179,137, which is just under 2^126.

And finally, just because I happened to take the time to compute it: if a long nonzero initial sequence is what you want, then try the following with 0xF7FBFFFF:


(x0, c0) = (0, 0x938A52)


It will produce 44,342,898,605 nonzero values, starting with the output of the first iteration.

Using LMD2 to Generate a Long Nonzero Pseudorandom Sequence

Borrowing the multiplier from LMD2, we can rapidly create a long (49,327,206,862 outputs, excluding the initial x0=0) sequence of nonzero 32-bit values by simply altering (x0, c0) as follows, while keeping ITERATE_NO_ZERO_CHECK as described in the previous post:


As usual, ITERATE_NO_ZERO_CHECK means:

p = 0xFE001000 * x + c
x <- p MOD 2^32
c <- p / 2^32 (Shift p right by 32.)

Except that, unlike with LMD2:

(x0, c0) = (0, 0xDA6D32BA)

Thus:

(x1, c1) = (0xDA6D32BA, 0)
(x2, c2) = (0x5F2BA000, 0xD8B865FB)
(x3, c3) = (0x92B865FB, 0x5E6D4EB3)

(Only the "x" values are the pseudorandom outputs.)


Don't forget that 0xFE001000 is easy to multiply by, as described in the previous post.

Someone out there probably has a use for such a sequence. And it's not bad from error detection standpoint, either.

(Blogger's font antics are driving me insane. I've changed to this idiotic font that displays hexadecimal numbers with different number and letter heights. Of course, it's the default. So let's hope it fixes the problem.)

LMD2: Breaking the Megabyte Barrier

Perhaps this sounds like the title to an article in an issue of Byte Magazine from 1988. But I'm not talking about memory capacity. The focus of this post is LMD2, an algorithm closely related to LMD, but with the capacity to offer comparable 2-bit error protection strength over a full megabyte, instead of slightly less than that. This is significant because it permits the use of megabyte block size for storage or transmission protocols, without having to worry about 1-bit or 2-bit errors (except in the negligible case where an error bit in the message compensates for an error bit in the LMD2, which appears to have odds of 1:2^64 in the long message limit).

The LMD2 algo is identical to LMD, with the exception of the seed values and the multiplier. Specifically:


ITERATE_NO_ZERO_CHECK means:

p = 0xFE001000 * x + c
x <- p MOD 2^32
c <- p / 2^32 (Shift p right by 32.)

(x0, c0) = (
0x129E5CFA, 0xC97A34B3)

Here are the first few values:

(x1,c1) = (0xBB49D4B3, 0x1279216A)
(x2, c2) = (0x49C4516A, 0xB9D34CBE)
(x3, c3) = (0x2AE9ECBE, 0x4930CD64)

which implies that:

(LMD2 of null) = 0x12AB02173D8849B8


You might reasonably wonder why we multiply by 0xFE001000 on each every iteration. This value is chosen so that (0xFE001000 * 2^32 - 1) and (0xFE001000 * 2^31 - 1) are both prime. This is the same constraint satisfied by 0x7FFFFDCD in LMD. This requirement comes from the multiply-with-carry generator previously discussed.

The cycle length is thus (0xFE001000 * 2^31 - 1), or 9,151,323,238,909,870,079 -- about double that of LMD. This is just under 2^63, which implies that a complete cycle of 32-bit "x" outputs would exceed the size of a modern 64-bit memory space. The reason LMD was limited to half as much was some doubt on my part as to whether or not it was safe to use a coefficient with MSB 31, as opposed to MSB 30. That doubt has now been removed.

But there's something else which is special about 0xFE001000: it's easier to multiply by, which may accelerate the process on certain microprocessor architectures:


0xFE001000 * x = (x * 2^32) + (x * 2^12) - (x * 2^25)


Apart from the coefficient, LMD2 has a longer unique shiftoid sequence length. Specifically, the first 263,837 x outputs (excluding the seeds themselves) constitute unique shiftoids. Because each "x" is 32 bits, this implies error coverage for up to 1,055,348 bytes -- enough to cover a 1MiB data block plus 6772 bytes of metadata.

The noise quality appears to be similar to LMD.

All of these features suggest that LMD2 should replace LMD in practice.

Sorry again for the incorrigible font problems. Blogger sucks, but it's free.

The Finer Points of Digest Finalization

I want to go into a bit more detail regarding the digest finalization process, reiterated below:


(Finish computing dot product, y, above, leaving (x, c) as its most recent value, such that x has been used to compute the last term of y.)

z = (y + (c * 2^32) + x) MOD 2^64
x <- z MOD 2^32
c <- z / 2^32
ITERATE_NO_ZERO_CHECK
ITERATE_NO_ZERO_CHECK
ITERATE_NO_ZERO_CHECK
z = (z + (c * 2^32) + x) MOD 2^64

z is the (final) digest. Done!


It occurred to me the that the above process appears somewhat arbitrary. It is not.

The first step, in which we add the dot product, y, to the current (x,c), is important because we want the LMD of a null file to be "random". Otherwise, imagine how easy it would be to have a null file with an LMD of 0, which appears to be valid. Because y can be any of 2^64 states, z has the same domain.

We pass through the iterator 3 times because, as explained previously, that seems to be optimal from an entropy accrual standpoint, balanced against the need to avoid excessive latency.

In the last stage, we combine z with (x,c) from the beginning of the process. This is critical in order to ensure that z can assume any of 2^64 states; otherwise, it would be constrained by virtue of having passed through the iterator thrice.