Checksumming large swathes of prime numbers? (for verification)

Question

Are there any clever algorithms for computing high-quality checksums on millions or billions of prime numbers? I.e. with maximum error-detection capability and perhaps segmentable?

Motivation:

Small primes - up to 64 bits in size - can be sieved on demand to the tune of millions per second, by using a small bitmap for sieving potential factors (up to 2^32-1) and a second bitmap for sieving the numbers in the target range.

Algorithm and implementation are reasonably simple and straightforward but the devil is in the details: values tend to push against - or exceed - the limits of builtin integral types everywhere, boundary cases abound (so to speak) and even differences in floating point strictness can cause breakage if programming is not suitably defensive. Not to mention the mayhem that an optimising compiler can wreak, even on already-compiled, already-tested code in a static lib (if link-time code generation is used). Not to mention that faster algorithms tend to be a lot more complicated and thus even more brittle.

This has two consequences: test results are basically meaningless unless the tests are performed using the final executable image, and it becomes highly desirable to verify proper operation at runtime, during normal use.

Checking against pre-computed values would give the highest degree of confidence but the required files are big and clunky. A text file with 10 million primes has on the order of 100 MB uncompressed and more than 10 MB compressed; storing byte-encoded differences requires one byte per prime and entropy coding can at best reduce the size to half (5 MB for 10 million primes). Hence even a file that covers only the small factors up to 2^32 would weigh in at about 100 MB, and the complexity of the decoder would exceed that of the windowed sieve itself.
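
As a rough sanity check on that 100 MB figure (my own back-of-the-envelope arithmetic, not part of the original question): there are pi(2^32) = 203,280,221 primes below 2^32, so one byte-encoded gap per prime comes to roughly 200 MB, and halving that with entropy coding lands right around the quoted 100 MB.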

This means that checking against files is not feasible except as a final release check for a newly-built executable. Not to mention that the trustworthy files are not easy to come by. The Prime Pages offer files for the first 50 million primes, and even the amazing primos.mat.br goes only up to 1,000,000,000,000. This is unfortunate since many of the boundary cases (== need for testing) occur between 2^62 and 2^64-1.

This leaves checksumming. That way the space requirements would be marginal, and only proportional to the number of test cases. I don't want to require that a decent checksum like MD5 or SHA-256 be available, and with the target numbers all being prime it should be possible to generate a high-quality, high-resolution checksum with some simple ops on the numbers themselves.

This is what I've come up with so far. The raw digest consists of four 64-bit numbers; at the end it can be folded down to the desired size.

   for (unsigned i = 0; i < ELEMENTS(primes); ++i)
   {
      digest[0] *= primes[i];              // running product (must be initialised to 1)
      digest[1] += digest[0];              // sum of sequence of running products
      digest[2] += primes[i];              // running sum
      digest[3] += digest[2] * primes[i];  // Hornerish sum
   }

At two (non-dependent) muls per prime the speed is decent enough, and except for the simple sum each of the components has always uncovered all errors I tried to sneak past the digest. However, I'm not a mathematician, and empirical testing is not a guarantee of efficacy.

Are there some mathematical properties that can be exploited to design - rather than 'cook' as I did - a sensible, reliable checksum?

Is it possible to design the checksum in a way that makes it steppable, in the sense that subranges can be processed separately and then the results combined with a bit of arithmetic to give the same result as if the whole range had been checksummed in one go? Same thing as all advanced CRC implementations tend to have nowadays, to enable parallel processing.
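
For reference, the CRC "combine" trick alluded to here is available off the shelf: zlib ships crc32_combine() alongside crc32(). Below is a tiny sketch of mine (assuming zlib is linked; the data is arbitrary filler, not primes) showing two independently computed CRCs merged into the CRC of the concatenated input:

// Sketch only: the kind of combinable checksum referred to above,
// using zlib's CRC-32 (link with -lz).
#include <zlib.h>
#include <cassert>
#include <cstring>

int main ()
{
   const unsigned char part1[] = "2 3 5 7 11 13 ";
   const unsigned char part2[] = "17 19 23 29 31";

   // CRC of each part, computed independently (e.g. on different threads).
   uLong crc1 = crc32 (0L, part1, sizeof (part1) - 1);
   uLong crc2 = crc32 (0L, part2, sizeof (part2) - 1);

   // Merge: yields the CRC of part1 followed by part2.
   uLong combined = crc32_combine (crc1, crc2, sizeof (part2) - 1);

   // Same data hashed in one go, for comparison.
   unsigned char whole[sizeof (part1) - 1 + sizeof (part2) - 1];
   std::memcpy (whole, part1, sizeof (part1) - 1);
   std::memcpy (whole + sizeof (part1) - 1, part2, sizeof (part2) - 1);
   assert (combined == crc32 (0L, whole, sizeof (whole)));

   return 0;
}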

EDIT The rationale for the current scheme is this: the count, the sum and the product do not depend on the order in which primes are added to the digest; they can be computed on separate blocks and then combined. The checksum does depend on the order; that's its raison d'être. However, it would be nice if the two checksums of two consecutive blocks could be combined somehow to give the checksum of the combined block.
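
To make the order-independent part concrete, here is a minimal sketch (mine, modelled on the count/sum/product components mentioned here, not code from the question) of merging two block digests computed over consecutive, non-overlapping ranges; it deliberately leaves out the order-dependent checksum, since combining that is exactly the open question:

#include <cstdint>

// Sketch: merge the order-independent digest components of two consecutive,
// non-overlapping blocks.  All arithmetic wraps mod 2^64, so the merged
// values match what a single pass over both blocks would have produced.
struct partial_digest_t
{
   uint64_t count   = 0;
   uint64_t sum     = 0;
   uint64_t product = 1;   // the running product must start at 1, as noted above
};

partial_digest_t combine (const partial_digest_t &a, const partial_digest_t &b)
{
   partial_digest_t r;
   r.count   = a.count + b.count;       // counts add
   r.sum     = a.sum + b.sum;           // sums add
   r.product = a.product * b.product;   // products multiply
   // The order-dependent checksum cannot be reconstructed from these
   // fields alone -- that is precisely what the question asks about.
   return r;
}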

The count and the sum can sometimes be verified against external sources, like certain sequences on oeis.org, or against sources like the batches of 10 million primes at primos.mat.br (the index gives first and last prime, the number == 10 million is implied). No such luck for product and checksum, though.

Before I throw major time and computing horsepower at the computation and verification of digests covering the whole range of small factors up to 2^64 I'd like to hear what the experts think about this...

The scheme I'm currently test-driving in 32-bit and 64-bit variants looks like this:

template<typename word_t>
struct digest_t
{
   word_t count;
   word_t sum;
   word_t product;
   word_t checksum;

   // ...

   void add_prime (word_t n)
   {
      count    += 1;
      sum      += n;
      product  *= n;
      checksum += n * sum + product;
   }
};

This has the advantage that the 32-bit digest components are equal to the lower halves of the corresponding 64-bit values, meaning only the 64-bit digests need to be computed and stored even if fast 32-bit verification is desired. A 32-bit version of the digest can be found in this simple sieve test program @ pastebin, for hands-on experimentation. The full Monty in a revised, templated version can be found in a newer paste for a sieve that works up to 2^64-1.
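
A quick way to convince yourself of that lower-half property is a throwaway test like the following (my own sketch; it assumes the digest_t template shown above can be brace-initialised in member order, i.e. count, sum, product, checksum -- adjust if it actually has a constructor). Since +, * and the implicit mod-2^64 wraparound all reduce consistently mod 2^32, the 32-bit digest must equal the low halves of the 64-bit one when fed the same (32-bit-sized) primes:

#include <cassert>
#include <cstdint>

void check_lower_halves ()
{
   digest_t<uint32_t> d32 = { 0, 0, 1, 0 };   // count, sum, product, checksum
   digest_t<uint64_t> d64 = { 0, 0, 1, 0 };

   const uint64_t primes[] = { 2, 3, 5, 7, 11, 13, 4294967291ull };   // last one is 2^32 - 5

   for (uint64_t p : primes)
   {
      d32.add_prime (static_cast<uint32_t>(p));   // lossless here: all primes fit in 32 bits
      d64.add_prime (p);
   }

   assert (d32.sum      == static_cast<uint32_t>(d64.sum));
   assert (d32.product  == static_cast<uint32_t>(d64.product));
   assert (d32.checksum == static_cast<uint32_t>(d64.checksum));
}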

Answer

I've done a good bit of work parallelizing operations on Cell architectures. This has a similar feel.

In this case, I would use a hash function that's fast and possibly incremental (e.g. xxHash or MurmurHash3) and a hash list (which is a less flexible specialization of a Merkle Tree).

These hashes are extremely fast. It's going to be surprisingly hard to get better with some simple set of operations. The hash list affords parallelism -- different blocks of the list can be handled by different threads, and then you hash the hashes. You could also use a Merkle Tree, but I suspect that'd just be more complex without much benefit.


  • Virtually divide your range into aligned blocks -- we'll call these microblocks. (e.g. a microblock is a range such as [n<<15, (n+1)<<15) )
  • To handle a microblock, compute what you need to compute, add it to a buffer, hash the buffer. (An incremental hash function will afford a smaller buffer. The buffer doesn't have to be filled with the same length of data every time.)
  • Each microblock hash will be placed in a circular buffer.
  • Divide the circular buffer into hashable blocks ("macroblocks"). Incrementally hash these macroblocks in the proper order as they become available, or once there are no more microblocks left.
  • The resulting hash is the one you want. (A serial sketch of this pipeline follows right after this list.)
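
Here is that serial sketch (mine, not the answerer's code). FNV-1a stands in for a fast hash such as xxHash or MurmurHash3 just to keep it self-contained, and the per-microblock payload is a placeholder for whatever the sieve actually produces for that range:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

static uint64_t fnv1a64 (const void *data, size_t len, uint64_t h = 14695981039346656037ull)
{
   const unsigned char *p = static_cast<const unsigned char *>(data);
   for (size_t i = 0; i < len; ++i)
   {
      h ^= p[i];
      h *= 1099511628211ull;              // 64-bit FNV prime
   }
   return h;
}

int main ()
{
   const unsigned microblock_bits = 15;   // microblock n covers [n << 15, (n + 1) << 15)
   const size_t   microblocks     = 1024; // how much of the range this toy run covers
   const size_t   macro_size      = 64;   // microblock hashes per macroblock

   // 1. Hash each microblock.
   std::vector<uint64_t> micro_hashes;
   micro_hashes.reserve (microblocks);

   for (uint64_t n = 0; n < microblocks; ++n)
   {
      uint64_t lo = n << microblock_bits;
      uint64_t hi = (n + 1) << microblock_bits;

      uint64_t payload = 0;               // placeholder for the real per-block result
      for (uint64_t v = lo + 1; v < hi; v += 2)
         payload += v;                    // e.g. sum of the odd numbers in the block

      micro_hashes.push_back (fnv1a64 (&payload, sizeof payload));
   }

   // 2. Hash each macroblock (a fixed-size run of microblock hashes), in order.
   std::vector<uint64_t> macro_hashes;
   for (size_t i = 0; i < micro_hashes.size (); i += macro_size)
   {
      size_t count = std::min (macro_size, micro_hashes.size () - i);
      macro_hashes.push_back (fnv1a64 (&micro_hashes[i], count * sizeof (uint64_t)));
   }

   // 3. The final hash is the hash over the macroblock hashes.
   uint64_t final_hash = fnv1a64 (macro_hashes.data (), macro_hashes.size () * sizeof (uint64_t));
   (void) final_hash;
   return 0;
}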

Some additional notes:

  • I recommend a design where threads reserve a range of pending microblocks that the circular buffer has space for, process them, dump the values in the circular buffer, and repeat.
  • This has the added benefit that you can decide how many threads you want to use on the fly: e.g. when requesting a new range of microblocks, each thread could detect whether there are too many or too few threads running and adjust.
  • I personally would have the thread that adds the last microblock hash to a macroblock clean up that macroblock. Fewer parameters to tune this way.
  • Maintaining a circular buffer isn't as hard as it sounds -- the lowest order macroblock still unhandled defines what portion of the "macroblock space" the circular buffer represents. All you need is a simple counter that increments when appropriate to express this.
  • Another benefit is that since the threads go through a reserve/work/reserve/work cycle on a regular basis, a thread that is unexpectedly slow won't hinder the running time nearly as badly.
  • If you're looking to make something less robust but easier, you could forgo a good bit of the work by using a "striped" pattern -- decide on the max number of threads (N), have each thread handle every N-th microblock (offset by its thread "ID"), and hash the resulting macroblocks per thread instead. Then, at the end, hash the macroblock hashes from the N threads. If you have fewer than N threads, you can divide the work up amongst the number of threads you do want (e.g. 64 max threads, but three real threads: thread 0 handles 21 virtual threads, thread 1 handles 21 virtual threads, and thread 2 handles 22 virtual threads -- not ideal, but not terrible). This is essentially a shallow Merkle tree instead of a hash list; a rough threaded sketch follows below.
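
And that rough threaded sketch of the striped variant (again mine, with FNV-1a standing in for a real fast hash and the payload standing in for the real per-block result):

#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Sketch only: N_MAX stripes, each stripe owning every N_MAX-th microblock;
// the real threads split the stripes between them, and the per-stripe hashes
// are hashed together at the end.

static uint64_t fnv1a64 (const void *data, size_t len, uint64_t h = 14695981039346656037ull)
{
   const unsigned char *p = static_cast<const unsigned char *>(data);
   for (size_t i = 0; i < len; ++i) { h ^= p[i]; h *= 1099511628211ull; }
   return h;
}

int main ()
{
   const size_t   N_MAX        = 64;      // max (virtual) threads == number of stripes
   const size_t   real_threads = 3;       // threads actually doing the work
   const uint64_t microblocks  = 4096;    // microblocks in the covered range

   std::vector<uint64_t> stripe_hash (N_MAX);

   auto work = [&] (size_t tid)
   {
      // Thread tid owns stripes tid, tid + real_threads, tid + 2*real_threads, ...
      for (size_t stripe = tid; stripe < N_MAX; stripe += real_threads)
      {
         uint64_t h = 14695981039346656037ull;          // per-stripe running hash
         for (uint64_t n = stripe; n < microblocks; n += N_MAX)
         {
            uint64_t lo = n << 15, hi = (n + 1) << 15;
            uint64_t payload = 0;                        // placeholder per-block result
            for (uint64_t v = lo + 1; v < hi; v += 2)
               payload += v;
            h = fnv1a64 (&payload, sizeof payload, h);   // incremental stripe hash
         }
         stripe_hash[stripe] = h;   // each stripe has exactly one owner: no races
      }
   };

   std::vector<std::thread> pool;
   for (size_t t = 0; t < real_threads; ++t)
      pool.emplace_back (work, t);
   for (std::thread &t : pool)
      t.join ();

   // Final hash over the N_MAX stripe hashes, taken in stripe order.
   uint64_t final_hash = fnv1a64 (stripe_hash.data (), stripe_hash.size () * sizeof (uint64_t));
   (void) final_hash;
   return 0;
}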
