Checksumming large swathes of prime numbers? (for verification)

Question

Are there any clever algorithms for computing high-quality checksums on millions or billions of prime numbers? I.e. with maximum error-detection capability and perhaps segmentable?

Motivation:

Small primes - up to 64 bits in size - can be sieved on demand to the tune of millions per second, by using a small bitmap for sieving potential factors (up to 2^32-1) and a second bitmap for sieving the numbers in the target range.

Algorithm and implementation are reasonably simple and straightforward but the devil is in the details: values tend to push against - or exceed - the limits of builtin integral types everywhere, boundary cases abound (so to speak) and even differences in floating point strictness can cause breakage if programming is not suitably defensive. Not to mention the mayhem that an optimising compiler can wreak, even on already-compiled, already-tested code in a static lib (if link-time code generation is used). Not to mention that faster algorithms tend to be a lot more complicated and thus even more brittle.

This has two consequences: test results are basically meaningless unless the tests are performed using the final executable image, and it becomes highly desirable to verify proper operation at runtime, during normal use.

Checking against pre-computed values would give the highest degree of confidence but the required files are big and clunky. A text file with 10 million primes has on the order of 100 MB uncompressed and more than 10 MB compressed; storing byte-encoded differences requires one byte per prime and entropy coding can at best reduce the size to half (5 MB for 10 million primes). Hence even a file that covers only the small factors up to 2^32 would weigh in at about 100 MB, and the complexity of the decoder would exceed that of the windowed sieve itself.
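
As a rough sanity check on that 100 MB figure (my own back-of-the-envelope arithmetic, not part of the original question): there are pi(2^32) = 203,280,221 primes below 2^32, so one byte-encoded gap per prime comes to roughly 200 MB, and halving that with entropy coding lands right around the quoted 100 MB.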

This means that checking against files is not feasible except as a final release check for a newly-built executable. Not to mention that the trustworthy files are not easy to come by. The Prime Pages offer files for the first 50 million primes, and even the amazing primos.mat.br goes only up to 1,000,000,000,000. This is unfortunate since many of the boundary cases (== need for testing) occur between 2^62 and 2^64-1.

This leaves checksumming. That way the space requirements would be marginal, and only proportional to the number of test cases. I don't want to require that a decent checksum like MD5 or SHA-256 be available, and with the target numbers all being prime it should be possible to generate a high-quality, high-resolution checksum with some simple ops on the numbers themselves.

This is what I've come up with so far. The raw digest consists of four 64-bit numbers; at the end it can be folded down to the desired size.

   for (unsigned i = 0; i < ELEMENTS(primes); ++i)
   {
      digest[0] *= primes[i];              // running product (must be initialised to 1)
      digest[1] += digest[0];              // sum of sequence of running products
      digest[2] += primes[i];              // running sum
      digest[3] += digest[2] * primes[i];  // Hornerish sum
   }

At two (non-dependent) muls per prime the speed is decent enough, and except for the simple sum each of the components has always uncovered all errors I tried to sneak past the digest. However, I'm not a mathematician, and empirical testing is not a guarantee of efficacy.

Are there some mathematical properties that can be exploited to design - rather than 'cook' as I did - a sensible, reliable checksum?

Is it possible to design the checksum in a way that makes it steppable, in the sense that subranges can be processed separately and then the results combined with a bit of arithmetic to give the same result as if the whole range had been checksummed in one go? Same thing as all advanced CRC implementations tend to have nowadays, to enable parallel processing.
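
For reference, the CRC "combine" trick alluded to here is available off the shelf: zlib ships crc32_combine() alongside crc32(). Below is a tiny sketch of mine (assuming zlib is linked; the data is arbitrary filler, not primes) showing two independently computed CRCs merged into the CRC of the concatenated input:

// Sketch only: the kind of combinable checksum referred to above,
// using zlib's CRC-32 (link with -lz).
#include <zlib.h>
#include <cassert>
#include <cstring>

int main ()
{
   const unsigned char part1[] = "2 3 5 7 11 13 ";
   const unsigned char part2[] = "17 19 23 29 31";

   // CRC of each part, computed independently (e.g. on different threads).
   uLong crc1 = crc32 (0L, part1, sizeof (part1) - 1);
   uLong crc2 = crc32 (0L, part2, sizeof (part2) - 1);

   // Merge: yields the CRC of part1 followed by part2.
   uLong combined = crc32_combine (crc1, crc2, sizeof (part2) - 1);

   // Same data hashed in one go, for comparison.
   unsigned char whole[sizeof (part1) - 1 + sizeof (part2) - 1];
   std::memcpy (whole, part1, sizeof (part1) - 1);
   std::memcpy (whole + sizeof (part1) - 1, part2, sizeof (part2) - 1);
   assert (combined == crc32 (0L, whole, sizeof (whole)));

   return 0;
}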

EDIT The rationale for the current scheme is this: the count, the sum and the product do not depend on the order in which primes are added to the digest; they can be computed on separate blocks and then combined. The checksum does depend on the order; that's its raison d'être. However, it would be nice if the two checksums of two consecutive blocks could be combined somehow to give the checksum of the combined block.
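
To make the order-independent part concrete, here is a minimal sketch (mine, modelled on the count/sum/product components mentioned here, not code from the question) of merging two block digests computed over consecutive, non-overlapping ranges; it deliberately leaves out the order-dependent checksum, since combining that is exactly the open question:

#include <cstdint>

// Sketch: merge the order-independent digest components of two consecutive,
// non-overlapping blocks.  All arithmetic wraps mod 2^64, so the merged
// values match what a single pass over both blocks would have produced.
struct partial_digest_t
{
   uint64_t count   = 0;
   uint64_t sum     = 0;
   uint64_t product = 1;   // the running product must start at 1, as noted above
};

partial_digest_t combine (const partial_digest_t &a, const partial_digest_t &b)
{
   partial_digest_t r;
   r.count   = a.count + b.count;       // counts add
   r.sum     = a.sum + b.sum;           // sums add
   r.product = a.product * b.product;   // products multiply
   // The order-dependent checksum cannot be reconstructed from these
   // fields alone -- that is precisely what the question asks about.
   return r;
}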

The count and the sum can sometimes be verified against external sources, like certain sequences on oeis.org, or against sources like the batches of 10 million primes at primos.mat.br (the index gives first and last prime, the number == 10 million is implied). No such luck for product and checksum, though.

Before I throw major time and computing horsepower at the computation and verification of digests covering the whole range of small factors up to 2^64 I'd like to hear what the experts think about this...

The scheme I'm currently test-driving in 32-bit and 64-bit variants looks like this:

template<typename word_t>
struct digest_t
{
   word_t count;
   word_t sum;
   word_t product;
   word_t checksum;

   // ...

   void add_prime (word_t n)
   {
      count    += 1;
      sum      += n;
      product  *= n;
      checksum += n * sum + product;
   }
};

This has the advantage that the 32-bit digest components are equal to the lower halves of the corresponding 64-bit values, meaning only the 64-bit digests need to be computed and stored even if fast 32-bit verification is desired. A 32-bit version of the digest can be found in this simple sieve test program @ pastebin, for hands-on experimentation. The full Monty in a revised, templated version can be found in a newer paste for a sieve that works up to 2^64-1.
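
A quick way to convince yourself of that lower-half property is a throwaway test like the following (my own sketch; it assumes the digest_t template shown above can be brace-initialised in member order, i.e. count, sum, product, checksum -- adjust if it actually has a constructor). Since +, * and the implicit mod-2^64 wraparound all reduce consistently mod 2^32, the 32-bit digest must equal the low halves of the 64-bit one when fed the same (32-bit-sized) primes:

#include <cassert>
#include <cstdint>

void check_lower_halves ()
{
   digest_t<uint32_t> d32 = { 0, 0, 1, 0 };   // count, sum, product, checksum
   digest_t<uint64_t> d64 = { 0, 0, 1, 0 };

   const uint64_t primes[] = { 2, 3, 5, 7, 11, 13, 4294967291ull };   // last one is 2^32 - 5

   for (uint64_t p : primes)
   {
      d32.add_prime (static_cast<uint32_t>(p));   // lossless here: all primes fit in 32 bits
      d64.add_prime (p);
   }

   assert (d32.sum      == static_cast<uint32_t>(d64.sum));
   assert (d32.product  == static_cast<uint32_t>(d64.product));
   assert (d32.checksum == static_cast<uint32_t>(d64.checksum));
}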

Answer

I've done a good bit of work parallelizing operations on Cell architectures. This has a similar feel.

In this case, I would use a hash function that's fast and possibly incremental (e.g. xxHash or MurmurHash3) and a hash list (which is a less flexible specialization of a Merkle Tree).

These hashes are extremely fast. It's going to be surprisingly hard to get better with some simple set of operations. The hash list affords parallelism -- different blocks of the list can be handled by different threads, and then you hash the hashes. You could also use a Merkle Tree, but I suspect that'd just be more complex without much benefit.


  • Virtually divide your range into aligned blocks -- we'll call these microblocks. (e.g. a microblock is a range such as [n<<15, (n+1)<<15) )
  • To handle a microblock, compute what you need to compute, add it to a buffer, hash the buffer. (An incremental hash function will afford a smaller buffer. The buffer doesn't have to be filled with the same length of data every time.)
  • Each microblock hash will be placed in a circular buffer.
  • Divide the circular buffer into hashable blocks ("macroblocks"). Incrementally hash these macroblocks in the proper order as they become available, or once there are no more microblocks left.
  • The resulting hash is the one you want. (A serial sketch of this pipeline follows right after this list.)
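
Here is that serial sketch (mine, not the answerer's code). FNV-1a stands in for a fast hash such as xxHash or MurmurHash3 just to keep it self-contained, and the per-microblock payload is a placeholder for whatever the sieve actually produces for that range:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

static uint64_t fnv1a64 (const void *data, size_t len, uint64_t h = 14695981039346656037ull)
{
   const unsigned char *p = static_cast<const unsigned char *>(data);
   for (size_t i = 0; i < len; ++i)
   {
      h ^= p[i];
      h *= 1099511628211ull;              // 64-bit FNV prime
   }
   return h;
}

int main ()
{
   const unsigned microblock_bits = 15;   // microblock n covers [n << 15, (n + 1) << 15)
   const size_t   microblocks     = 1024; // how much of the range this toy run covers
   const size_t   macro_size      = 64;   // microblock hashes per macroblock

   // 1. Hash each microblock.
   std::vector<uint64_t> micro_hashes;
   micro_hashes.reserve (microblocks);

   for (uint64_t n = 0; n < microblocks; ++n)
   {
      uint64_t lo = n << microblock_bits;
      uint64_t hi = (n + 1) << microblock_bits;

      uint64_t payload = 0;               // placeholder for the real per-block result
      for (uint64_t v = lo + 1; v < hi; v += 2)
         payload += v;                    // e.g. sum of the odd numbers in the block

      micro_hashes.push_back (fnv1a64 (&payload, sizeof payload));
   }

   // 2. Hash each macroblock (a fixed-size run of microblock hashes), in order.
   std::vector<uint64_t> macro_hashes;
   for (size_t i = 0; i < micro_hashes.size (); i += macro_size)
   {
      size_t count = std::min (macro_size, micro_hashes.size () - i);
      macro_hashes.push_back (fnv1a64 (&micro_hashes[i], count * sizeof (uint64_t)));
   }

   // 3. The final hash is the hash over the macroblock hashes.
   uint64_t final_hash = fnv1a64 (macro_hashes.data (), macro_hashes.size () * sizeof (uint64_t));
   (void) final_hash;
   return 0;
}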

Some additional notes:

  • I recommend a design where threads reserve a range of pending microblocks that the circular buffer has space for, process them, dump the values in the circular buffer, and repeat.
  • This has the added benefit that you can decide how many threads you want to use on the fly: e.g. when requesting a new range of microblocks, each thread could detect whether there are too many or too few threads running and adjust.
  • I personally would have the thread that adds the last microblock hash to a macroblock clean up that macroblock. Fewer parameters to tune this way.
  • Maintaining a circular buffer isn't as hard as it sounds -- the lowest order macroblock still unhandled defines what portion of the "macroblock space" the circular buffer represents. All you need is a simple counter that increments when appropriate to express this.
  • Another benefit is that since the threads go through a reserve/work/reserve/work cycle on a regular basis, a thread that is unexpectedly slow won't hinder the running time nearly as badly.
  • If you're looking to make something less robust but easier, you could forgo a good bit of the work by using a "striped" pattern -- decide on the max number of threads (N), have each thread handle every N-th microblock (offset by its thread "ID"), and hash the resulting macroblocks per thread instead. Then, at the end, hash the macroblock hashes from the N threads. If you have fewer than N threads, you can divide the work up amongst the number of threads you do want (e.g. 64 max threads, but three real threads: thread 0 handles 21 virtual threads, thread 1 handles 21 virtual threads, and thread 2 handles 22 virtual threads -- not ideal, but not terrible). This is essentially a shallow Merkle tree instead of a hash list; a rough threaded sketch follows below.
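
And that rough threaded sketch of the striped variant (again mine, with FNV-1a standing in for a real fast hash and the payload standing in for the real per-block result):

#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Sketch only: N_MAX stripes, each stripe owning every N_MAX-th microblock;
// the real threads split the stripes between them, and the per-stripe hashes
// are hashed together at the end.

static uint64_t fnv1a64 (const void *data, size_t len, uint64_t h = 14695981039346656037ull)
{
   const unsigned char *p = static_cast<const unsigned char *>(data);
   for (size_t i = 0; i < len; ++i) { h ^= p[i]; h *= 1099511628211ull; }
   return h;
}

int main ()
{
   const size_t   N_MAX        = 64;      // max (virtual) threads == number of stripes
   const size_t   real_threads = 3;       // threads actually doing the work
   const uint64_t microblocks  = 4096;    // microblocks in the covered range

   std::vector<uint64_t> stripe_hash (N_MAX);

   auto work = [&] (size_t tid)
   {
      // Thread tid owns stripes tid, tid + real_threads, tid + 2*real_threads, ...
      for (size_t stripe = tid; stripe < N_MAX; stripe += real_threads)
      {
         uint64_t h = 14695981039346656037ull;          // per-stripe running hash
         for (uint64_t n = stripe; n < microblocks; n += N_MAX)
         {
            uint64_t lo = n << 15, hi = (n + 1) << 15;
            uint64_t payload = 0;                        // placeholder per-block result
            for (uint64_t v = lo + 1; v < hi; v += 2)
               payload += v;
            h = fnv1a64 (&payload, sizeof payload, h);   // incremental stripe hash
         }
         stripe_hash[stripe] = h;   // each stripe has exactly one owner: no races
      }
   };

   std::vector<std::thread> pool;
   for (size_t t = 0; t < real_threads; ++t)
      pool.emplace_back (work, t);
   for (std::thread &t : pool)
      t.join ();

   // Final hash over the N_MAX stripe hashes, taken in stripe order.
   uint64_t final_hash = fnv1a64 (stripe_hash.data (), stripe_hash.size () * sizeof (uint64_t));
   (void) final_hash;
   return 0;
}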
