无限生成汉明序列的最新技术 [英] New state of the art in unlimited generation of Hamming sequence

查看:30
本文介绍了无限生成汉明序列的最新技术的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

(这太令人兴奋了!)我知道,这个主题是众所周知的.用于有效生成无限递增的汉明数序列,没有重复和遗漏的最新技术(在 Haskell 以及其他语言中)长期以来一直如下(AFAIK - 顺便说一句,它相当于 原始 Edsger Dijkstra 代码):

(this is exciting!) I know, the subject matter is well known. The state of the art (in Haskell as well as other languages) for efficient generation of unbounded increasing sequence of Hamming numbers, without duplicates and without omissions, has long been the following (AFAIK - and btw it is equivalent to the original Edsger Dijkstra's code too):

hamm :: [Integer]
hamm = 1 : map (2*) hamm `union` map (3*) hamm `union` map (5*) hamm
  where
    union a@(x:xs) b@(y:ys) = case compare x y of
        LT -> x : union  xs  b
        EQ -> x : union  xs  ys
        GT -> y : union  a   ys

我要问的问题是,你能找到提高效率的方法吗?它是否仍然是最先进的技术,或者实际上是否可以改进它以两倍速度和更好的经验增长顺序要启动?

The question I'm asking is, can you find the way to make it more efficient in any significant measure? Is it still the state of the art or is it in fact possible to improve this to run twice faster and with better empirical orders of growth to boot?

如果您的回答是,请展示代码并讨论其与上述相比的速度和经验增长顺序(它运行在大约 ~ n^1.05 .. n^1.10 产生的前几十万个数字).另外,如果它存在,这个有效的算法是否可以扩展到产生具有任何给定素数集的平滑数序列?

If your answer is yes, please show the code and discuss its speed and empirical orders of growth in comparison to the above (it runs at about ~ n^1.05 .. n^1.10 for first few hundreds of thousands of numbers produced). Also, if it exists, can this efficient algorithm be extended to producing a sequence of smooth numbers with any given set of primes?

推荐答案

如果一个常数因子(1) 加速很重要,那么我可以提供一个效率更高的版本:

If a constant factor(1) speedup counts as significant, then I can offer a significantly more efficient version:

hamm :: [Integer]
hamm = mrg1 hamm3 (map (2*) hamm)
  where
    hamm5 = iterate (5*) 1
    hamm3 = mrg1 hamm5 (map (3*) hamm3)
    merge a@(x:xs) b@(y:ys)
        | x < y     = x : merge xs b
        | otherwise = y : merge a ys
    mrg1 (x:xs) ys = x : merge xs ys

您可以轻松地将其推广到给定一组素数的平滑数字:

You can easily generalise it to smooth numbers for a given set of primes:

hamm :: [Integer] -> [Integer]
hamm [] = [1]
hamm [p] = iterate (p*) 1
hamm ps = foldl' next (iterate (q*) 1) qs
  where
    (q:qs) = sortBy (flip compare) ps
    next prev m = let res = mrg1 prev (map (m*) res) in res
    merge a@(x:xs) b@(y:ys)
        | x < y     = x : merge xs b
        | otherwise = y : merge a ys
    mrg1 (x:xs) ys = x : merge xs ys

它更高效,因为该算法不会产生任何重复项并且使用更少的内存.在您的版本中,当产生 h 附近的汉明数时,h/5h 之间的列表部分必须在内存中.在我的版本中,只有完整列表的 h/2h 之间的部分,以及 h/3 之间的部分3-5-list 的 h 需要在内存中.由于 3-5-list 更稀疏,并且 k-smooth 数的密度降低,这两个列表部分需要的内存比完整列表的较大部分少得多.

It's more efficient because that algorithm doesn't produce any duplicates and it uses less memory. In your version, when a Hamming number near h is produced, the part of the list between h/5 and h has to be in memory. In my version, only the part between h/2 and h of the full list, and the part between h/3 and h of the 3-5-list needs to be in memory. Since the 3-5-list is much sparser, and the density of k-smooth numbers decreases, those two list parts need much less memory that the larger part of the full list.

两种算法产生kth汉明数的一些时间,每个目标相对于前一个目标的经验复杂度,不包括和包括GC时间:

Some timings for the two algorithms to produce the kth Hamming number, with empirical complexity of each target relative to the previous, excluding and including GC time:

  k            Yours (MUT/GC)               Mine (MUT/GC)
 10^5           0.03/0.01                    0.01/0.01      -- too short to say much, really
2*10^5          0.07/0.02                    0.02/0.01
5*10^5          0.17/0.06  0.968  1.024      0.06/0.04      1.199    1.314
 10^6           0.36/0.13  1.082  1.091      0.11/0.10      0.874    1.070
2*10^6          0.77/0.27  1.097  1.086      0.21/0.21      0.933    1.000
5*10^6          1.96/0.71  1.020  1.029      0.55/0.59      1.051    1.090
 10^7           4.05/1.45  1.047  1.043      1.14/1.25      1.052    1.068
2*10^7          8.73/2.99  1.108  1.091      2.31/2.65      1.019    1.053
5*10^7         21.53/7.83  0.985  1.002      6.01/7.05      1.044    1.057
 10^8          45.83/16.79 1.090  1.093     12.42/15.26     1.047    1.084

可以看到,MUT 次数之间的因数约为 3.5,但 GC 时间相差不大.

As you can see, the factor between the MUT times is about 3.5, but the GC time is not much different.

(1) 好吧,它看起来是恒定的,而且我认为这两种变体具有相同的计算复杂度,但是我没有拿出铅笔和纸来证明它,也不打算这样做.

(1) Well, it looks constant, and I think both variants have the same computational complexity, but I haven't pulled out pencil and paper to prove it, nor do I intend to.

这篇关于无限生成汉明序列的最新技术的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆