汉明数和双精度 [英] Hamming numbers and double precision

查看:157
本文介绍了汉明数和双精度的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在哈斯克尔(Haskell)中生成汉明数字的过程中,试图改进明显的(请为功能命名)

I was playing around with generating Hamming numbers in Haskell, trying to improve on the obvious (pardon the naming of the functions)

mergeUniq :: Ord a => [a] -> [a] -> [a]
mergeUniq (x:xs) (y:ys) = case x `compare` y of
                               EQ -> x : mergeUniq xs ys
                               LT -> x : mergeUniq xs (y:ys)
                               GT -> y : mergeUniq (x:xs) ys

powers :: [Integer]
powers = 1 : expand 2 `mergeUniq` expand 3 `mergeUniq` expand 5
  where
    expand factor = (factor *) <$> powers

我注意到,如果我将数字表示为2、3和5指数(如data Power = Power { k2 :: !Int, k3 :: !Int, k5 :: !Int })的三倍,则我可以避免(较慢的)任意精度Integer,其中该数字被理解为2k2 * 3k3 * 5k5.然后,两个Power的比较变为

I noticed that I can avoid the (slower) arbitrary precision Integer if I represent the numbers as the triple of the 2-, 3- and 5-exponents like data Power = Power { k2 :: !Int, k3 :: !Int, k5 :: !Int }, where the number is understood to be 2k2 * 3k3 * 5k5. The comparison of two Powers then becomes

instance Ord Power where
  p1 `compare` p2 = toComp (p1 `divP` gcdP) `compare` toComp (p2 `divP` gcdP)
    where
    divP p1 p2 = Power { k2 = k2 p1 - k2 p2, k3 = k3 p1 - k3 p2, k5 = k5 p1 - k5 p2 }
    gcdP = Power { k2 = min (k2 p1) (k2 p2), k3 = min (k3 p1) (k3 p2), k5 = min (k5 p1) (k5 p2) }
    toComp Power { .. } = fromIntegral k2 * log 2 + fromIntegral k3 * log 3 + fromIntegral k5 * log 5

因此,非常笼统地说,为了比较p₁ = 2i₁ * 3j₁ * 5k₁p₂ = 2i₂ * 3j₂ * 5k₂,我们比较了p₁p₂的对数,这些对数可能适合Double.但是实际上我们做得更好:我们首先计算它们的GCD(通过找到相应指数对的min-到目前为止只有Int个算术!),将p₁p₂除以GCD(减去相应指数的min s(也只有Int算术),然后比较结果的对数.

So, very roughly speaking, to compare p₁ = 2i₁ * 3j₁ * 5k₁ and p₂ = 2i₂ * 3j₂ * 5k₂ we compare the logarithms of p₁ and p₂, which presumably fit Double. But actually we do even better: we first compute their GCD (by finding the mins of the corresponding exponents pairs — only Int arithmetic so far!), divide p₁ and p₂ by the GCD (by subtracting the mins from the corresponding exponents — also only Int arithmetic), and compare the logarithms of the results.

但是,考虑到我们要经过Double,最终会损失精度.这是我提出问题的依据:

But, given that we go through Doubles, there will be loss of precision eventually. And this is the ground for my questions:

  1. Double的有限精度何时会咬我?也就是说,如何估计i, j, k2i * 3j * 5k与具有相似"指数的数字的比较结果将变得不可靠的顺序?
  2. 我们通过除以GCD(可能会大大降低此任务的指数)的事实,如何修改上一个问题的答案?
  1. When will the finite precision of Doubles bite me? That is, how to estimate the order of i, j, k for which the results of comparisons of 2i * 3j * 5k with numbers with "similar" exponents will become unreliable?
  2. How does the fact that we go through dividing by the GCD (which presumably lowers the exponents considerably for this task) modify the answer to the previous question?

我做了一个实验,将以这种方式产生的数字与通过任意精度算术产生的数字进行比较,直到第1'000'000'000的所有汉明数字都完全匹配(这花了我大约15分钟和600大容量的RAM进行验证).但这显然不是证明.

I did an experiment, comparing the numbers produced this way with the numbers produced via going through arbitrary precision arithmetic, and all Hamming numbers up to the 1'000'000'000th match exactly (which took me about 15 minutes and 600 megs of RAM to verify). But that's obviously not a proof.

推荐答案

凭经验,它超过了约1​​0万亿海明数或更高版本.

Empirically, it's above about 10 trillionths Hamming number, or higher.

使用不错的GCD技巧在这里无济于事,因为某些相邻的汉明数必然在它们之间没有共同的因素.

Using your nice GCD trick won't help us here, because some neighboring Hamming numbers are bound to have no common factors between them.

更新:在ideone上在线尝试 ,我们得到了

update: trying it online on ideone and elsewhere, we get

4T  5.81s 22.2MB  -- 16 digits used.... still good
                  --  (as evidenced by the `True` below), but really pushing it.
((True,44531.6794,7.275957614183426e-11),(16348,16503,873),"2.3509E+13405")
-- isTruly  max        min logval           nth-Hamming       approx.
--  Sorted   logval      difference          as i,j,k          value
--            in band      in band                             in decimal
10T   11.13s 26.4MB
((True,60439.6639,7.275957614183426e-11),(18187,23771,1971),"1.4182E+18194")
13T   14.44s 30.4MB    ...still good
((True,65963.6432,5.820766091346741e-11),(28648,21308,1526),"1.0845E+19857")

---- same code on tio:
10T   16.77s
35T   38.84s 
((True,91766.4800,5.820766091346741e-11),(13824,2133,32112),"2.9045E+27624")
70T   59.57s
((True,115619.1575,5.820766091346741e-11),(13125,13687,34799),"6.8310E+34804")

---- on home machine:
100T: 368.13s
((True,130216.1408,5.820766091346741e-11),(88324,876,17444),"9.2111E+39198")

140T: 466.69s
((True,145671.6480,5.820766091346741e-11),(9918,24002,42082),"3.4322E+43851")

170T: 383.26s         ---FAULTY---
((False,155411.2501,0.0),(77201,27980,14584),"2.80508E+46783")

这篇关于汉明数和双精度的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆