汉明数和双精度 [英] Hamming numbers and double precision
问题描述
我在哈斯克尔(Haskell)中生成汉明数字的过程中,试图改进明显的(请为功能命名)
I was playing around with generating Hamming numbers in Haskell, trying to improve on the obvious (pardon the naming of the functions)
mergeUniq :: Ord a => [a] -> [a] -> [a]
mergeUniq (x:xs) (y:ys) = case x `compare` y of
EQ -> x : mergeUniq xs ys
LT -> x : mergeUniq xs (y:ys)
GT -> y : mergeUniq (x:xs) ys
powers :: [Integer]
powers = 1 : expand 2 `mergeUniq` expand 3 `mergeUniq` expand 5
where
expand factor = (factor *) <$> powers
我注意到,如果我将数字表示为2、3和5指数(如data Power = Power { k2 :: !Int, k3 :: !Int, k5 :: !Int }
)的三倍,则我可以避免(较慢的)任意精度Integer
,其中该数字被理解为2k2 * 3k3 * 5k5
.然后,两个Power
的比较变为
I noticed that I can avoid the (slower) arbitrary precision Integer
if I represent the numbers as the triple of the 2-, 3- and 5-exponents like data Power = Power { k2 :: !Int, k3 :: !Int, k5 :: !Int }
, where the number is understood to be 2k2 * 3k3 * 5k5
. The comparison of two Power
s then becomes
instance Ord Power where
p1 `compare` p2 = toComp (p1 `divP` gcdP) `compare` toComp (p2 `divP` gcdP)
where
divP p1 p2 = Power { k2 = k2 p1 - k2 p2, k3 = k3 p1 - k3 p2, k5 = k5 p1 - k5 p2 }
gcdP = Power { k2 = min (k2 p1) (k2 p2), k3 = min (k3 p1) (k3 p2), k5 = min (k5 p1) (k5 p2) }
toComp Power { .. } = fromIntegral k2 * log 2 + fromIntegral k3 * log 3 + fromIntegral k5 * log 5
因此,非常笼统地说,为了比较p₁ = 2i₁ * 3j₁ * 5k₁
和p₂ = 2i₂ * 3j₂ * 5k₂
,我们比较了p₁
和p₂
的对数,这些对数可能适合Double
.但是实际上我们做得更好:我们首先计算它们的GCD(通过找到相应指数对的min
-到目前为止只有Int
个算术!),将p₁
和p₂
除以GCD(减去相应指数的min
s(也只有Int
算术),然后比较结果的对数.
So, very roughly speaking, to compare p₁ = 2i₁ * 3j₁ * 5k₁
and p₂ = 2i₂ * 3j₂ * 5k₂
we compare the logarithms of p₁
and p₂
, which presumably fit Double
. But actually we do even better: we first compute their GCD (by finding the min
s of the corresponding exponents pairs — only Int
arithmetic so far!), divide p₁
and p₂
by the GCD (by subtracting the min
s from the corresponding exponents — also only Int
arithmetic), and compare the logarithms of the results.
但是,考虑到我们要经过Double
,最终会损失精度.这是我提出问题的依据:
But, given that we go through Double
s, there will be loss of precision eventually. And this is the ground for my questions:
-
Double
的有限精度何时会咬我?也就是说,如何估计i, j, k
与2i * 3j * 5k
与具有相似"指数的数字的比较结果将变得不可靠的顺序? - 我们通过除以GCD(可能会大大降低此任务的指数)的事实,如何修改上一个问题的答案?
- When will the finite precision of
Double
s bite me? That is, how to estimate the order ofi, j, k
for which the results of comparisons of2i * 3j * 5k
with numbers with "similar" exponents will become unreliable? - How does the fact that we go through dividing by the GCD (which presumably lowers the exponents considerably for this task) modify the answer to the previous question?
我做了一个实验,将以这种方式产生的数字与通过任意精度算术产生的数字进行比较,直到第1'000'000'000的所有汉明数字都完全匹配(这花了我大约15分钟和600大容量的RAM进行验证).但这显然不是证明.
I did an experiment, comparing the numbers produced this way with the numbers produced via going through arbitrary precision arithmetic, and all Hamming numbers up to the 1'000'000'000th match exactly (which took me about 15 minutes and 600 megs of RAM to verify). But that's obviously not a proof.
推荐答案
凭经验,它超过了约10万亿海明数或更高版本.
Empirically, it's above about 10 trillionths Hamming number, or higher.
使用不错的GCD技巧在这里无济于事,因为某些相邻的汉明数必然在它们之间没有共同的因素.
Using your nice GCD trick won't help us here, because some neighboring Hamming numbers are bound to have no common factors between them.
update: trying it online on ideone and elsewhere, we get
4T 5.81s 22.2MB -- 16 digits used.... still good
-- (as evidenced by the `True` below), but really pushing it.
((True,44531.6794,7.275957614183426e-11),(16348,16503,873),"2.3509E+13405")
-- isTruly max min logval nth-Hamming approx.
-- Sorted logval difference as i,j,k value
-- in band in band in decimal
10T 11.13s 26.4MB
((True,60439.6639,7.275957614183426e-11),(18187,23771,1971),"1.4182E+18194")
13T 14.44s 30.4MB ...still good
((True,65963.6432,5.820766091346741e-11),(28648,21308,1526),"1.0845E+19857")
---- same code on tio:
10T 16.77s
35T 38.84s
((True,91766.4800,5.820766091346741e-11),(13824,2133,32112),"2.9045E+27624")
70T 59.57s
((True,115619.1575,5.820766091346741e-11),(13125,13687,34799),"6.8310E+34804")
---- on home machine:
100T: 368.13s
((True,130216.1408,5.820766091346741e-11),(88324,876,17444),"9.2111E+39198")
140T: 466.69s
((True,145671.6480,5.820766091346741e-11),(9918,24002,42082),"3.4322E+43851")
170T: 383.26s ---FAULTY---
((False,155411.2501,0.0),(77201,27980,14584),"2.80508E+46783")
这篇关于汉明数和双精度的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!