将数字的各个数字分量相加/求和的最快方法 [英] Fastest method for adding/summing the individual digit components of a number

查看:25
本文介绍了将数字的各个数字分量相加/求和的最快方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

不久前,我在数学论坛上看到一个问题,有人在讨论一遍又一遍地将数字中的数字相加,直到得到一个数字.(即362"将变为3+6+2",然后将变为11"......然后11"将变为1+1"将变为2",因此362"将返回 2......我写了一些很好的代码来得到这个答案并发布它只是被一个用户超越,他建议模9中的任何数字都等于这个无限数字和",我检查了它,他是对的......好吧几乎正确,如果返回零,您必须用9"将其切换出来,但这是一个非常快速的解决方案......

I saw a question on a math forum a while back where a person was discussing adding up the digits in a number over and over again until a single digit is achieved. (i.e. "362" would become "3+6+2" which would become "11"... then "11" would become "1+1" would would become "2" therefor "362" would return 2... I wrote some nice code to get an answer to this and posted it only to be outdone by a user who suggested that any number in modulo 9 is equal to this "infinite digit sum", I checked it an he was right... well almost right, if zero was returned you had to switch it out with a "9" but that was a very quick fix...

362 = 3+6+2 = 11 = 1+1 = 2

362 = 3+6+2 = 11 = 1+1 = 2

或者...

362%9 = 2

Anways,mod9 方法非常适合无限添加数字的总和,直到你只剩下一个数字......但是只做一次呢(即 362 只会返回11")......谁能想到快速算法?

Anways, the mod9 method works fantastic for infinitely adding the sum of the digits until you are left with just a single digit... but what about only doing it once (i.e. 362 would just return "11")... Can anyone think of fast algorithms?

推荐答案

有一个很酷的技巧可以将二进制中的 1 位数字相加,并使用一个固定宽度的整数.在每次迭代中,您将每个数字的一​​半分成两个值,将一个值向下移位,然后相加.第一次迭代,将其他数字分开.第二次迭代,数字对,等等.

There's a cool trick for summing the 1 digits in binary, and with a fixed-width integer. At each iteration, you separate out half the digits each into two values, bit-shift one value down, then add. First iteration, separate ever other digit. Second iteration, pairs of digits, and so on.

假设 27 是 00011011 作为 8 位二进制,则过程是......

Given that 27 is 00011011 as 8-bit binary, the process is...

00010001 + 00000101 = 00010110  <-  every other digit step
00010010 + 00000001 = 00010011  <-  pairs of digits
00000011 + 00000001 = 00000100  <-  quads, giving final result 4

你可以用小数做一个类似的技巧,但它会比一个简单的循环效率低,除非你有一个十进制数字的直接表示,并通过快速操作将选定的数字归零并进行数字移位.所以对于 12345678 你得到...

You could do a similar trick with decimal, but it would be less efficient than a simple loop unless you had a direct representation of decimal numbers with fast operations to zero out selected digits and to do digit-shifting. So for 12345678 you get...

02040608 + 01030507 = 03071115  <-  every other digit
00070015 + 00030011 = 00100026  <-  pairs
00000026 + 00000010 = 00000036  <-  quads, final result

所以 1+2+3+4+5+6+7+8 = 36,这是正确的,但只有当你的数字表示是固定宽度的十进制时,你才能有效地做到这一点.它总是需要 lg(n) 次迭代,其中 lg 表示以 2 为底的对数,然后向上取整.

So 1+2+3+4+5+6+7+8 = 36, which is correct, but you can only do this efficiently if your number representation is fixed-width decimal. It always takes lg(n) iterations, where lg means the base two logarithm, and you round upwards.

为了稍微扩展一下(基于评论中的讨论),让我们假装这有点理智......

To expand on this a little (based on in-comments discussions), let's pretend this was sane, for a bit...

如果你计算个位数的加法,这里实际上更多工作比一个简单的循环.与计算位的按位技巧一样,这个想法是重新排序这些加法(使用关联性),然后并行计算尽可能多的计算,使用单个全角加法来实现两个半角加法,四个四分之一宽度加法等.数字清除和数字移位操作有很大的开销,如果将其实现为循环(计算或查找每个步骤的数字掩码和移位距离值),则开销更大.循环"可能应该完全展开,并且那些掩码和移位距离作为常量包含在代码中以避免这种情况.

If you count single-digit additions, there's actually more work than a simple loop here. The idea, as with the bitwise trick for counting bits, is to re-order those additions (using associativity) and then to compute as many as possible in parallel, using a single full-width addition to implement two half-width additions, four quarter-width additions etc. There's significant overhead for the digit-clearing and digit-shifting operations, and even more if you implement this as a loop (calculating or looking up the digit-masking and shift-distance values for each step). The "loop" should probably be fully unrolled and those masks and shift-distances be included as constants in the code to avoid that.

支持 二进制编码十进制 (BCD) 的处理器可以处理此问题.数字掩码和数字移位将使用位掩码和位移来实现,因为每个十进制数字将被编码为 4(或更多)位,独立于其他数字的编码.

A processor with support for Binary Coded Decimal (BCD) could handle this. Digit masking and digit shifting would be implemented using bit masking and bit shifting, as each decimal digit would be encoded in 4 (or more) bits, independent of the encoding of other digits.

一个问题是现在 BCD 支持非常少见.它曾经在 8 位和 16 位时代相当普遍,但据我所知,现在仍然支持它的处理器主要是为了向后兼容.原因包括...

One issue is that BCD support is quite rare these days. It used to be fairly common in the 8 bit and 16 bit days, but as far as I'm aware, processors that still support it now do so mainly for backward compatibility. Reasons include...

  1. 早期的处理器不包括硬件乘法和除法.对这些操作的硬件支持意味着现在将二进制转换为十进制更加容易和高效.现在几乎所有东西都使用二进制,而 BCD 几乎被遗忘了.

  1. Very early processors didn't include hardware multiplication and division. Hardware support for these operations means it's easier and more efficient to convert binary to decimal now. Binary is used for almost everything now, and BCD is mostly forgotten.

库中有十进制数字表示,但很少有高级语言曾经为硬件 BCD 提供可移植支持,因此,由于汇编程序不再是大多数开发人员的现实选择,BCD 支持就不再被使用.

There are decimal number representations around in libraries, but few if any high level languages ever provided portable support to hardware BCD, so since assembler stopped being a real-world option for most developers BCD support simply stopped being used.

随着数字变得越来越大,即使是压缩的 BCD 也是非常低效的压缩.以 10^x 为底的数字表示具有以 10 为底的最重要的属性,并且很容易被解码为十进制.Base 1000 只需要每三位数字 10 位,而不是 12 位,因为 2^10 是 1024.这足以表明您获得了 32 位的额外十进制数字 - 9 位而不是 8 位 - 而且您还剩下 2 位,例如一个符号位.

As numbers get larger, even packed BCD is quite inefficiently packed. Number representations base 10^x have the most important properties of base 10, and are easily decoded as decimal. Base 1000 only needs 10 bits per three digits, not 12, because 2^10 is 1024. That's enough to show you get an extra decimal digit for 32 bits - 9 digits instead of 8 - and you've still got 2 bits left over, e.g. for a sign bit.

问题是,要使这种数字总计算法完全值得,您需要使用可能至少 32 位(8 位)的固定宽度小数.这为(完全展开的)简单循环提供了 12 次操作(6 个掩码、3 个班次、3 个加法)而不是 15 次加法.不过,这是一个临界增益 - 代码中的其他问题很容易意味着它实际上更慢.

The thing is, for this digit-totalling algorithm to be worthwhile at all, you need to be working with fixed-width decimal of probably at least 32 bits (8 digits). That gives 12 operations (6 masks, 3 shifts, 3 additions) rather than 15 additions for the (fully unrolled) simple loop. That's a borderline gain, though - and other issues in the code could easily mean it's actually slower.

64 位(16 位十进制数字)的效率增益更明显,因为仍然只有 16 个操作(8 个掩码、4 个移位、4 个加法)而不是 31 个,但找到支持 64 位 BCD 操作的处理器的几率看起来很苗条.即使你这样做了,你多久需要一次?似乎不太可能值得付出努力和失去可移植性.

The efficiency gain is clearer at 64 bits (16 decimal digits) as there's still only 16 operations (8 masks, 4 shifts, 4 additions) rather than 31, but the odds of finding a processor that supports 64-bit BCD operations seems slim. And even if you did, how often do you need this anyway? It seems unlikely that it could be worth the effort and loss of portability.

这篇关于将数字的各个数字分量相加/求和的最快方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆