寻找款项阵的快速算法 [英] fast algorithm of finding sums in array
问题描述
我要寻找一个快速算法:
I am looking for a fast algorithm:
我有大小为n的int数组,目标是找到阵列中的所有模式,
X1,X2,X3是在阵列中的不同元素,使得X1 + X2 = X3的
I have a int array of size n, the goal is to find all patterns in the array that
x1, x2, x3 are different elements in the array, such that x1+x2 = x3
例如我知道有大小3的int数组是 [1,2,3]
那么就只有一个可能性:1 + 2 = 3(考虑1+ 2 = 2 + 1)的
For example I know there's a int array of size 3 is [1, 2, 3]
then there's only one possibility: 1+2 = 3 (consider 1+2 = 2+1)
我想实现对和包含HashMap,使算法快。 (最快的一个我现在得到的是仍为O(n ^ 2))
I am thinking about implementing Pairs and Hashmaps to make the algorithm fast. (the fastest one I got now is still O(n^2))
请分享你的想法对于这个问题,谢谢
Please share your idea for this problem, thank you
推荐答案
修改:下面的答案适用于一个版本的问题,你只需要一个三线,增加了这样的。当您希望所有的人,因为有可能至少为O(n ^ 2)可能的输出(如指出ex0du5),甚至为O(n ^ 3)重复元素病理情况下,你不会击败了简单的为O(n ^ 2)基于散列(映射从价值与价值指数的列表)算法。
Edit: The answer below applies to a version of this problem in which you only want one triplet that adds up like that. When you want all of them, since there are potentially at least O(n^2) possible outputs (as pointed out by ex0du5), and even O(n^3) in pathological cases of repeated elements, you're not going to beat the simple O(n^2) algorithm based on hashing (mapping from a value to the list of indices with that value).
这是基本的3SUM问题。如果没有潜在的无粘结大元素,最有名的算法是约为O(n ^ 2)
,但我们只证明,它不能快于为O(n LG N)
进行计算的大多数型号。
This is basically the 3SUM problem. Without potentially unboundedly large elements, the best known algorithms are approximately O(n^2)
, but we've only proved that it can't be faster than O(n lg n)
for most models of computation.
如果整数元素在范围 [U,V]
,你可以做一个稍微不同的版本,这在 O(N + (VU)LG(VU))
有 FFT 。我将描述一个过程来改变这个问题转化为一个,解决它在那里,然后找出在此基础上改造的回答你的问题。
If the integer elements lie in the range [u, v]
, you can do a slightly different version of this in O(n + (v-u) lg (v-u))
with an FFT. I'm going to describe a process to transform this problem into that one, solve it there, and then figure out the answer to your problem based on this transformation.
我知道如何解决与FFT的问题是找到一个数组长度为3的等差数列:即一个序列 A
, B
, C
与Ç - B = B - A
,或者等价地, A + C = 2B
。
The problem that I know how to solve with FFT is to find a length-3 arithmetic sequence in an array: that is, a sequence a
, b
, c
with c - b = b - a
, or equivalently, a + c = 2b
.
不幸的是,转型的最后一步是不一样快,我想,但我会讲,当我们到达那里。
Unfortunately, the last step of the transformation back isn't as fast as I'd like, but I'll talk about that when we get there.
让我们把原来的数组 X
,其中包含整数 X_1,...,x_n
。我们希望找到指数我
,Ĵ
, K
这样的该 x_i + x_j = x_k
。
Let's call your original array X
, which contains integers x_1, ..., x_n
. We want to find indices i
, j
, k
such that x_i + x_j = x_k
.
-
查找最低
U
和最大v
的X
在O(N)
的时间。让U
是分(U,U * 2)
和V'
是MAX(V,V * 2)
。
Find the minimum
u
and maximumv
ofX
inO(n)
time. Letu'
bemin(u, u*2)
andv'
bemax(v, v*2)
.
构造一个二进制数组(比特串)以Z
长度 V' - U'+ 1
; Z [I]
将是真实的,如果任一 X
或它的双 [X_1 * 2, ...,x_n * 2]
包含 U'+ I
。这是 O(N)
来初始化;刚走到 X的每个元素
,并设置以Z
的两个对应的元素。
Construct a binary array (bitstring) Z
of length v' - u' + 1
; Z[i]
will be true if either X
or its double [x_1*2, ..., x_n*2]
contains u' + i
. This is O(n)
to initialize; just walk over each element of X
and set the two corresponding elements of Z
.
由于我们正在构建这个数组,我们可以节省我们发现任何重复的索引到一个辅助列表是
。一旦以Z
完成后,我们只是检查 2 * x_i
每个 x_i
在是
。如果有任何present,我们就大功告成了;否则,重复是不相关的,我们可以忘掉是
。 (稍微复杂一点的唯一情况是,如果 0
重复;然后,我们需要的是三个不同的副本得到解决)
As we're building this array, we can save the indices of any duplicates we find into an auxiliary list Y
. Once Z
is complete, we just check for 2 * x_i
for each x_i
in Y
. If any are present, we're done; otherwise the duplicates are irrelevant, and we can forget about Y
. (The only situation slightly more complicated is if 0
is repeated; then we need three distinct copies of it to get a solution.)
现在,解决你的问题,即 x_i + x_j = x_k
,将出现在以Z
三均匀间隔的,因为一些简单的代数运算给我们 2 * x_j - x_k = x_k - 2 * x_i
。请注意,在末端的元素是我们特别增加了一倍项(从 2
)和一个在中间是一个普通的项目(从 X
)。
Now, a solution to your problem, i.e. x_i + x_j = x_k
, will appear in Z
as three evenly-spaced ones, since some simple algebraic manipulations give us 2*x_j - x_k = x_k - 2*x_i
. Note that the elements on the ends are our special doubled entries (from 2X
) and the one in the middle is a regular entry (from X
).
考虑以Z
作为重presentation多项式 P
,其中系数度的学期我
是 Z [I]
。如果 X
是 [1,2,3,5]
,然后以Z
是 1111110001
(因为我们1,2,3,4,5,6,和10); P
是那么的 1 + X + X 2 + X 3 + X 4 + X 5 + X 9 的。
Consider Z
as a representation of a polynomial p
, where the coefficient for the term of degree i
is Z[i]
. If X
is [1, 2, 3, 5]
, then Z
is 1111110001
(because we have 1, 2, 3, 4, 5, 6, and 10); p
is then 1 + x + x2 + x3 + x4 + x5 + x9.
现在,从高中代数记得系数的 X C 的两种多项式的产品的总和在所有的 A,B X A 的时间第二次的系数的 X B <中>与 A + B = C 的/ SUP> 的。所以,如果我们考虑的 Q = P 2 的,的的 X 2J 的系数(为的Ĵ 的与 Z [J] = 1
)将是一笔对所有的我的的Z [I] * Z [2 *的J - I]
。但是,由于以Z
是二进制的,那是三胞胎数完全的我在,J,K
的它们均匀间隔的人ž。注意的(j,J,J)的始终是这样一个三线,所以我们只关心那些与值> 1
Now, remember from high school algebra that the coefficient of xc in the product of two polynomials is the sum over all a, b with a + b = c of the first polynomial's coefficient for xa times the second's coefficient for xb. So, if we consider q = p2, the coefficient of x2j (for a j with Z[j] = 1
) will be the sum over all i of Z[i] * Z[2*j - i]
. But since Z
is binary, that's exactly the number of triplets i,j,k which are evenly-spaced ones in Z
. Note that (j, j, j) is always such a triplet, so we only care about ones with values > 1.
然后,我们可以使用快速傅立叶变换找到的 P 2 的在 O(| Z |登录| Z |)
时间,其中<code> | Z | 是 V' - U'+ 1
。我们走出系数的另一个数组;称之为是W
。
We can then use a Fast Fourier Transform to find p2 in O(|Z| log |Z|)
time, where |Z|
is v' - u' + 1
. We get out another array of coefficients; call it W
.
遍历每个 x_k
在 X
。 (回想一下,我们所期望的均匀间隔的人都集中 X
的元素,而不是 2 * X
)。如果相应的是W
两次这个元素,即 W [2 *(x_k - U')
,是1我们知道这不是任何非平凡级数的中心,我们可以跳过它。 (如前指出的,它应该仅是一个正整数。)
Loop over each x_k
in X
. (Recall that our desired evenly-spaced ones are all centered on an element of X
, not 2*X
.) If the corresponding W
for twice this element, i.e. W[2*(x_k - u')]
, is 1, we know it's not the center of any nontrivial progressions and we can skip it. (As argued before, it should only be a positive integer.)
否则,它可能是我们希望有一个发展的中心,(因此我们需要找到我
和Ĵ
)。但是,不幸的是,它也可能是不具有我们所期望的形式的进展的中心。因此,我们需要检查。遍历其它元素 x_i
的 X
,并检查是否有一个三用 2 * x_i
, x_k
, 2 * x_j
一些Ĵ
(通过检查 Z [2 *(x_k - x_j) - U']
)。如果是这样,我们有一个答案;如果我们让它在所有 X
的无一命中,那么FFT只发现了虚假的答案,我们必须检查另一个元素是W
。
Otherwise, it might be the center of a progression that we want (so we need to find i
and j
). But, unfortunately, it might also be the center of a progression that doesn't have our desired form. So we need to check. Loop over the other elements x_i
of X
, and check if there's a triple with 2*x_i
, x_k
, 2*x_j
for some j
(by checking Z[2*(x_k - x_j) - u']
). If so, we have an answer; if we make it through all of X
without a hit, then the FFT found only spurious answers, and we have to check another element of W
.
因此,此最后的步骤是O(n * 1 +(x_k为W的数量[2 *(x_k - U')]> 1是不实际的解决方案)),其是也许可能为O(n ^ 2)
,这显然是不行的。应该有一种方法来避免输出是W
生成这些虚假的答案;如果我们知道任何适当的是W
系数肯定有一个答案,这最后一步将是 O(N)
和所有都会很好。
This last step is therefore O(n * 1 + (number of x_k with W[2*(x_k - u')] > 1 that aren't actually solutions)), which is maybe possibly O(n^2)
, which is obviously not okay. There should be a way to avoid generating these spurious answers in the output W
; if we knew that any appropriate W
coefficient definitely had an answer, this last step would be O(n)
and all would be well.
我认为这是可以使用稍微不同的多项式要做到这一点,但我还没有得到它的实际工作。我会考虑的更多一些......
I think it's possible to use a somewhat different polynomial to do this, but I haven't gotten it to actually work. I'll think about it some more....
部分基于这个答案。的
这篇关于寻找款项阵的快速算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!