Efficient algorithm to compute the median of pairwise absolute sums of a sorted array
I'm trying to come up with a fast algorithm to compute the quantity
b[i] = med |y_i + y_j|, 1 <= j != i <= n,
when y_1, ..., y_n are already sorted (so b[] is a vector of the same length as y[]). I will assume that all elements of y[] are unique and that n is even.
The code below computes the b[i]'s the naive, O(n**2), way:
(I wrote this in R for convenience, but I'm language agnostic)
n<-30
a_fast<-b_slow<-rep(NA,n)
y<-sort(rnorm(n,100,1))
z<-y
for(i in 1:n){
b_slow[i]<-median(abs(y[-i]+y[i]))
}
I have a tentative proposal (below) for doing it in O(n), but it only works if y[] contains only positive numbers.
My question is: how should I change the fast algorithm so that it also works when y[] contains both positive and negative numbers? Is this even possible?
EDIT:
Here is the code for the (tentative) O(n) approach
(again written in R for convenience):
tryA<-floor(1+(n-1)/2+1)
tryB<-floor(1+(n-1)/2)
medA<-y[tryA]
medB<-y[tryB]
for(i in 1:(tryA-1)){
a_fast[i]<-medA+y[i]
}
for(i in tryA:n){
a_fast[i]<-medB+y[i]
}
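Why the shortcut works for positive data: when every y[j] is positive, |y[i]+y[j]| = y[i]+y[j], so b[i] is y[i] plus the median of the other n-1 values, and that median is one of the two middle elements of y[]. A minimal Python sketch of the same idea, with 0-based indexing (the name `b_fast_positive` is an assumption made for this illustration, not something from the post):

```python
def b_fast_positive(y):
    """O(n) shortcut; assumes y is sorted, all positive, and len(y) is even."""
    n = len(y)
    h = n // 2
    upper, lower = y[h], y[h - 1]  # the two middle elements of y (0-based)
    # Removing an element from the lower half leaves y[h] as the median of
    # the remaining n - 1 values; removing one from the upper half leaves y[h - 1].
    return [(upper if i < h else lower) + y[i] for i in range(n)]
```

With mixed signs the identity |y[i]+y[j]| = y[i]+y[j] no longer holds, so this function should not be expected to agree with the naive result in that case.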
Simple example:
If we have a vector of length 4
-3, -1, 2, 4
then, for i=1, the 3 absolute pairwise sums are
4 1 1
and their median is 1.
Similarly, for i=2, the 3 absolute pairwise sums are
4 1 3
and their median is 3.
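The small example can be checked with a brute-force sketch. It is written in Python rather than R, since the question is language agnostic; `b_naive` is a name chosen here for illustration.

```python
from statistics import median

def b_naive(y):
    """Brute force: b[i] = median of |y[i] + y[j]| over all j != i."""
    n = len(y)
    return [median(abs(y[i] + y[j]) for j in range(n) if j != i)
            for i in range(n)]

print(b_naive([-3, -1, 2, 4]))  # [1, 3, 1, 3]
```

The first two entries match the values worked out by hand for i=1 and i=2.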
Here is a longer example where y[] contains both positive and negative values:
-1.27 -0.69 -0.56 -0.45 -0.23 0.07 0.13 0.46 1.56 1.72
and here is the corresponding b_slow[] (this is the ground truth, computed the naive way):
1.20 0.92 1.00 1.01 0.79 0.53 0.56 0.53 1.33 1.49
but now a_fast[] no longer matches:
-1.20 -0.62 -0.49 -0.38 -0.16 -0.16 -0.10 0.23 1.33 1.49
EDIT:
Here is my implementation of Francis's solution (up to the point where we have two sorted arrays, the median of which is easy to compute). I did it in R to stay in the spirit of the question.
Nonetheless, I seem to be missing a correction factor for the index (the ww in the code below), so the code below is sometimes off by a little bit. This is because, in the definition above, the medians are computed over n-1 observations (i!=j).
n<-100
y<-rnorm(n)
y<-sort(y)
b<-rep(NA,n)
#Naive --O(n**2)-- approach:
for(i in 1:n){
b[i]<-median(abs(y[-i]+y[i]))
}
#k[i]: index of the first j such that y[j]+y[i]>0
k<-rep(NA,n)
i<-1
k[i]<-min(na.omit(c(which(y+y[i]>0)[1],n))) #binary search: O(log(n)) --
for(i in 2:n){ #O(n)
k_prov<-k[i-1]
#test k_prov>0 first so that y[0] is never evaluated
while(k_prov>0 && y[k_prov]+y[i]>0) k_prov<-k_prov-1
k[i]<-max(k_prov+1,1)
#for(i in 1:n){ should give the same result.
# k[i]<-which(y+y[i]>0)[1]
#}
}
i<-sample(1:n,1)
#seq_len() gives an empty index when k[i]==1 (1:0 would give c(1,0))
x1<- -y[seq_len(k[i]-1)]-y[i]
x2<-y[i]+y[n:k[i]]
x3<-c(x1,x2)
plot(x3)
#ww: rank of the median in x3; this is the index that is sometimes off
ww<-ifelse(i<k[i] & i>n/2,n/2+1,n/2)
sort(x3)[ww] #this can be computed efficiently: O(log(n))
b[i] #this is the O(n**2) result.
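One way to sidestep the index correction is to drop j = i before ranking, so that the median is always the (n/2)-th smallest of the n-1 remaining values. The sketch below (Python, 0-based; `split_points` and `b_two_lists` are hypothetical names) finds the split index with the same kind of two-pointer sweep and then merges the two sorted lists. The merge makes each b[i] cost O(n), so this only illustrates the indexing, not the fast selection step.

```python
from heapq import merge
from itertools import islice

def split_points(y):
    """For sorted y, ks[i] = first 0-based j with y[j] + y[i] > 0 (len(y) if none)."""
    ks, k = [], len(y)
    for yi in y:  # -yi decreases along y, so k only moves left: O(n) in total
        while k > 0 and y[k - 1] + yi > 0:
            k -= 1
        ks.append(k)
    return ks

def b_two_lists(y):
    n = len(y)
    rank = n // 2 - 1  # 0-based position of the median among n - 1 values (n even)
    out = []
    for i, k in enumerate(split_points(y)):
        # j < k: sums are <= 0, so negate them; walking j downward makes them ascend
        neg = (-(y[j] + y[i]) for j in range(k - 1, -1, -1) if j != i)
        # j >= k: sums are > 0 and already ascending
        pos = (y[j] + y[i] for j in range(k, n) if j != i)
        out.append(next(islice(merge(neg, pos), rank, None)))
    return out
```

Because j = i is skipped inside both generators, the same rank applies whichever side of the split i falls on.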
Here is an O(N x ln(N) x ln(N)) solution:
For all i:
1) Find the index k such that j<k <=> y[j]+y[i]<0 (dichotomy, i.e. binary search, O(ln(N))).
k separates two sorted lists: one with the y[j] above -y[i], the other with the y[j] below -y[i], for which the sign must be changed to get abs(y[i]+y[j]). We are now looking for the median of these two lists.
From here, it is just the problem of finding the median of two sorted lists (http://stackoverflow.com/questions/4607945/how-to-find-the-kth-smallest-element-in-the-union-of-two-sorted-arrays/11698659#11698659), repeated n times.
2) Pick the maximum M (M=abs(y[1]+y[i]) or M=abs(y[size]+y[i]), whichever is larger) and the minimum m (attained around k) of these lists and start a second dichotomy (O(ln(N))), this time on the value. At every stage, pick the middle (M+m)/2.
3) One stage of this big dichotomy: how many items y[j]+y[i] are above (M+m)/2 in the first list? Once again a dichotomy, O(ln(N)). How many items -y[j]-y[i] are above (M+m)/2 in the second list? Again a dichotomy. Sum these two numbers. If the sum is above (size-1)/2, set m=(M+m)/2. Otherwise, set M=(M+m)/2.
4) If m=M, stop: b[i]=m.
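The steps above can be sketched as follows, under a few assumptions: Python's `bisect` module does the counting, items at or below the midpoint are counted instead of items above it (the two are equivalent), j = i is excluded explicitly, and, because the outer dichotomy runs on floating-point values, it stops after a fixed number of halvings instead of testing m = M exactly. `count_at_most`, `b_bisect` and `iters` are names introduced for this sketch.

```python
from bisect import bisect_left, bisect_right

def count_at_most(y, i, t):
    """Number of j != i with |y[j] + y[i]| <= t, via two binary searches on sorted y."""
    lo = bisect_left(y, -t - y[i])   # first j with y[j] + y[i] >= -t
    hi = bisect_right(y, t - y[i])   # one past the last j with y[j] + y[i] <= t
    count = hi - lo
    if lo <= i < hi:                 # j == i lies inside the band: leave it out
        count -= 1
    return count

def b_bisect(y, iters=64):
    n = len(y)
    rank = n // 2                    # median of n - 1 values (n even) is the (n/2)-th smallest
    out = []
    for i in range(n):
        m, M = 0.0, max(abs(y[0] + y[i]), abs(y[-1] + y[i]))
        for _ in range(iters):       # outer dichotomy on the value
            mid = (m + M) / 2
            if count_at_most(y, i, mid) >= rank:
                M = mid              # at least `rank` values are <= mid
            else:
                m = mid
        out.append(M)
    return out
```

The result is the median up to floating-point rounding, so it is best compared with the naive values using a small tolerance.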
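I guess somebody will come up with a better solution...
Edit: I should thank @user189035 for his link to an O(ln(n+m)) algorithm to compute the median of two sorted lists: How to find the kth smallest element in the union of two sorted arrays? (http://stackoverflow.com/questions/4607945/how-to-find-the-kth-smallest-element-in-the-union-of-two-sorted-arrays/8935157#8935157)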
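A minimal sketch of that selection step, assuming a 1-based k and two ascending lists. The function name `kth_of_two_sorted` is made up for this illustration, and this is one common formulation, not necessarily the one in the linked answer. It binary-searches how many elements to take from the shorter list.

```python
def kth_of_two_sorted(a, b, k):
    """Return the k-th smallest (1-based) element of the union of sorted lists a and b."""
    if not 1 <= k <= len(a) + len(b):
        raise ValueError("k out of range")
    if len(a) > len(b):
        a, b = b, a                      # search over the shorter list
    lo, hi = max(0, k - len(b)), min(k, len(a))
    while lo < hi:
        i = (lo + hi) // 2               # candidate split: i items from a, k - i from b
        if a[i] < b[k - i - 1]:
            lo = i + 1                   # a[i] belongs to the k smallest: take more from a
        else:
            hi = i
    i, j = lo, k - lo
    candidates = []
    if i > 0:
        candidates.append(a[i - 1])
    if j > 0:
        candidates.append(b[j - 1])
    return max(candidates)               # largest element among the k smallest
```

In the setting of this answer, a and b would be the two sorted lists of absolute sums with j = i removed, and k = n/2.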
Bye,