How to calculate the Kolmogorov-Smirnov statistic between two weighted samples
Question
Let's say that we have two samples, data1 and data2, with respective weights weight1 and weight2, and that we want to calculate the Kolmogorov-Smirnov statistic between the two weighted samples.
The way we do that in Python is as follows:
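import numpy as np

def ks_w(data1, data2, wei1, wei2):
    # Sort both samples and reorder their weights to match.
    ix1 = np.argsort(data1)
    ix2 = np.argsort(data2)
    wei1 = wei1[ix1]
    wei2 = wei2[ix2]
    data1 = data1[ix1]
    data2 = data2[ix2]
    d = 0.      # largest CDF difference seen so far
    fn1 = 0.    # weighted empirical CDF of sample 1
    fn2 = 0.    # weighted empirical CDF of sample 2
    j1 = 0
    j2 = 0
    j1w = 0.    # cumulative weight of sample 1
    j2w = 0.    # cumulative weight of sample 2
    # Walk through both sorted samples in parallel.
    while (j1 < len(data1)) and (j2 < len(data2)):
        d1 = data1[j1]
        d2 = data2[j2]
        w1 = wei1[j1]
        w2 = wei2[j2]
        if d1 <= d2:
            j1 += 1
            j1w += w1
            # Note: the total weight is recomputed on every step.
            fn1 = j1w / sum(wei1)
        if d2 <= d1:
            j2 += 1
            j2w += w2
            fn2 = j2w / sum(wei2)
        if abs(fn2 - fn1) > d:
            d = abs(fn2 - fn1)
    return d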
Here we simply adapted to our purpose the classical two-sample KS statistic as implemented in Press, Flannery, Teukolsky, Vetterling, Numerical Recipes in C, Cambridge University Press, 1992, p. 626.
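Our questions are:
- Does anybody know of any other way to do it?
- Is there any library in python/R/* that performs it?
- What about the test? Does it exist, or should we use a reshuffling procedure to evaluate the statistic?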
Recommended answer
This solution is based on the code for scipy.stats.ks_2samp and runs in about 1/10000 of the time:
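import numpy as np

def ks_w2(data1, data2, wei1, wei2):
    # Sort both samples and reorder their weights to match.
    ix1 = np.argsort(data1)
    ix2 = np.argsort(data2)
    data1 = data1[ix1]
    data2 = data2[ix2]
    wei1 = wei1[ix1]
    wei2 = wei2[ix2]
    # Evaluate both weighted CDFs at every observed point.
    data = np.concatenate([data1, data2])
    # Normalized cumulative weights, with a leading 0 for "no points yet".
    cwei1 = np.hstack([0, np.cumsum(wei1)/sum(wei1)])
    cwei2 = np.hstack([0, np.cumsum(wei2)/sum(wei2)])
    # searchsorted(..., side='right') counts the points <= each value.
    cdf1we = cwei1[np.searchsorted(data1, data, side='right')]
    cdf2we = cwei2[np.searchsorted(data2, data, side='right')]
    return np.max(np.abs(cdf1we - cdf2we))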
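One way to sanity-check the weighted statistic is to use integer weights. In that case it should match the unweighted statistic from scipy.stats.ks_2samp computed on samples in which each point is repeated as many times as its weight. The sketch below restates ks_w2 so that it runs on its own. The sample sizes, the seed, and the integer weights are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp

def ks_w2(data1, data2, wei1, wei2):
    """Weighted two-sample KS statistic (same approach as the answer)."""
    ix1 = np.argsort(data1)
    ix2 = np.argsort(data2)
    data1, wei1 = data1[ix1], wei1[ix1]
    data2, wei2 = data2[ix2], wei2[ix2]
    data = np.concatenate([data1, data2])
    cwei1 = np.hstack([0, np.cumsum(wei1) / np.sum(wei1)])
    cwei2 = np.hstack([0, np.cumsum(wei2) / np.sum(wei2)])
    cdf1we = cwei1[np.searchsorted(data1, data, side='right')]
    cdf2we = cwei2[np.searchsorted(data2, data, side='right')]
    return np.max(np.abs(cdf1we - cdf2we))

# Illustrative data: sizes, seed, and weight range are assumptions.
rng = np.random.default_rng(0)
x = rng.random(200)
y = rng.standard_normal(300) + 0.2
wx = rng.integers(1, 5, size=len(x))  # integer weights in {1, 2, 3, 4}
wy = rng.integers(1, 5, size=len(y))

# Weighted statistic on the original points.
d_weighted = ks_w2(x, y, wx, wy)

# Unweighted statistic on points repeated according to their weights.
d_repeated = ks_2samp(np.repeat(x, wx), np.repeat(y, wy)).statistic

print(d_weighted, d_repeated)
```

The two printed values should agree to floating-point precision. This check only covers integer weights; it says nothing about the p-value that ks_2samp reports, which assumes unweighted samples.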
Here's a test of its accuracy and performance:
ds1 = np.random.rand(10000)
ds2 = np.random.randn(40000) + .2
we1 = np.random.rand(10000) + 1.
we2 = np.random.rand(40000) + 1.
ks_w2(ds1, ds2, we1, we2)
# 0.4210415232236593
ks_w(ds1, ds2, we1, we2)
# 0.4210415232236593
%timeit ks_w2(ds1, ds2, we1, we2)
# 100 loops, best of 3: 17.1 ms per loop
%timeit ks_w(ds1, ds2, we1, we2)
# 1 loop, best of 3: 3min 44s per loop
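The question also asks whether a reshuffling procedure should be used to evaluate the statistic. A minimal sketch of such a permutation test follows. It assumes that, under the null hypothesis, the (value, weight) pairs are exchangeable between the two samples, which may not hold if the weights are defined differently for each sample. The names ks_weighted and ks_w_permutation_test are hypothetical helpers written for this illustration, not library functions.

```python
import numpy as np

def ks_weighted(data1, data2, wei1, wei2):
    """Weighted two-sample KS statistic (same approach as ks_w2)."""
    ix1 = np.argsort(data1)
    ix2 = np.argsort(data2)
    data1, wei1 = data1[ix1], wei1[ix1]
    data2, wei2 = data2[ix2], wei2[ix2]
    data = np.concatenate([data1, data2])
    cwei1 = np.hstack([0, np.cumsum(wei1) / np.sum(wei1)])
    cwei2 = np.hstack([0, np.cumsum(wei2) / np.sum(wei2)])
    cdf1 = cwei1[np.searchsorted(data1, data, side='right')]
    cdf2 = cwei2[np.searchsorted(data2, data, side='right')]
    return np.max(np.abs(cdf1 - cdf2))

def ks_w_permutation_test(data1, data2, wei1, wei2, n_perm=500, seed=0):
    """Permutation p-value for the weighted KS statistic.

    Assumption: (value, weight) pairs are exchangeable under the null.
    """
    rng = np.random.default_rng(seed)
    observed = ks_weighted(data1, data2, wei1, wei2)
    pooled_data = np.concatenate([data1, data2])
    pooled_wei = np.concatenate([wei1, wei2])
    n1 = len(data1)
    count = 0
    for _ in range(n_perm):
        # Reassign pooled (value, weight) pairs to two groups at random,
        # keeping the original group sizes.
        perm = rng.permutation(len(pooled_data))
        a, b = perm[:n1], perm[n1:]
        stat = ks_weighted(pooled_data[a], pooled_data[b],
                           pooled_wei[a], pooled_wei[b])
        if stat >= observed:
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value

# Illustrative usage on small samples (sizes and seed are arbitrary).
rng = np.random.default_rng(42)
s1 = rng.random(200)
s2 = rng.standard_normal(300) + 0.2
w1 = rng.random(200) + 1.0
w2 = rng.random(300) + 1.0
stat, p = ks_w_permutation_test(s1, s2, w1, w2, n_perm=200)
print(stat, p)
```

Each permutation calls the vectorized statistic once, so the cost grows linearly with n_perm. The smallest p-value this procedure can report is 1/(n_perm + 1).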