如何计算两个加权样本之间的Kolmogorov-Smirnov统计量 [英] How to calculate the Kolmogorov-Smirnov statistic between two weighted samples

查看:220
本文介绍了如何计算两个加权样本之间的Kolmogorov-Smirnov统计量的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我们有两个样本data1data2,它们的权重分别为weight1weight2,并且我们要计算两个加权样本之间的Kolmogorov-Smirnov统计量.

Let's say that we have two samples data1 and data2 with their respective weights weight1 and weight2 and that we want to calculate the Kolmogorov-Smirnov statistic between the two weighted samples.

我们在python中执行此操作的方式如下:

The way we do that in python follows:

import numpy as np

def ks_w(data1,data2,wei1,wei2):
    ix1=np.argsort(data1)
    ix2=np.argsort(data2)
    wei1=wei1[ix1]
    wei2=wei2[ix2]
    data1=data1[ix1]
    data2=data2[ix2]
    d=0.
    fn1=0.
    fn2=0.
    j1=0
    j2=0
    j1w=0.
    j2w=0.
    while(j1<len(data1))&(j2<len(data2)):
        d1=data1[j1]
        d2=data2[j2]
        w1=wei1[j1]
        w2=wei2[j2]
        if d1<=d2:
            j1+=1
            j1w+=w1
            fn1=(j1w)/sum(wei1)
        if d2<=d1:
            j2+=1
            j2w+=w2
            fn2=(j2w)/sum(wei2)
        if abs(fn2-fn1)>d:
            d=abs(fn2-fn1)
    return d

我们只是根据目的修改了经典的两样本KS统计量,该统计量在 Press,Flannery,Teukolsky,Vetterling-C中的数字食谱-剑桥大学出版社-1992-pag.626 中实现.

where we just modify to our purpose the classical two-sample KS statistic as implemented in Press, Flannery, Teukolsky, Vetterling - Numerical Recipes in C - Cambridge University Press - 1992 - pag.626.

我们的问题是:

  • 有人知道其他方法吗?
  • python/R/*中是否有执行该功能的库?
  • 考试怎么样?它是否存在,或者我们应该使用改组程序来评估统计信息?

推荐答案

此解决方案基于scipy.stats.ks_2samp的代码,并且运行时间约为1/10000(

This solution is based on the code for scipy.stats.ks_2samp and runs in about 1/10000 the time (notebook):

import numpy as np

def ks_w2(data1, data2, wei1, wei2):
    ix1 = np.argsort(data1)
    ix2 = np.argsort(data2)
    data1 = data1[ix1]
    data2 = data2[ix2]
    wei1 = wei1[ix1]
    wei2 = wei2[ix2]
    data = np.concatenate([data1, data2])
    cwei1 = np.hstack([0, np.cumsum(wei1)/sum(wei1)])
    cwei2 = np.hstack([0, np.cumsum(wei2)/sum(wei2)])
    cdf1we = cwei1[[np.searchsorted(data1, data, side='right')]]
    cdf2we = cwei2[[np.searchsorted(data2, data, side='right')]]
    return np.max(np.abs(cdf1we - cdf2we))

这里是对其准确性和性能的测试:

Here's a test of its accuracy and performance:

ds1 = np.random.rand(10000)
ds2 = np.random.randn(40000) + .2
we1 = np.random.rand(10000) + 1.
we2 = np.random.rand(40000) + 1.

ks_w2(ds1, ds2, we1, we2)
# 0.4210415232236593
ks_w(ds1, ds2, we1, we2)
# 0.4210415232236593

%timeit ks_w2(ds1, ds2, we1, we2)
# 100 loops, best of 3: 17.1 ms per loop
%timeit ks_w(ds1, ds2, we1, we2)
# 1 loop, best of 3: 3min 44s per loop

这篇关于如何计算两个加权样本之间的Kolmogorov-Smirnov统计量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆