Different integration results using Monte Carlo vs scipy.integrate.nquad


Question

The MWE below shows two ways of integrating the same 2D kernel density estimate, obtained for this data using the stats.gaussian_kde() function.

The integration is performed over all (x, y) below the threshold point (x1, y1), which defines the upper integration limits (the lower integration limits are -infinity; see the MWE).
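Written out, with f denoting the kernel density estimate returned by stats.gaussian_kde(), the quantity both functions are meant to approximate is presumably the following (the symbol I is introduced here only for illustration):

```latex
I = \int_{-\infty}^{x_1} \int_{-\infty}^{y_1} f(x, y) \, dy \, dx
```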

  • The int1 function uses a simple Monte Carlo approach.
  • The int2 function uses the scipy.integrate.nquad function.

The issue is that int1 (i.e., the Monte Carlo method) gives systematically larger values for the integral than int2. I don't know why this happens.

Here's an example of the integral values obtained after 200 runs of int1 (blue histogram) versus the integral result given by int2 (red vertical line).

What is the origin of this difference in the resulting integral value?

MWE

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy import integrate


def int1(kernel, x1, y1):
    # Compute the point below which to integrate
    iso = kernel((x1, y1))

    # Sample KDE distribution
    sample = kernel.resample(size=50000)

    # Filter the sample
    insample = kernel(sample) < iso

    # The integral is equivalent to the probability of drawing a
    # point that gets through the filter
    integral = insample.sum() / float(insample.shape[0])

    return integral


def int2(kernel, x1, y1):

    def f_kde(x, y):
        return kernel((x, y))

    # 2D integration in: (-inf, x1), (-inf, y1).
    integral = integrate.nquad(f_kde, [[-np.inf, x1], [-np.inf, y1]])

    return integral


# Obtain data from file.
data = np.loadtxt('data.dat', unpack=True)
# Perform a kernel density estimate (KDE) on the data
kernel = stats.gaussian_kde(data)

# Define the threshold point that determines the integration limits.
x1, y1 = 2.5, 1.5

i2 = int2(kernel, x1, y1)
print(i2)

int1_vals = []
for _ in range(200):
    i = int1(kernel, x1, y1)
    int1_vals.append(i)
    print(i)


Addendum

Note that this question originated from this answer. At first I didn't notice that the answer was mistaken in the integration limits it used, which explains why the results of int1 and int2 differ.

int1 integrates over the domain f(x,y)<f(x1,y1) (where f is the kernel density estimate), while int2 integrates over the domain (x,y)<(x1,y1).

Recommended answer

You resample the distribution

sample = kernel.resample(size=50000)

and then test whether the density at each sampled point is less than the density at the bound

insample = kernel(sample) < iso

This is incorrect. Consider the bounds (0,100) and assume your data has mean u=(0,0) and cov=[[100,0],[0,100]]. Points (0,50) and (50,0) have the same density under this kernel, but only one of them lies within the bounds. Since the density test treats both points identically, it does not count the points below the bounds, and the estimate is biased.

You should instead test whether each point in sample is inside the bounds, and then compute the fraction of points that are. Something like

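def int1(kernel, x1, y1):
    # Sample the KDE distribution; sample has shape (2, n_points)
    sample = kernel.resample(size=100)

    # Keep only the points with x < x1 and y < y1
    include = (sample < np.repeat([[x1],[y1]],sample.shape[1],axis=1)).all(axis=0)
    # The fraction of sampled points inside the bounds estimates the integral
    integral = include.sum() / float(sample.shape[1])
    return integral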

I tested this using the following code

def measure(n):

    m1 = np.random.normal(size=n)
    m2 = np.random.normal(size=n)
    return m1,m2

# stats is imported in the MWE via "from scipy import stats"
a = stats.gaussian_kde(np.vstack(measure(1000)))
print(int1(a, -10, -10))
print(int2(a, -10, -10))
print(int1(a, 0, 0))
print(int2(a, 0, 0))

This yields

0.0
(4.304674927251112e-232, 4.6980863813551415e-230)
0.26
(0.25897626178338407, 1.4536217446381293e-08)
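The corrected int1 above draws only 100 points, so some scatter around the nquad value is expected. A rough way to gauge it is the binomial standard error of a hit-or-miss estimate, sketched below; the helper name hit_or_miss_std_error is hypothetical, and the value 0.26 is simply the estimate printed above.

```python
import math


def hit_or_miss_std_error(p, n):
    """Binomial standard error of a hit-or-miss estimate p from n samples."""
    return math.sqrt(p * (1.0 - p) / n)


# Scatter expected for the 100-point run above and for a 50000-point run
se_100 = hit_or_miss_std_error(0.26, 100)
se_50000 = hit_or_miss_std_error(0.26, 50000)
print(se_100, se_50000)
```

With 100 samples the standard error is roughly 0.04, so agreement to two decimal places is partly luck; with 50000 samples, as in the question, it drops to about 0.002.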

Monte Carlo integration should work like this

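  • Sample N random values over some subset of the possible x/y values (uniformly, not drawn from your distribution); below I bound this region at 10 SD from the mean.
  • For each random value, compute kernel(rand_x, rand_y).
  • Compute the sum and multiply it by (volume)/N_samples.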
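The steps above amount to the usual uniform-sampling estimator. As a sketch, with V the area of the sampling rectangle and (l_x, l_y) its lower corner (these symbols are introduced here for illustration and play the role of volume, lboundx and lboundy in the code):

```latex
I \approx \frac{V}{N} \sum_{i=1}^{N} f(x_i, y_i),
\qquad V = (x_1 - l_x)(y_1 - l_y),
\qquad x_i \sim U(l_x, x_1), \; y_i \sim U(l_y, y_1)
```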

In code:

def mc_wo_sample(kernel,x1,y1,lboundx,lboundy):
    nsamples = 50000
    # Area of the rectangle [lboundx, x1] x [lboundy, y1]
    volume = (x1-lboundx)*(y1-lboundy)
    # Generate uniform points in that rectangle
    xrand = np.random.rand(nsamples,1)*(x1-lboundx) + lboundx
    yrand = np.random.rand(nsamples,1)*(y1-lboundy) + lboundy
    # Shape (2, nsamples), as expected by the kernel
    randvals = np.hstack((xrand,yrand)).transpose()
    # Mean density over the rectangle times its area
    return (volume*kernel(randvals).sum())/nsamples

Running the following

print(int1(a,-9,-9))
print(int2(a,-9,-9))
print(mc_wo_sample(a,-9,-9,-10,-10))
print(int1(a,0,0))
print(int2(a,0,0))
print(mc_wo_sample(a,0,0,-10,-10))

yields

0.0
(4.012958496109042e-70, 6.7211236076277e-71)
4.08538890986e-70
0.36
(0.37101621760650216, 1.4670898180664756e-08)
0.361614657674
