Subclassing of scipy.stats.rv_continuous


Question

I have 3 questions regarding subclassing of scipy.stats.rv_continuous. My goal is to code a statistical mixture model of a truncated normal distribution, a truncated exponential distribution and 2 uniform distributions.

1) Why is drawing random variates via mm_model.rvs(size=1000) so extremely slow? I read something about performance issues in the documentation, but I was still really surprised.

2) After drawing random variates via mm_model.rvs(size=1000) I get this IntegrationWarning:

IntegrationWarning: The maximum number of subdivisions (50) has been achieved. If increasing the limit yields no improvement it is advised to analyze the integrand in order to determine the difficulties. If the position of a local difficulty can be determined (singularity, discontinuity) one will probably gain from splitting up the interval and calling the integrator on the subranges. Perhaps a special-purpose integrator should be used. warnings.warn(msg, IntegrationWarning)

3) I read in the documentation that I can pass parameters to the pdf via the "shapes" parameter. I tried to adjust my pdf and set the shapes parameter, but it did not work. Could someone explain it?

Thanks for your help.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats


def trunc_norm(z, low_bound, up_bound, mu, sigma):
    # truncated normal pdf on [low_bound, up_bound]
    a = (low_bound - mu) / sigma
    b = (up_bound - mu) / sigma
    return stats.truncnorm.pdf(z, a, b, loc=mu, scale=sigma)



def trunc_exp(z, up_bound, lam):
    # truncated exponential pdf on [0, up_bound] with rate lam
    return stats.truncexpon.pdf(z, b=up_bound * lam, scale=1 / lam)



def uniform(z, a, b):
    # uniform pdf on [a, b]
    return stats.uniform.pdf(z, loc=a, scale=(b - a))


class Measure_mixture_model(stats.rv_continuous):

    def _pdf(self, z):

        # model parameters are hard-coded here for now;
        # question (3) asks how to pass them in as shape parameters instead
        z_true = 8
        z_min = 0
        z_max = 10
        p_hit = 0.7
        p_short = 0.1
        p_max = 0.1
        p_rand = 0.1
        sig_hit = 1
        lambda_short = 0.5

        sum_p = p_hit + p_short + p_max + p_rand

        if sum_p < 0.99 or 1.01 < sum_p:
            # misc.eprint is the question's own error-printing helper
            misc.eprint("a priori probabilities p_hit, p_short, p_max, p_rand have to sum to 1!")
            return None

        pdf_hit = trunc_norm(z, z_min, z_max, z_true, sig_hit)
        pdf_short = trunc_exp(z, z_true, lambda_short)
        pdf_max = uniform(z, 0.99 * z_max, z_max)
        pdf_rand = uniform(z, z_min, z_max)

        # weighted sum of the four component densities
        pdf = p_hit * pdf_hit + p_short * pdf_short + p_max * pdf_max + p_rand * pdf_rand

        return pdf




#mm_model = Measure_mixture_model(shapes='z_true,z_min,z_max,p_hit,p_short,p_max,p_rand,sig_hit,lambda_short')
mm_model = Measure_mixture_model()


z = np.linspace(-1, 11, 1000)

# evaluate and plot the mixture density
samples = mm_model.pdf(z)
plt.plot(z, samples)
plt.show()

# draw random variates -- this is the slow step from question (1)
rand_samples = mm_model.rvs(size=1000)

bins = np.linspace(-1, 11, 100)
plt.hist(rand_samples, bins)
plt.title("measure mixture model")
plt.xlabel("z: measurement")
plt.ylabel("p: relative frequency")
plt.show()

Answer

(1) and (2) are probably related. You are asking scipy to generate random samples based on only the density you provide.

I don't know exactly what scipy does, but I suspect it integrates the density ("pdf") to get the cumulative distribution function ("cdf") and then inverts it to map uniform samples to your distribution. This is numerically expensive and, as you experienced, error-prone.
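To get a feel for the cost, here is a rough, self-contained sketch of that integrate-then-invert idea (an illustration of the general approach only, not scipy's actual code; a plain normal pdf stands in for the mixture density):

import numpy as np
from scipy import stats, integrate, optimize

def slow_rvs(pdf, lo, hi, size):
    # Build the cdf by numerical integration of the pdf, then invert it by
    # root-finding -- one quad call per cdf evaluation, once per sample.
    def cdf(z):
        return integrate.quad(pdf, lo, z)[0]
    u = stats.uniform.rvs(size=size)
    return np.array([optimize.brentq(lambda z: cdf(z) - ui, lo, hi) for ui in u])

# even a modest number of samples triggers a large number of quad calls
demo_samples = slow_rvs(stats.norm.pdf, -5.0, 5.0, size=100)

Each root-finding step re-evaluates the cdf, and each cdf evaluation is itself a quad call on the pdf; when the integrand is awkward (truncation points, the narrow uniform spike near z_max), that is also where an IntegrationWarning like the one in (2) can come from.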

To speed things up, you can help scipy by implementing _rvs directly: just draw a uniform random number to decide which sub-model of your mixture to sample from, and then invoke the rvs of the selected sub-model. The same idea applies to other methods you may need.

Here are some tips on how to implement a vectorised rvs:

To batch-select sub-models, np.digitize should be good enough, since your number of sub-models is small. If possible, use rv_frozen instances for the sub-models; they are very convenient, but I seem to remember that you can't pass all the optional parameters to them, so you may have to handle those separately.
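For example, the frozen sub-models referenced in the snippet below (the names frz_trunc_norm, frz_trunc_exp, frz_unif_1 and frz_unif_2 are assumptions chosen to match it) could be built once, e.g. in __init__, from the question's parameter values:

from scipy import stats

# hypothetical construction of the frozen sub-models, using the values from
# the question (z_true=8, z_min=0, z_max=10, sig_hit=1, lambda_short=0.5)
z_true, z_min, z_max = 8, 0, 10
sig_hit, lambda_short = 1, 0.5

a = (z_min - z_true) / sig_hit
b = (z_max - z_true) / sig_hit
frz_trunc_norm = stats.truncnorm(a, b, loc=z_true, scale=sig_hit)
frz_trunc_exp = stats.truncexpon(b=z_true * lambda_short, scale=1 / lambda_short)
frz_unif_1 = stats.uniform(loc=0.99 * z_max, scale=0.01 * z_max)  # "max" component
frz_unif_2 = stats.uniform(loc=z_min, scale=z_max - z_min)        # "random" component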

# cumulative mixture weights -> cut points in (0, 1) for np.digitize
self._bins = np.cumsum([p_hit, p_short, p_max])
self._bins /= self._bins[-1] + p_rand

# one uniform draw per sample decides which sub-model generates it
submodel = np.digitize(stats.uniform.rvs(size=size), self._bins)

result = np.empty(size)
for j, frozen in enumerate((frz_trunc_norm, frz_trunc_exp, frz_unif_1, frz_unif_2)):
    inds = np.where(submodel == j)[0]
    result[inds] = frozen.rvs(size=inds.shape)
return result
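In context, this fragment would live inside an overridden _rvs method of the mixture class, with self._bins computed once (e.g. in __init__) rather than on every call. Note that the exact _rvs signature differs between scipy versions (newer releases pass size and random_state to _rvs, older ones expect you to read self._size inside the method), so check the rv_continuous documentation for the version you are using.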

Re (3) Here is what the scipy docs have to say.

A note on shapes: subclasses need not specify them explicitly. In this case, shapes will be automatically deduced from the signatures of the overridden methods (pdf, cdf etc). If, for some reason, you prefer to avoid relying on introspection, you can specify shapes explicitly as an argument to the instance constructor.

So the normal way would be to simply add the shape parameters as arguments to your methods.
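As a minimal sketch (hypothetical class name, reduced to just the truncated-normal component to keep it short): shape parameters are deduced from the _pdf signature and can then be passed to pdf and rvs.

import scipy.stats as stats

class MeasureGen(stats.rv_continuous):
    # z_true and sig_hit are picked up as shape parameters from this signature
    def _pdf(self, z, z_true, sig_hit):
        a = (0.0 - z_true) / sig_hit   # support hard-coded to [0, 10] here
        b = (10.0 - z_true) / sig_hit
        return stats.truncnorm.pdf(z, a, b, loc=z_true, scale=sig_hit)

measure = MeasureGen(a=0.0, b=10.0, name="measure")
print(measure.pdf(5.0, z_true=8, sig_hit=1))      # density at z = 5
print(measure.rvs(z_true=8, sig_hit=1, size=5))   # sampling with the same shapes

Alternatively, the shapes can be spelled out explicitly in the constructor, as in the commented-out shapes='...' line in the question.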
