Estimating a probability given other probabilities from a prior

Question

I have a stream of data coming in (calls to an automated call center) recording whether or not a person buys a particular product: 1 for buy, 0 for not buy.

I want to use this data to create an estimated probability that a person will buy a particular product, but the problem is that I may need to do it with relatively little historical data about how many people bought/didn't buy that product.

A friend suggested that with Bayesian probability you can "help" your probability estimate by coming up with a "prior probability distribution": essentially, information about what you expect to see before taking the actual data into account.

So what I'd like to do is create a method that has something like this signature (Java):

double estimateProbability(double[] priorProbabilities, int buyCount, int noBuyCount);

priorProbabilities is an array of probabilities I've seen for previous products, which this method would use to create a prior distribution for this probability. buyCount and noBuyCount are the actual data specific to this product, from which I want to estimate the probability of the user buying, given the data and the prior. This is returned from the method as a double.

I don't need a mathematically perfect solution, just something that will do better than a uniform or flat prior (i.e. probability = buyCount / (buyCount + noBuyCount)). Since I'm far more familiar with source code than with mathematical notation, I'd appreciate it if people could use code in their explanations.
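A minimal Python sketch of that flat baseline (`flat_estimate` is a hypothetical name standing in for the Java method above, not part of the question) shows the weakness being described: with only one call, the raw ratio can only come out as 0.0 or 1.0.

```python
def flat_estimate(buy_count, no_buy_count):
    """Hypothetical helper: the flat baseline buyCount / (buyCount + noBuyCount)."""
    total = buy_count + no_buy_count
    if total == 0:
        # With no calls at all the raw ratio is undefined (0 / 0).
        raise ValueError("no observations yet")
    return buy_count / total

# With only one or two calls, the raw ratio jumps straight to the extremes.
for buys, no_buys in [(1, 0), (0, 1), (1, 1)]:
    print(buys, no_buys, flat_estimate(buys, no_buys))
```

A single buyer yields an estimate of 1.0, which is exactly the overconfidence a prior built from past products is meant to temper.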

Answer

Here's the Bayesian computation and one example/test:

def estimateProbability(priorProbs, buyCount, noBuyCount):
  # first, estimate the prob that the actual buy/nobuy counts would be observed
  # given each of the priors (times a constant, the binomial coefficient, that's
  # the same in each case and not worth the effort of computing;-)
  condProbs = [p**buyCount * (1.0-p)**noBuyCount for p in priorProbs]
  # the normalization factor for the above-mentioned neglected constant
  # can most easily be computed just once
  normalize = 1.0 / sum(condProbs)
  # so here's the posterior probability of each prior (starting from a uniform
  # metaprior, i.e. every past product considered equally likely beforehand)
  priorMeta = [normalize * cp for cp in condProbs]
  # so the result is the sum of prior probs weighted by these posterior metaprobs
  return sum(pm * pp for pm, pp in zip(priorMeta, priorProbs))

def example(numProspects=4):
  # the a priori prob of buying was either 0.3 or 0.7, how does it change
  # depending on how 4 prospects bought or didn't?
  for bought in range(0, numProspects+1):
    result = estimateProbability([0.3, 0.7], bought, numProspects-bought)
    print('b=%d, p=%.2f' % (bought, result))

example()

The output is:

b=0, p=0.31
b=1, p=0.36
b=2, p=0.50
b=3, p=0.64
b=4, p=0.69

which agrees with my by-hand computation for this simple case. Note that, by construction, the estimated probability of buying will always lie between the lowest and the highest of the prior probabilities, since it is a weighted average of them. If that's not what you want, you might introduce a little fudge in the form of two "pseudo-products": one that nobody will ever buy (p=0.0) and one that everybody will always buy (p=1.0). This gives more weight to the actual observations, scarce as they may be, and less to the statistics about past products. If we do that here, we get:

b=0, p=0.06
b=1, p=0.36
b=2, p=0.50
b=3, p=0.64
b=4, p=0.94
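Both tables can be reproduced with a self-contained Python 3 sketch that condenses the answer's `estimateProbability` and calls it with and without the two pseudo-products. The helper `table` is a hypothetical addition used only to collect the rounded results.

```python
def estimateProbability(priorProbs, buyCount, noBuyCount):
    # Likelihood of the observed counts under each prior probability
    # (binomial coefficient omitted: it cancels in the normalization).
    condProbs = [p**buyCount * (1.0 - p)**noBuyCount for p in priorProbs]
    normalize = 1.0 / sum(condProbs)
    # Posterior weight of each prior, then the weighted average of the priors.
    return sum(normalize * cp * pp for cp, pp in zip(condProbs, priorProbs))

def table(priorProbs, numProspects=4):
    """Hypothetical helper: rounded estimates for every possible number of buyers."""
    return [round(estimateProbability(priorProbs, b, numProspects - b), 2)
            for b in range(numProspects + 1)]

print(table([0.3, 0.7]))            # → [0.31, 0.36, 0.5, 0.64, 0.69]
print(table([0.0, 0.3, 0.7, 1.0]))  # → [0.06, 0.36, 0.5, 0.64, 0.94]
```

One caveat: if every prior probability makes the observed data impossible (for example, priors of only 0.0 and 1.0 together with at least one buyer and one non-buyer), `sum(condProbs)` is zero and the division fails. Keeping at least one prior strictly between 0 and 1, as here, avoids that.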

Intermediate levels of fudging (to account for the unlikely but not impossible chance that this new product may be worse than any ever previously sold, or better than all of them) are easy to envision: add a vector priorWeights to estimateProbability's arguments and give lower weight to the artificial 0.0 and 1.0 probabilities.
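A minimal sketch of that idea, assuming `priorWeights` is a list of non-negative numbers, one per prior probability, acting as a non-uniform metaprior. The parameter name comes from the text above; the function name `estimateProbabilityWeighted` and the example weights of 0.1 for the pseudo-products are illustrative assumptions. With all weights equal it reduces to the original `estimateProbability`.

```python
def estimateProbabilityWeighted(priorProbs, priorWeights, buyCount, noBuyCount):
    """Hypothetical variant of estimateProbability with a non-uniform metaprior."""
    if len(priorProbs) != len(priorWeights):
        raise ValueError("need exactly one weight per prior probability")
    # Unnormalized posterior weight: metaprior weight times likelihood of the data.
    posterior = [w * p**buyCount * (1.0 - p)**noBuyCount
                 for p, w in zip(priorProbs, priorWeights)]
    total = sum(posterior)
    if total == 0:
        raise ValueError("the observed counts are impossible under every weighted prior")
    # Posterior-weighted average of the candidate probabilities.
    return sum(pw * p for pw, p in zip(posterior, priorProbs)) / total

# Assumed weights: each pseudo-product counts one tenth as much as a real past product.
probs = [0.0, 0.3, 0.7, 1.0]
weights = [0.1, 1.0, 1.0, 0.1]
for bought in range(5):
    print(bought, round(estimateProbabilityWeighted(probs, weights, bought, 4 - bought), 2))
```

With these weights the estimate for zero buyers lands between the two tables above (between 0.06 and 0.31); shrinking the pseudo-product weights toward 0 moves it back toward the original table.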

This kind of thing is a substantial part of what I do all day, now that I work developing applications in Business Intelligence, but I just can't get enough of it...!-)
