Efficiently determining the probability of a user clicking a hyperlink
Question
So I have a bunch of hyperlinks on a web page. From past observation I know the probabilities that a user will click on each of these hyperlinks. I can therefore calculate the mean and standard deviation of these probabilities.
I now add a new hyperlink to this page. After a short amount of testing I find that of the 20 users that see this hyperlink, 5 click on it.
Taking into account the known mean and standard deviation of the click-through probabilities on other hyperlinks (this forms a "prior expectation"), how can I efficiently estimate the probability of a user clicking on the new hyperlink?
A naive solution would be to ignore the other probabilities, in which case my estimate is just 5/20 or 0.25 - however this means we are throwing away relevant information, namely our prior expectation of what the click-through probability is.
So I'm looking for a function that looks something like this:
double estimate(double priorMean,
                double priorStandardDeviation,
                int clicks, int views);
Since I'm more familiar with code than with mathematical notation, I'd ask that answers use code or pseudocode in preference to math.
Recommended answer
I made this a new answer since it's fundamentally different.
This is based on Chris Bishop, Pattern Recognition and Machine Learning, Chapter 2 "Probability Distributions", p. 71 onwards, and http://en.wikipedia.org/wiki/Beta_distribution.
First we fit a beta distribution to the given mean and variance, which gives us a prior distribution over the click probability. Then we update it with the observed clicks and views and return the mode of the resulting distribution, which is the most probable value of the parameter of a Bernoulli variable.
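def estimate(prior_mean, prior_variance, clicks, views):
    # Fit Beta(a, b) to the prior mean and variance (variance, not standard deviation).
    c = (prior_mean * (1 - prior_mean)) / prior_variance - 1
    a = prior_mean * c
    b = (1 - prior_mean) * c
    # Mode of the posterior Beta(a + clicks, b + views - clicks).
    return (a + clicks - 1) / (a + b + views - 2)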
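The question's signature takes a standard deviation rather than a variance, so a small adapter may be useful. The following is a minimal sketch, not part of the original answer: the name `estimate_from_std`, the validity check, the fallback to the posterior mean, and the example prior (mean 0.3, standard deviation 0.1) are all assumptions made for illustration.

```python
def estimate_from_std(prior_mean, prior_std, clicks, views):
    """Most probable click rate under a Beta prior matched to a mean and std."""
    prior_variance = prior_std ** 2
    # A beta distribution with this mean only exists if the variance is
    # strictly between 0 and mean * (1 - mean).
    max_variance = prior_mean * (1 - prior_mean)
    if not 0 < prior_variance < max_variance:
        raise ValueError("variance must lie in (0, mean * (1 - mean))")

    c = max_variance / prior_variance - 1
    a = prior_mean * c + clicks                  # posterior alpha
    b = (1 - prior_mean) * c + (views - clicks)  # posterior beta

    if a <= 1 or b <= 1:
        # The mode formula needs both parameters above 1; as an
        # assumption of this sketch, fall back to the posterior mean.
        return a / (a + b)
    return (a - 1) / (a + b - 2)


# Hypothetical prior: mean 0.3, standard deviation 0.1; 5 clicks in 20 views.
print(round(estimate_from_std(0.3, 0.1, 5, 20), 3))  # 0.263
```

With these made-up numbers the fitted prior is roughly Beta(6, 14), so the estimate lands between the naive 0.25 and the prior mean of 0.3.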
However, I am fairly sure that a prior given only as a mean and variance will not work well for you, since it throws away how many samples the prior is based on, and therefore how good the prior is.
Instead: given a set of (webpage, link_clicked) pairs, you can count the number of times a specific link was clicked. Let that be m. Let the number of times that link was not clicked be l.
Now let a be the number of clicks on your new link and b the number of visits to the site. Then the click probability of your new link is
def estimate(m, l, a, b):
    # b counts all visits, including the a visits that ended in a click
    return (m + a) / (m + l + b)
This looks pretty trivial but actually has a valid probabilistic foundation: it is the mean of the beta distribution you get by treating m and l as prior counts and adding the new observations to them. From the implementation perspective, you can keep m and l globally.
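Keeping m and l globally could be sketched as a small counter object. This is only an illustration under stated assumptions: the class name `ClickCounts`, its methods, and the sample numbers are hypothetical, and `views` is assumed to count every impression of the new link, including the ones that ended in a click.

```python
class ClickCounts:
    """Running totals of clicked (m) and not-clicked (l) impressions."""

    def __init__(self, m=0, l=0):
        self.m = m  # impressions that were clicked
        self.l = l  # impressions that were not clicked

    def record(self, clicked):
        # Update the global counts with one observed impression.
        if clicked:
            self.m += 1
        else:
            self.l += 1

    def estimate(self, clicks, views):
        # Pool the prior counts with the new link's observations.
        total = self.m + self.l + views
        if total == 0:
            raise ValueError("no observations to estimate from")
        return (self.m + clicks) / total


# Hypothetical history: 30 clicked and 70 unclicked impressions.
history = ClickCounts(m=30, l=70)
print(round(history.estimate(clicks=5, views=20), 3))  # 0.292
```

The more history is recorded, the less a handful of new observations moves the estimate, which is the information a bare mean and variance would not carry.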