Efficiently determining the probability of a user clicking a hyperlink


Question

So I have a bunch of hyperlinks on a web page. From past observation I know the probabilities that a user will click on each of these hyperlinks. I can therefore calculate the mean and standard deviation of these probabilities.

I now add a new hyperlink to this page. After a short amount of testing I find that of the 20 users that see this hyperlink, 5 click on it.

Taking into account the known mean and standard deviation of the click-through probabilities on other hyperlinks (this forms a "prior expectation"), how can I efficiently estimate the probability of a user clicking on the new hyperlink?

A naive solution would be to ignore the other probabilities, in which case my estimate is just 5/20 or 0.25 - however this means we are throwing away relevant information, namely our prior expectation of what the click-through probability is.

So I'm looking for a function that looks something like this:

double estimate(double priorMean, 
                double priorStandardDeviation, 
                int clicks, int views);

Since I'm more familiar with code than with mathematical notation, I'd ask that answers use code or pseudocode in preference to math.

Recommended answer

I made this a new answer since it's fundamentally different.

This is based on Chris Bishop, Pattern Recognition and Machine Learning, Chapter 2 "Probability Distributions", p. 71 ff., and http://en.wikipedia.org/wiki/Beta_distribution.

First we fit a beta distribution to the given mean and variance in order to build a distribution over the click probability. Then we fold in the observed clicks and views and return the mode of the resulting distribution, which is the most probable value of the parameter of a Bernoulli variable.

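def estimate(prior_mean, prior_variance, clicks, views):
  # Fit Beta(a, b) to the prior mean and variance (method of moments).
  # Requires 0 < prior_variance < prior_mean * (1 - prior_mean).
  c = ((prior_mean * (1 - prior_mean)) / prior_variance - 1)
  a = prior_mean * c
  b = (1 - prior_mean) * c
  # Mode of the posterior Beta(a + clicks, b + views - clicks);
  # only meaningful when both posterior parameters exceed 1.
  return ((a + clicks) - 1) / (a + b + views - 2)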
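
The question asked for a function that takes a standard deviation rather than a variance, so here is a minimal sketch of that variant. It squares the standard deviation, checks that a beta distribution with those moments exists, and returns the posterior mean instead of the mode (the mean is defined for any positive parameters, the mode is not). The names fit_beta and estimate_from_std, and the prior values 0.1 and 0.05 in the usage lines, are made up for illustration and do not come from the original answer.

```python
def fit_beta(prior_mean, prior_std):
    """Method-of-moments fit of Beta(a, b) to a mean and standard deviation."""
    variance = prior_std ** 2
    # A beta distribution only exists if 0 < variance < mean * (1 - mean).
    if not 0.0 < variance < prior_mean * (1.0 - prior_mean):
        raise ValueError("no beta distribution has this mean and standard deviation")
    c = prior_mean * (1.0 - prior_mean) / variance - 1.0
    return prior_mean * c, (1.0 - prior_mean) * c


def estimate_from_std(prior_mean, prior_std, clicks, views):
    """Posterior mean of the click probability after `clicks` out of `views`."""
    a, b = fit_beta(prior_mean, prior_std)
    # The posterior is Beta(a + clicks, b + views - clicks); its mean is:
    return (a + clicks) / (a + b + views)


# Hypothetical prior: existing links average a 10% click rate, std dev 5%.
# The new link from the question was clicked by 5 of 20 users.
p = estimate_from_std(0.1, 0.05, clicks=5, views=20)
print(p)  # lies between the prior mean 0.1 and the naive 5/20 = 0.25
```

With these assumed numbers the fitted prior is roughly Beta(3.5, 31.5), so the 20 new observations pull the estimate only part of the way from 0.1 toward 0.25.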

However, I am quite positive that a prior mean and variance alone will not work well for you, since they throw away information about how many samples you have and therefore how good your prior is.

Instead: given a set of (webpage, link_clicked) pairs, you can calculate the number of pages a specific link was clicked on. Let that be m. Let the number of times that link was not clicked be l.

Now let a be the number of clicks on your new link, and let b be the number of visits on which it was not clicked (views minus clicks), mirroring l. Then the click probability of your new link is

def estimate(m, l, a, b):
  # m, l: prior clicks / non-clicks; a, b: clicks / non-clicks on the new link
  return float(m + a) / (m + l + a + b)

This looks pretty trivial, but it has a valid probabilistic foundation: it is the posterior mean when the prior counts m and l are treated as the parameters of a beta distribution. From the implementation perspective, you can keep m and l globally.
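
As a rough sketch of what "keeping m and l globally" could look like, the class below holds running totals of clicks and non-clicks over the existing links and combines them with the counts for a new link. The class name ClickPrior, its method names, and the prior counts in the usage lines are hypothetical. The sketch also assumes the new link's non-click count is views minus clicks.

```python
class ClickPrior:
    """Running totals of clicks (m) and non-clicks (l) over existing links."""

    def __init__(self):
        self.m = 0  # times an existing link was shown and clicked
        self.l = 0  # times an existing link was shown and not clicked

    def observe(self, clicked):
        """Record one impression of an existing link."""
        if clicked:
            self.m += 1
        else:
            self.l += 1

    def estimate(self, clicks, views):
        """Pooled estimate for a new link with `clicks` out of `views`."""
        a = clicks
        b = views - clicks  # assumption: non-clicks = views - clicks
        total = self.m + self.l + a + b
        if total == 0:
            raise ValueError("no observations at all")
        return (self.m + a) / total


# Hypothetical history: 100 clicks in 1000 impressions of existing links.
prior = ClickPrior()
for i in range(1000):
    prior.observe(clicked=(i < 100))

print(prior.estimate(clicks=5, views=20))  # 105 / 1020, close to the prior 0.1
```

Because the prior enters as raw counts, its weight scales with how much history there is: with only 10 clicks in 100 impressions the same new data would give 15 / 120 = 0.125, further from the prior rate. One caveat of this sketch is that pooling every past impression can make the prior so heavy that a new link's estimate barely moves.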
