当确定未发生它尚未发生的事件的机率 [英] Determining the chances of an event occurring when it hasn't occurred yet

查看:167
本文介绍了当确定未发生它尚未发生的事件的机率的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有一个用户访问我的网站在时间的 T ,然后他们可能会或可能不会点击我关心的一个特定的链接,如果他们这样做我记录自己点击的链接的事实,也是由于时间的 T 他们点击它,把这个ð

A user visits my website at time t, and they may or may not click on a particular link I care about, if they do I record the fact that they clicked the link, and also the duration since t that they clicked it, call this d.

我需要一个算法,允许我创建一个类是这样的:

I need an algorithm that allows me to create a class like this:

class ClickProbabilityEstimate {
    public void reportImpression(long id);
    public void reportClick(long id);

    public double estimateClickProbability(long id);
}

报告的点击来指示IM pression点击属于当每个IM pression获得一个唯一的 ID ,这是使用。

我需要一个算法,将返回一个概率的基础上,有多少时间已经过去因为一个IM pression报道,该IM pression将获得点击的基础上多久previous所需的点击。显然,人们所期望的,这可能会随时间而减少,如果仍然没有点击。

I need an algorithm that will return a probability, based on how much time has past since an impression was reported, that the impression will receive a click, based on how long previous clicks required. Clearly one would expect that this probability will decrease over time if there is still no click.

如果有必要,我们可以设定一个上限,超过我们认为的点击概率为0(例如,如果是因为IM pression发生一个小时,我们就可以pretty的肯定有将不会是一个点击)。

If necessary, we can set an upper-bound, beyond which we consider the click probability to be 0 (eg. if its been an hour since the impression occurred, we can be pretty sure there won't be a click).

该算法应该是空间和时间高效的,并希望使尽可能少的假设成为可能,而被雅致。易于实现也将是不错的。任何想法?

The algorithm should be both space and time efficient, and hopefully make as few assumptions as possible, while being elegant. Ease of implementation would also be nice. Any ideas?

推荐答案

假设你继续过去的即时通讯pressions和点击数据,很容易:让我们假设你有一个即时pression,以及时间< STRONG> D',因为即时通讯pression已经过去了。您可以将您的数据分成三组:

Assuming you keep data on past impressions and clicks, it's easy: let's say that you have an impression, and a time d' has passed since that impression. You can divide your data into three groups:

  1. 林$ P $并获得点击,在不到pssions D'
  2. 林$ P $并获得点击后超过pssions D'
  3. 林pressions从未收到点击
  1. Impressions which received a click in less than d'
  2. Impressions which received a click after more than d'
  3. Impressions which never received a click

显然,电流im pression不在组(1),因此消除。所需的概率是在组(2),然后将其

Clearly the current impression is not in group (1), so eliminate that. You want the probability it is in group (2), which is then

P = N2 / (N2 + N3)

其中, N2 是IM pressions第2组的数量,同样为 N3

where N2 is the number of impressions in group 2, and similarly for N3.

至于实际执行中,我首先想到的是要保持与时俱进的有序列表ð作为其确实收到的点击次数,随着次数的计数过去的即时通讯pressions它从来没有收到一个点击,仅仅IM pressions做一个二进制搜索 D'在该列表中。你会发现位置会给你 N1 ,然后 N2 是列表中减去的长度 N1

As far as actual implementation, my first thought would be to keep an ordered list of the times d for past impressions which did receive clicks, along with a count of the number of impressions which never received a click, and just do a binary search for d' in that list. The position you find will give you N1, and then N2 is the length of the list minus N1.

如果你不需要完美的粒度,你可以过去的时间存储为一个直方图来代替,即一个列表,其中包含在每个元素列表[N] ,那获得了点击后至少 N 但少于 N + 1 分钟IM pressions数量。 (或几秒钟,或任何时间间隔你喜欢)在这种情况下,你可能想保持的点击总数作为一个独立的变量,所以你可以很容易地计算 N2

If you don't need perfect granularity, you can store the past times as a histogram instead, i.e. a list that contains, in each element list[n], the number of impressions that received a click after at least n but less than n+1 minutes. (Or seconds, or whatever time interval you like) In that case you'd probably want to keep the total number of clicks as a separate variable so you can easily compute N2.

(顺便说一句,我只是做这件事,我不知道是否有标准的算法,这样的事情可能会更好)

(By the way, I just made this up, I don't know if there are standard algorithms for this sort of thing that may be better)

这篇关于当确定未发生它尚未发生的事件的机率的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆