Algorithm to calculate the odds of a team winning a sports match given full history


Problem description

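Assumptions:

  • Teams never change
  • Teams do not improve in skill
  • The entire history of each team's performance against the other teams is known
  • The number of matches between teams is large, but potentially sparse (not every team has played every other team)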

For example:

I have a long list of match outcomes that look like this:

Team A beats Team B
Team B beats Team A
Team A beats Team B
Team C beats Team A
Team A beats Team C

Question:

Predict the correct betting odds of any team beating any other team.

In the example above, maybe we conclude that A should beat B about two thirds of the time (roughly 67%). That is based on direct observation and is pretty straightforward. However, finding the probability that C beats B seems harder. They have never played each other, yet it seems most likely that C > B, with some low confidence.
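As a minimal sketch of the direct-observation estimate, the head-to-head frequencies could be computed as follows. The function name `head_to_head` and the `(winner, loser)` tuple layout are assumptions made for illustration; they are not part of the question.

```python
from collections import Counter

# Match list from the question, written as (winner, loser) pairs.
matches = [("A", "B"), ("B", "A"), ("A", "B"), ("C", "A"), ("A", "C")]

def head_to_head(matches):
    """Return {(x, y): fraction of x-vs-y matches won by x} for pairs that have met."""
    wins = Counter(matches)
    rates = {}
    for (winner, loser) in list(wins):
        # Total matches between the two teams, in either direction.
        total = wins[(winner, loser)] + wins[(loser, winner)]
        rates[(winner, loser)] = wins[(winner, loser)] / total
        rates[(loser, winner)] = wins[(loser, winner)] / total
    return rates

rates = head_to_head(matches)
print(rates[("A", "B")])      # A won 2 of 3 matches against B
print(("C", "B") in rates)    # C and B never met, so there is no direct estimate
```

This gives two thirds for A against B, but it has nothing to say about C against B, which is exactly the gap the question is about.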

Research I've done

I've read a fair bit about different ranking systems for games of skill, such as the Elo and Glicko rating systems for chess. These fall short because they make assumptions about the probability distributions involved. For example, Elo's central assumption was that the chess performance of each player in each game is a normally distributed random variable. However, according to Wikipedia, there are other distributions that fit the existing data better.

I don't want to assume a distribution. It seems to me that with 10,000+ match results on hand I should be able either to deduce the distribution from the evidence (I don't know how to do this) or to use some sort of reinforcement learning scheme that doesn't care what the distribution is.

Recommended answer

You want to make a best estimate of a probability (or multiple probabilities) and continuously update your estimate as more data become available. That calls for Bayesian inference! Bayesian reasoning is based on the observation that the probability (distribution) of two things, A and B, being the case at the same time is equal to the probability (distribution) of A being the case given that B is the case, times the probability that B is the case. In formula form:

P(A,B) = P(A|B)P(B)

and also

P(A,B) = P(B|A)P(A)

hence

P(A|B)P(B) = P(B|A)P(A)

Take P(B) to the other side and we get the Bayesian update rule:

P(A|B)' = P(B|A)P(A)/P(B)

Usually A stands for whatever variable you are trying to estimate (e.g. "team x beats team y") while B stands for your observations (e.g. the full history of matches won and lost between teams). I wrote the prime (i.e. the quote in P(A|B)') to signify that the left-hand side of the equation represents an update of your beliefs. To make it concrete: your new estimate of the probability that team x will beat team y, given all observations so far, is the probability of making those observations given your previous estimate, times your previous estimate, divided by the overall probability of seeing the observations you have seen (i.e. given no assumptions about relative strength between teams; one team winning most of the time is less likely than both teams winning about equally often).
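To see the update rule at work on numbers, here is a small sketch with just two competing hypotheses. The hypothesis names and the 70%/30% win rates are made up for illustration, and `bayes_update` is a hypothetical helper, not something from a library.

```python
def bayes_update(prior, likelihood):
    """Apply P(A|B)' = P(B|A) P(A) / P(B) over a discrete set of hypotheses.

    prior:      {hypothesis: P(A)}
    likelihood: {hypothesis: P(B|A)} for the observation B that was just made
    """
    # P(B): total probability of the observation across all hypotheses.
    evidence = sum(likelihood[h] * prior[h] for h in prior)
    return {h: likelihood[h] * prior[h] / evidence for h in prior}

# Hypothetical hypotheses: either x wins 70% of matches, or x wins only 30%.
prior = {"x_stronger": 0.5, "y_stronger": 0.5}

# Observation B: x won one match. Its probability under each hypothesis:
likelihood = {"x_stronger": 0.7, "y_stronger": 0.3}

posterior = bayes_update(prior, likelihood)
print(posterior)
```

After a single win by x, the belief in "x_stronger" moves from 0.5 to 0.7. Real use would need a richer set of hypotheses than these two.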

The P(A|B)' from the left-hand side of the current update becomes the new P(A) on the right-hand side of the next update. You just keep repeating this as more data come in. Typically, in order to be as unbiased as possible, you start with a completely flat distribution for P(A). Over time P(A) will become more and more certain, although the algorithm copes fairly well with sudden changes in the underlying probability that you're trying to estimate (e.g. if team x suddenly becomes much stronger because a new player joins the team).
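The repeated update can be sketched on a grid of candidate values for p = P(x beats y), starting from a flat prior. The grid resolution and the names `grid`, `belief` and `update` are assumptions for this illustration, and the three outcomes are the A-versus-B results from the question.

```python
# Candidate values for p = P(x beats y): 0.00, 0.01, ..., 1.00.
grid = [i / 100 for i in range(101)]

# Completely flat prior: every candidate value is equally plausible.
belief = [1 / len(grid)] * len(grid)

def update(belief, x_won):
    """One Bayesian update of the belief over p after a single match."""
    # Likelihood of the observed outcome under each candidate p.
    likelihood = [p if x_won else 1 - p for p in grid]
    unnormalised = [l * b for l, b in zip(likelihood, belief)]
    evidence = sum(unnormalised)  # P(B), the normalising constant
    return [u / evidence for u in unnormalised]

# A-versus-B results from A's point of view: win, loss, win.
for x_won in [True, False, True]:
    belief = update(belief, x_won)  # the posterior becomes the next prior

best = grid[belief.index(max(belief))]           # most plausible value of p
mean = sum(p * b for p, b in zip(grid, belief))  # posterior mean of p
print(best, round(mean, 3))
```

With only three matches the belief peaks near two thirds but stays wide; more matches would narrow it.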

The good news is that Bayesian inference works well with the beta distribution, which ElKamina also mentioned. In fact the two are often combined in artificial intelligence systems that are meant to learn a probability distribution. While the beta distribution in itself is still an assumption, it has the advantage that it can take many forms (including completely flat and extremely spiky), so there's relatively little reason to be concerned that your choice of distribution might be affecting your outcome.
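The reason the two fit together so conveniently is that a beta prior combined with win/loss observations gives a beta posterior again: you simply add the wins and losses to the two parameters. A minimal sketch, assuming a flat Beta(1, 1) prior (the function names are hypothetical):

```python
def beta_posterior(wins, losses, prior_a=1.0, prior_b=1.0):
    """Return the (a, b) parameters of the beta posterior for P(x beats y).

    Beta(1, 1) is the completely flat prior; each win adds 1 to a,
    each loss adds 1 to b.
    """
    return prior_a + wins, prior_b + losses

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution (smaller means more certain)."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

# A against B in the question: 2 wins, 1 loss.
a, b = beta_posterior(wins=2, losses=1)
print(a, b, beta_mean(a, b), beta_variance(a, b))
```

For two wins and one loss this gives Beta(3, 2) with mean 0.6 and variance 0.04. The mean sits below the raw two thirds because the flat prior still carries weight after only three matches.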

One piece of bad news is that you still need to make assumptions, apart from the beta distribution. For example, suppose you have the following variables:

A: team x beats team y

B: team y beats team z

C: team x beats team z

and you have observations from direct matches between x and y and from matches between y and z, but not from matches between x and z. A simple (though naive) way to estimate P(C) could be to assume transitivity:

P(C) = P(A)P(B)
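As a rough illustration of this chained estimate, the sketch below applies it to the question's data, where C and B have only met through A. The helper name `chain_estimate` is an assumption, and the input probabilities are the raw head-to-head frequencies rather than anything more refined.

```python
def chain_estimate(p_first, p_second):
    """Naive chained estimate: P(C) = P(A) * P(B)."""
    return p_first * p_second

p_c_beats_a = 1 / 2   # C and A each won one of their two matches
p_a_beats_b = 2 / 3   # A won two of its three matches against B

# C beats B, estimated via "C beats A" and "A beats B".
p_c_beats_b = chain_estimate(p_c_beats_a, p_a_beats_b)

# B beats C, estimated via "B beats A" and "A beats C".
p_b_beats_c = chain_estimate(1 - p_a_beats_b, 1 - p_c_beats_a)

print(p_c_beats_b, p_b_beats_c, p_c_beats_b + p_b_beats_c)
```

The two chained estimates come to 1/3 and 1/6, which do not add up to 1. That shows how naive the rule is: taken at face value it does not even yield a consistent pair of probabilities.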

Regardless of how sophisticated your approach is, you'll have to define some kind of structure of probabilities to deal with the gaps as well as the interdependencies in your data. Whatever structure you choose, it will always be an assumption.

Another piece of bad news is that this approach is just plain complicated and I cannot give you a full account of how to apply it to your problem. Given that you need a structure of interdependent probabilities (the probability of team x beating team y given other distributions involving teams x, y and z), you may want to use a Bayesian network or a related analysis (for example a Markov random field or path analysis).

I hope this helps. In any case, feel free to ask for clarifications.
