利用遗传算法建立排名, [英] Building ranking with genetic algorithm,

本文介绍了利用遗传算法建立排名,的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

BIG版本后的问题:

Question after BIG edition :

我需要使用遗传算法建立排名,我有这样的数据:

I need to built a ranking using genetic algorithm, I have data like this :

P(a>b)=0.9
P(b>c)=0.7
P(c>d)=0.8
P(b>d)=0.3

现在,让我们将a,b,c,d解释为足球队的名称,而P(x>y)xy获胜的概率.我们想建立团队排名,我们缺少一些观察值P(a>d)P(a>c)由于缺少vs和d之间的比赛而丢失了. 目标是找到球队名称的顺序,以最能描述该四支球队联赛的现状.

now, lets interpret a,b,c,d as names of football teams, and P(x>y) is probability that x wins with y. We want to build ranking of teams, we lack some observations P(a>d),P(a>c) are missing due to lack of matches between a vs d and a vs c. Goal is to find ordering of team names, which the best describes current situation in that four team league.

如果我们只有4个团队而不是简单的解决方案,那么首先我们将计算四个团队的所有4!=24排序的概率,而忽略我们拥有的缺失值:

If we have only 4 teams than solution is straightforward, first we compute probabilities for all 4!=24 orderings of four teams, while ignoring missing values we have :

P(abcd)=P(a>b)P(b>c)P(c>d)P(b>d)

P(abdc)=P(a>b)P(b>c)(1-P(c>d))P(b>d)

...

P(dcba)=(1-P(a>b))(1-P(b>c))(1-P(c>d))(1-P(b>d))

,然后选择概率最高的排名.我不想使用其他任何健身功能.

and we choose the ranking with highest probability. I don't want to use any other fitness function.

我的问题:

因为n个元素的排列数目是n!所有概率的计算 对于大n(我的n约为40),不可能进行排序.我想用遗传算法解决这个问题.

As numbers of permutations of n elements is n! calculation of probabilities for all orderings is impossible for large n (my n is about 40). I want to use genetic algorithm for that problem.

突变运算符是对两个(或更多)排名元素的位置的简单切换.

Mutation operator is simple switching of places of two (or more) elements of ranking.

但是如何使两个顺序交叉?

But how to make crossover of two orderings ?

是否可以将P(abcd)解释为非对称TSP问题中路径"abcd"的成本函数,但从x到y的行进成本与从y到x的行进成本不同,P(x>y)=1-P(y<x)? TSP问题有很多交叉运算符,但是我认为我必须设计自己的交叉运算符,因为我的问题与TSP略有不同.您对解决方案或概念分析框架有任何想法吗?

Could P(abcd) be interpreted as cost function of path 'abcd' in assymetric TSP problem but cost of travelling from x to y is different than cost of travelling from y to x, P(x>y)=1-P(y<x) ? There are so many crossover operators for TSP problem, but I think I have to design my own crossover operator, because my problem is slightly different from TSP. Do you have any ideas for solution or frame for conceptual analysis ?

在概念和实现级别上,最简单的方法是使用交叉运算符,该运算符可以在两个解决方案之间交换子顺序:

The easiest way, on conceptual and implementation level, is to use crossover operator which make exchange of suborderings between two solutions :

CrossOver(ABcD,AcDB) = AcBD

对于元素的随机子集(在本例中为大写字母'a,b,d'),我们复制并粘贴第一个子顺序-元素'a,b,d'的序列为第二个顺序.

for random subset of elements (in this case 'a,b,d' in capital letters) we copy and paste first subordering - sequence of elements 'a,b,d' to second ordering.

版本:不对称的TSP可以转变为对称的TSP,但是具有禁止的子顺序,这使得GA方法不合适.

Edition : asymetric TSP could be turned into symmetric TSP, but with forbidden suborderings, which make GA approach unsuitable.

推荐答案

这绝对是一个有趣的问题,似乎大多数答案和注释都集中在问题的语义方面(即适应度函数的含义)等).

It's definitely an interesting problem, and it seems most of the answers and comments have focused on the semantic aspects of the problem (i.e., the meaning of the fitness function, etc.).

我将介绍一些有关语法元素的信息-如何以有意义的方式进行交叉和/或变异.显然,正如您在与TSP的并行处理中所指出的那样,您遇到了置换问题.因此,如果您要使用GA,则候选解决方案的自然表示形式只是您的点的有序列表,请注意避免重新排列-即排列.

I'll chip in some information about the syntactic elements -- how do you do crossover and/or mutation in ways that make sense. Obviously, as you noted with the parallel to the TSP, you have a permutation problem. So if you want to use a GA, the natural representation of candidate solutions is simply an ordered list of your points, careful to avoid repitition -- that is, a permutation.

TSP就是这样的排列问题,并且有许多交叉运算符(例如,边缘装配交叉),您可以从TSP算法中获取并直接使用.但是,我认为您会遇到这种方法的问题.基本上,问题是这样的:在TSP中,解决方案的重要质量是邻接.也就是说, abcd 具有与 cdab 相同的适应性,因为它是相同的游览,只是在不同的城市开始和结束.在您的示例中,绝对位置比相对位置的概念更为重要. abcd 在某种意义上意味着 a 是最佳点-重要的是,它排在列表的首位.

TSP is one such permutation problem, and there are a number of crossover operators (e.g., Edge Assembly Crossover) that you can take from TSP algorithms and use directly. However, I think you'll have problems with that approach. Basically, the problem is this: in TSP, the important quality of solutions is adjacency. That is, abcd has the same fitness as cdab, because it's the same tour, just starting and ending at a different city. In your example, absolute position is much more important that this notion of relative position. abcd means in a sense that a is the best point -- it's important that it came first in the list.

要获得有效的交叉算子,您要做的关键事情是要考虑父级中使它们变得良好的属性,并尝试提取并准确地合并这些属性. 尼克·拉德克利夫(Nick Radcliffe)称这种尊敬的重组" (请注意,纸已经很老了,现在对这种理论的理解有所不同,但是原理是合理的.采取由TSP设计的运算符并将其应用于您的问题,最终会产生后代,以设法保留来自父母的不相关信息.

The key thing you have to do to get an effective crossover operator is to account for what the properties are in the parents that make them good, and try to extract and combine exactly those properties. Nick Radcliffe called this "respectful recombination" (note that paper is quite old, and the theory is now understood a bit differently, but the principle is sound). Taking a TSP-designed operator and applying it to your problem will end up producing offspring that try to conserve irrelevant information from the parents.

理想情况下,您需要一个尝试保留字符串中绝对位置的运算符.我所知道的最好的一种称为循环交叉(CX).我想念的是一个很好的参考,但是我可以指出您

You ideally need an operator that attempts to preserve absolute position in the string. The best one I know of offhand is known as Cycle Crossover (CX). I'm missing a good reference off the top of my head, but I can point you to some code where I implemented it as part of my graduate work. The basic idea of CX is fairly complicated to describe, and much easier to see in action. Take the following two points:

abcdefgh
cfhgedba

  1. 随机选择父级1中的起点.为简单起见,我将从0开始以"a"开头.

  1. Pick a starting point in parent 1 at random. For simplicity, I'll just start at position 0 with the "a".

现在直接放到父级2中,并观察那里的值(在本例中为"c").

Now drop straight down into parent 2, and observe the value there (in this case, "c").

现在在父级1中搜索"c".我们在位置2找到它.

Now search for "c" in parent 1. We find it at position 2.

现在再次垂直下降,并观察父级2(位置2)中的"h".

Now drop straight down again, and observe the "h" in parent 2, position 2.

再次在父级1中搜索此"h",位于位置7.

Again, search for this "h" in parent 1, found at position 7.

垂直向下拖放并观察父级2中的"a".

Drop straight down and observe the "a" in parent 2.

在这一点上,请注意,如果我们在父对象中搜索"a",则会到达一个已经存在的位置.继续过去只会循环.实际上,我们将访问过的位置序列(0、2、7)称为周期".请注意,我们可以简单地在一组父母之间交换这些位置处的值,并且父母双方都将保留置换属性,因为我们在两个周期中每个父母在每个位置处都具有相同的三个值,只是顺序不同.

At this point note that if we search for "a" in parent one, we reach a position where we've already been. Continuing past that will just cycle. In fact, we call the sequence of positions we visited (0, 2, 7) a "cycle". Note that we can simply exchange the values at these positions between the parents as a group and both parents will retain the permutation property, because we have the same three values at each position in the cycle for both parents, just in different orders.

交换周期中包括的头寸.

Make the swap of the positions included in the cycle.

请注意,这只是一个循环.然后,您每次从一个新的(未访问的)职位开始重复此过程,直到所有职位都包含在一个循环中.完成上述步骤中的一次迭代后,您将获得以下字符串(其中"X"表示循环中的值在父级之间交换的位置.

Note that this is only one cycle. You then repeat this process starting from a new (unvisited) position each time until all positions have been included in a cycle. After the one iteration described in the above steps, you get the following strings (where an "X" denotes a position in the cycle where the values were swapped between the parents.

cbhdefga
afcgedbh
X X    X

只需不断查找和交换周期,直到完成.

Just keep finding and swapping cycles until you're done.

我从github帐户链接的代码将紧密地绑定到我自己的元启发式框架上,但是我认为从代码中提取基本算法并使其适合自己的系统是一项相当容易的任务.

The code I linked from my github account is going to be tightly bound to my own metaheuristics framework, but I think it's a reasonably easy task to pull the basic algorithm out from the code and adapt it for your own system.

请注意,您可以根据自己的特定领域进行更多定制,从而从中获得很多收益.我认为像CX之类的东西比基于TSP运算符的东西更能成为更好的黑匣子算法,但是黑匣子通常是不得已的选择.别人的建议可能会使您获得更好的总体算法.

Note that you can potentially gain quite a lot from doing something more customized to your particular domain. I think something like CX will make a better black box algorithm than something based on a TSP operator, but black boxes are usually a last resort. Other people's suggestions might lead you to a better overall algorithm.

这篇关于利用遗传算法建立排名,的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆