R中的匹配算法(双向匹配,匈牙利算法) [英] Matching algorithms in R (bipartite matching, Hungarian algorithm)

查看:282
本文介绍了R中的匹配算法(双向匹配,匈牙利算法)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想知道如何在R中建立一些基本匹配过程的示例.各种编程语言中都有很多示例,但是我还没有找到R的好示例.

I wonder how to set up some example some fundamental matching procedures in R. There are many examples in various programming languages, but I have not yet found a good example for R.

假设我想让学生适应项目,我会考虑在搜索此问题时遇到的3种替代方法:

Let’s say I want to match students to projects and I would consider 3 alternative approaches which I came across when googling on this issue:

1)两方匹配的情况:我要求每个学生列出要从事的3个项目(在这3个项目中未说明任何偏好等级).

1) Bipartite matching case: I ask each student to name 3 projects to work on (without stating any preference ranking among those 3).

ID  T.1 T.2 T.3 T.4 T.5 T.6 T.7
1   1   1   1   0   0   0   0
2   0   0   0   0   1   1   1
3   0   1   1   1   0   0   0
4   0   0   0   1   1   1   0
5   1   0   1   0   1   0   0
6   0   1   0   0   0   1   1
7   0   1   1   0   1   0   0

-

d.1 <- structure(list(Student.ID = 1:7, Project.1 = c(1L, 0L, 0L, 0L, 
1L, 0L, 0L), Project.2 = c(1L, 0L, 1L, 0L, 0L, 1L, 1L), Project.3 = c(1L, 
0L, 1L, 0L, 1L, 0L, 1L), Project.4 = c(0L, 0L, 1L, 1L, 0L, 0L, 
0L), Project.5 = c(0L, 1L, 0L, 1L, 1L, 0L, 1L), Project.6 = c(0L, 
1L, 0L, 1L, 0L, 1L, 0L), Project.7 = c(0L, 1L, 0L, 0L, 0L, 1L, 
0L)), .Names = c("Student.ID", "Project.1", "Project.2", "Project.3", 
"Project.4", "Project.5", "Project.6", "Project.7"), class = "data.frame", row.names = c(NA, 
-7L))

2)匈牙利算法:我要求每个学生姓名3个项目进行作业,并在其中进行3个项目的陈述,据我所知,在这种情况下应用该算法的理由如下:等级越好降低学生的成本".

2) Hungarian algorithm: I ask each student name 3 projects to work on WITH stating a preference ranking among those 3. As far as I understood the reasoning when applying the algorithm in this case would be something like: the better the rank the lower the "costs" to the student.

ID  T.1 T.2 T.3 T.4 T.5 T.6 T.7
1   3   2   1   na  na  na  na
2   na  na  na  na  1   2   3
3   na  1   3   2   na  na  na
4   na  na  na  1   2   3   na
5   2   na  3   na  1   na  na
6   na  3   na  na  na  2   1
7   na  1   2   na  3   na  na

-

d.2 <- structure(list(Student.ID = 1:7, Project.1 = structure(c(2L, 3L, 
3L, 3L, 1L, 3L, 3L), .Label = c("2", "3", "na"), class = "factor"), 
    Project.2 = structure(c(2L, 4L, 1L, 4L, 4L, 3L, 1L), .Label = c("1", 
    "2", "3", "na"), class = "factor"), Project.3 = structure(c(1L, 
    4L, 3L, 4L, 3L, 4L, 2L), .Label = c("1", "2", "3", "na"), class = "factor"), 
    Project.4 = structure(c(3L, 3L, 2L, 1L, 3L, 3L, 3L), .Label = c("1", 
    "2", "na"), class = "factor"), Project.5 = structure(c(4L, 
    1L, 4L, 2L, 1L, 4L, 3L), .Label = c("1", "2", "3", "na"), class = "factor"), 
    Project.6 = structure(c(3L, 1L, 3L, 2L, 3L, 1L, 3L), .Label = c("2", 
    "3", "na"), class = "factor"), Project.7 = structure(c(3L, 
    2L, 3L, 3L, 3L, 1L, 3L), .Label = c("1", "3", "na"), class = "factor")), .Names = c("Student.ID", 
"Project.1", "Project.2", "Project.3", "Project.4", "Project.5", 
"Project.6", "Project.7"), class = "data.frame", row.names = c(NA, 
-7L))

3)???方法:这应该与(2)有很大关系.但是,我认为这可能是一种更好/更公平的方法(至少在示例中是这样).学生无法选择项目,他们甚至都不知道项目,但是他们对某项技能有等级的评定(1个不存在"至10个专业水平").此外,讲师已经为每个项目评估了所需的技能.除(2)之外,第一步是计算相似度矩阵,然后从上方运行优化例程.

3) ??? approach: This should be pretty much related to (2). However, I think it is probably a better/ fairer approach (at least in the setting of the example). The students cannot pick projects, they even don’t know about the projects, but they have rate their qualifications (1 "not existent" to 10 "professional level") with regards to a certain skillset. Further, the lecturer has rated the required skillset for every project. In addition to (2), a first step would be to calculate a similarity matrix and then to run the optimization routine from above.

PS: Programming Skills
SK: Statistical Knowledge
IE: Industry Experience

ID  PS  SK  IE
1   10  9   8
2   1   2   10
3   10  2   5
4   2   5   3
5   10  2   10
6   1   10  1
7   5   5   5

-

d.3a <- structure(list(Student.ID = 1:7, Programming.Skills = c(10L, 1L, 
10L, 2L, 10L, 1L, 5L), Statistical.knowlegde = c(9L, 2L, 2L, 
5L, 2L, 10L, 5L), Industry.Expertise = c(8L, 10L, 5L, 3L, 10L, 
1L, 5L)), .Names = c("Student.ID", "Programming.Skills", "Statistical.knowlegde", 
"Industry.Expertise"), class = "data.frame", row.names = c(NA, 
-7L))

-

T: Topic ID
PS: Programming Skills
SK: Statistical Knowledge
IE: Industry Experience

T  PS   SK  IE
1   10  5   1
2   1   1   5
3   10  10  10
4   2   8   3
5   4   3   2
6   1   1   1
7   5   7   2

-

d.3b <- structure(list(Project.ID = 1:7, Programming.Skills = c(10L, 
1L, 10L, 2L, 4L, 1L, 5L), Statistical.Knowlegde = c(5L, 1L, 10L, 
8L, 3L, 1L, 7L), Industry.Expertise = c(1L, 5L, 10L, 3L, 2L, 
1L, 2L)), .Names = c("Project.ID", "Programming.Skills", "Statistical.Knowlegde", 
"Industry.Expertise"), class = "data.frame", row.names = c(NA, 
-7L))

对于在R中实现这三种方法的任何帮助,我将不胜感激.

I would appreciate any help in implementing those 3 approaches in R. Thank you.

更新: 以下问题似乎是相关的,但没有一个问题说明如何在R中解决它: https://math.stackexchange.com/questions/132829/group-偏好分配成员资格优化问题 https://superuser.com/questions/467577/using-optimization-to-assign-按喜好

UPDATE: The following questions seem to be related, but none show how to solve it in R: https://math.stackexchange.com/questions/132829/group-membership-assignment-by-preferences-optimization-problem https://superuser.com/questions/467577/using-optimization-to-assign-by-preference

推荐答案

以下是使用二分匹配和匈牙利算法的可能解决方案.

Here are possible solutions using bipartite matching and the Hungarian algorithm.

我提出的使用二分匹配的解决方案可能与您的想法不符.下面的所有代码都是在指定的迭代次数内随机抽样,然后希望至少已找到一种解决方案.这可能需要大量的迭代,并且需要很长时间才能解决大问题.下面的代码在200次迭代中为您的示例问题找到了三种解决方案.

My proposed solution using bipartite matching might not be what you have in mind. All the code below does is sample randomly for a specified number of iterations, after which at least one solution hopefully will have been identified. This might require a large number of iterations and a long time with large problems. The code below found three solutions to your example problem within 200 iterations.

matrix1 <- matrix(c( 1,   1,   1,  NA,  NA,  NA,  NA,
                    NA,  NA,  NA,  NA,   1,   1,   1,
                    NA,   1,   1,   1,  NA,  NA,  NA,
                    NA,  NA,  NA,   1,   1,   1,  NA,
                     1,  NA,   1,  NA,   1,  NA,  NA,
                    NA,   1,  NA,  NA,  NA,   1,   1,
                    NA,   1,   1,  NA,   1,  NA,  NA), nrow=7, byrow=TRUE)

set.seed(1234)

iters <- 200

my.match <- matrix(NA, nrow=iters, ncol=ncol(matrix1))

for(i in 1:iters) {

     for(j in 1:nrow(matrix1)) {

          my.match[i,j] <- sample(which(matrix1[j,] == 1), 1)

     }
}

n.unique <- apply(my.match, 1, function(x) length(unique(x)))

my.match[n.unique==ncol(matrix1),]

#      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,]    3    7    4    6    1    2    5
# [2,]    1    7    4    5    3    6    2
# [3,]    3    5    4    6    1    7    2

这是匈牙利算法的代码,使用@jackStinger建议的软件包cluesolve_LSAP().为此,我不得不替换掉丢失的观察值,然后将其随意替换为4.第5个人没有得到他们的第一选择,第7个人没有得到他们的三个选择中的任何一个.

Here is code for the Hungarian algorithm using package clue and solve_LSAP() as @jackStinger suggested. For this to work I had to replace the missing observations and I arbitrarily replaced them with 4. Person 5 did not get their first choice and Person 7 did not get any of their three choices.

library(clue)

matrix1 <- matrix(c( 3,   2,   1,   4,   4,   4,   4,
                     4,   4,   4,   4,   1,   2,   3,
                     4,   1,   3,   2,   4,   4,   4,
                     4,   4,   4,   1,   2,   3,   4,
                     2,   4,   3,   4,   1,   4,   4,
                     4,   3,   4,   4,   4,   2,   1,
                     4,   1,   2,   4,   3,   4,   4), nrow=7, byrow=TRUE)

matrix1

solve_LSAP(matrix1, maximum = FALSE)

# Optimal assignment:
# 1 => 3, 2 => 5, 3 => 2, 4 => 4, 5 => 1, 6 => 7, 7 => 6

此处是指向显示匈牙利算法工作原理的网站的链接: http://www.wikihow.com/使用匈牙利语算法

Here is a link to a site showing how the Hungarian algorithm works: http://www.wikihow.com/Use-the-Hungarian-Algorithm

2014年6月5日

这是我优化第三种情况的第一步.我将每个学生随机分配到一个项目,然后计算该组作业的成本.通过查找学生的技能与项目所需技能之间的差异来计算成本.将这些差异的绝对值相加,得出七个作业的总费用.

Here is my first stab at optimizing the third scenario. I randomly assign each student to a project, then calculate the cost for that set of assignments. Cost is calculated by finding the difference between a student's skill set and the project's required skills. The absolute values of those differences are summed to give a total cost for the seven assignments.

下面,我重复此过程10,000次,并确定这10,000个作业中的哪一个成本最低.

Below I repeat the process 10,000 times and identify which of those 10,000 assignments results in the lowest cost.

另一种方法是对所有可能的学生项目作业进行详尽搜索.

An alternative approach would be to do an exhaustive search of all possible student-project assignments.

您所想到的既不是随机搜索也不是穷举搜索.但是,前者将给出一个近似的最优解,而后者将给出一个精确的最优解.

Neither the random search nor the exhaustive search is likely what you had in mind. However, the former will give an approximate optimal solution and the latter would give an exact optimal solution.

稍后我可能会再次解决此问题.

I might return to this problem later.

students <- matrix(c(10,   9,   8,
                      1,   2,  10,
                     10,   2,   5,
                      2,   5,   3,
                     10,   2,  10,
                      1,  10,   1,
                      5,   5,   5), nrow=7, ncol=3, byrow=TRUE)

projects <- matrix(c(10,   5,    1,
                      1,   1,    5,
                     10,  10,   10,
                      2,   8,    3,
                      4,   3,    2,
                      1,   1,    1,
                      5,   7,    2), nrow=7, ncol=3, byrow=TRUE)

iters <- 10000

# col = student, cell = project
assignments <- matrix(NA, nrow=iters, ncol=nrow(students))

for(i in 1:iters) {
      assignments[i,1:7] <- sample(7,7,replace=FALSE)
}

cost <- matrix(NA, nrow=iters, ncol=nrow(students))

for(i in 1:iters) {

     for(j in 1:nrow(students)) {

          student <- j
          project <- assignments[i,student]

          student.cost <- rep(NA,3)

          for(k in 1:3) {     

               student.cost[k] <- abs(students[student,k] - projects[project,k])

          } 

          cost[i,j] <- sum(student.cost)

     }

}


total.costs <- rowSums(cost)

assignment.costs <- cbind(assignments, total.costs)
head(assignment.costs)

assignment.costs[assignment.costs[,8]==min(assignment.costs[,8]),]

#                    total.costs
# [1,] 3 2 1 4 5 6 7          48
# [2,] 3 2 1 6 5 4 7          48
# [3,] 3 2 1 6 5 4 7          48

# student 1, project 3, cost = 3
# student 2, project 2, cost = 6
# student 3, project 1, cost = 7
# student 4, project 4, cost = 3
# student 5, project 5, cost = 15
# student 6, project 6, cost = 9
# student 7, project 7, cost = 5

3+6+7+3+15+9+5

# [1] 48

2014年6月6日

这是详尽的搜索.只有5040种可能的方式将项目分配给这七个学生.此搜索返回四个最佳解决方案:

Here is the exhaustive search. There are only 5040 possible ways to assign projects to the seven students. This search returns four optimal solutions:

students <- matrix(c(10,   9,   8,
                      1,   2,  10,
                     10,   2,   5,
                      2,   5,   3,
                     10,   2,  10,
                      1,  10,   1,
                      5,   5,   5), nrow=7, ncol=3, byrow=TRUE)

projects <- matrix(c(10,   5,    1,
                      1,   1,    5,
                     10,  10,   10,
                      2,   8,    3,
                      4,   3,    2,
                      1,   1,    1,
                      5,   7,    2), nrow=7, ncol=3, byrow=TRUE)

library(combinat)

n <- nrow(students)

assignments <- permn(1:n)
assignments <- do.call(rbind, assignments)
dim(assignments)

# column of assignments = student
# row of assignments = iteration
# cell of assignments = project

cost <- matrix(NA, nrow=nrow(assignments), ncol=n)

for(i in 1:(nrow(assignments))) {
     for(student in 1:n) {

          project      <- assignments[i,student]
          student.cost <- rep(NA,3)

          for(k in 1:3) {     
               student.cost[k] <- abs(students[student,k] - projects[project,k])
          } 

          cost[i,student] <- sum(student.cost)
     }
}


total.costs <- rowSums(cost)

assignment.costs <- cbind(assignments, total.costs)
head(assignment.costs)

assignment.costs[assignment.costs[,(n+1)]==min(assignment.costs[,(n+1)]),]

                   total.costs
[1,] 3 2 5 4 1 6 7          48
[2,] 3 2 5 6 1 4 7          48
[3,] 3 2 1 6 5 4 7          48
[4,] 3 2 1 4 5 6 7          48

这篇关于R中的匹配算法(双向匹配,匈牙利算法)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆