I want to generate combinations of 8 names from a column in an R data frame based on conditions from other columns in the same data frame


Problem description


I have a data frame of 20 players from 4 different teams (5 players per team), each assigned a salary from a fantasy draft. I would like to create all combinations of 8 players whose total salary is 10000 or less and whose total points are greater than x, excluding any combination that contains 4 or more players from the same team.
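All three conditions can be checked for a single lineup with one predicate. The following is a minimal sketch and is not code from the question or the answer. `valid_lineup()` and its arguments are hypothetical names. `fp` holds only the `Team`, `Player`, `Points` and `Salary` columns of the question's data, and `min_points` stands in for x:

```r
## Hypothetical helper: TRUE when one lineup (row numbers into df) stays under
## the salary cap, beats the points threshold and has at most max_per_team
## players from any single team.
valid_lineup <- function(idx, df, min_points, max_salary = 10000, max_per_team = 3) {
  sum(df$Salary[idx]) <= max_salary &&
    sum(df$Points[idx]) > min_points &&
    max(table(df$Team[idx])) <= max_per_team
}

## The question's players, keeping only the columns the check needs.
fp <- data.frame(
  Team   = rep(c("ATN", "CL", "DBEARS", "GB"), each = 5),
  Player = c("ExoticDeer", "Supreme", "sasu", "eL lisasH 2", "Nisha",
             "Swiftending", "Pajkatt", "SexyBamboe", "EGM", "Saksa",
             "Ace", "HesteJoe", "Miggel", "Noia", "Ryze",
             "Keyser Soze", "Madara", "SkyLark", "MNT", "SKANKS224"),
  Points = c(22.209, 21.954, 19.357, 12.037, 12.282,
             22.285, 37.248, 20.660, 21.988, 15.898,
             23.596, 16.927, 17.818, 13.161, 12.166,
             19.120, 19.405, 10.218, 9.316, 7.565),
  Salary = c(1622, 1578, 1244, 998, 955,
             1606, 1489, 1256, 989, 967,
             1578, 1512, 1212, 970, 937,
             1602, 1577, 1266, 1007, 954),
  stringsAsFactors = FALSE
)

## 3 ATN, 2 CL, 2 DBEARS, 1 GB; salary 8014, points 114.454 -> TRUE
valid_lineup(c(3, 4, 5, 9, 10, 14, 15, 20), fp, min_points = 110)
## salary 9305 and points 138.77 pass, but 4 ATN players -> FALSE
valid_lineup(c(1, 2, 3, 4, 9, 10, 14, 15), fp, min_points = 110)
```

Applying a check like this to every column of `combn(nrow(fp), 8)` would give the full filter, at the cost of one R function call per lineup.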

Here is what my data frame looks like:

```
       Team      Player    K   D    A    LH Points Salary    PPS
  4     ATN  ExoticDeer  6.1 3.3  6.4 306.9 22.209   1622 1.3692
  2     ATN     Supreme  6.8 5.3  7.1 229.4 21.954   1578 1.3913
  1     ATN        sasu  3.6 6.4 11.0  95.7 19.357   1244 1.5560
  3     ATN eL lisasH 2  2.6 6.1  7.9  29.7 12.037    998 1.2061
  5     ATN       Nisha  2.7 5.6  7.5  48.2 12.282    955 1.2861
  11     CL Swiftending  6.0 5.8  7.8 360.5 22.285   1606 1.3876
  13     CL     Pajkatt 13.3 7.5  9.3 326.8 37.248   1489 2.5015
  15     CL  SexyBamboe  6.3 8.5  9.3 168.0 20.660   1256 1.6449
  14     CL         EGM  2.8 6.0 13.5  78.8 21.988    989 2.2233
  12     CL       Saksa  2.5 6.5 10.5  59.8 15.898    967 1.6441
  51 DBEARS         Ace  7.0 3.4  6.9 195.6 23.596   1578 1.4953
  31 DBEARS    HesteJoe  5.4 5.4  6.1 176.7 16.927   1512 1.1195
  61 DBEARS      Miggel  2.8 6.8 11.0 141.8 17.818   1212 1.4701
  21 DBEARS        Noia  3.0 6.0  8.0  36.1 13.161    970 1.3568
  41 DBEARS        Ryze  2.7 4.7  6.7  74.6 12.166    937 1.2984
  8      GB Keyser Soze  6.0 5.0  5.6 316.0 19.120   1602 1.1935
  9      GB      Madara  5.4 5.3  6.6 334.5 19.405   1577 1.2305
  10     GB     SkyLark  1.8 5.3  7.0  71.8 10.218   1266 0.8071
  7      GB         MNT  2.3 5.9  6.1  85.6  9.316   1007 0.9251
  6      GB   SKANKS224  1.4 7.6  7.4  52.5  7.565    954 0.7930
```

I am following the general approach described in the post "I want to generate combinations of 5 names from a column in an R data frame, whose values in a different column add up to a certain number or less", and I have adapted the code to suit my needs. This is what I have so far:

```r
## make a list of all combinations of 8 of Player, Points and Salary
xx <- with(FantasyPlayers, lapply(list(as.character(Player), Points, Salary), combn, 8))
## convert the names to a string,
## find the column sums of the others,
## set the names
yy <- setNames(
  lapply(xx, function(x) {
    if (typeof(x) == "character") apply(x, 2, toString) else colSums(x)
  }),
  names(FantasyPlayers)[c(2, 7, 8)]
)
## coerce to data.frame
newdf <- as.data.frame(yy)
```

Using the above code I can generate all possible lineups of 8 players and then subset them by various criteria (total salary and total points). However, I am struggling to exclude the lineups that contain more than 3 players from the same team.
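With 20 players there are `choose(20, 8)` = 125970 eight-player lineups, so `newdf` has that many rows before any filtering. The salary and points criteria are then an ordinary logical subset. Below is a sketch on a hypothetical four-player `toy` frame. It uses pairs instead of 8-player lineups so the result is easy to check by hand. `cap` and `x` are placeholder values standing in for 10000 and the points threshold:

```r
## Hypothetical toy data: four of the question's players.
toy <- data.frame(
  Player = c("ExoticDeer", "sasu", "Pajkatt", "EGM"),
  Points = c(22.209, 19.357, 37.248, 21.988),
  Salary = c(1622, 1244, 1489, 989),
  stringsAsFactors = FALSE
)

## The question's pipeline, with lineups of 2 instead of 8.
xx <- with(toy, lapply(list(Player, Points, Salary), combn, 2))
yy <- setNames(
  lapply(xx, function(x) if (is.character(x)) apply(x, 2, toString) else colSums(x)),
  c("Player", "Points", "Salary")
)
newdf <- as.data.frame(yy, stringsAsFactors = FALSE)

cap <- 2800  # placeholder for the 10000 salary cap
x   <- 50    # placeholder for the points threshold
newdf[newdf$Salary <= cap & newdf$Points > x, ]
```

Of the six pairs, only "sasu, Pajkatt" (salary 2733, 56.605 points) and "Pajkatt, EGM" (salary 2478, 59.236 points) satisfy both conditions.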

I imagine those lineups need to be removed from `newdf`, but I don't really know where to begin.

Here is the `dput()` output of `FantasyPlayers`:

```r
FantasyPlayers <- structure(list(Team = c("ATN", "ATN", "ATN", "ATN", "ATN", "CL", 
"CL", "CL", "CL", "CL", "DBEARS", "DBEARS", "DBEARS", "DBEARS", 
"DBEARS", "GB", "GB", "GB", "GB", "GB"), Player = structure(c(2L, 
5L, 4L, 1L, 3L, 15L, 12L, 14L, 11L, 13L, 16L, 18L, 19L, 20L, 
21L, 6L, 7L, 10L, 8L, 9L), .Label = c("eL lisasH 2", "ExoticDeer", 
"Nisha", "sasu", "Supreme", "Keyser Soze", "Madara", "MNT", "SKANKS224", 
"SkyLark", "EGM", "Pajkatt", "Saksa", "SexyBamboe", "Swiftending", 
"Ace", "DruidzOzoneShoc", "HesteJoe", "Miggel", "Noia", "Ryze"
), class = "factor"), K = c(6.1, 6.8, 3.6, 2.6, 2.7, 6, 13.3, 
6.3, 2.8, 2.5, 7, 5.4, 2.8, 3, 2.7, 6, 5.4, 1.8, 2.3, 1.4), D = c(3.3, 
5.3, 6.4, 6.1, 5.6, 5.8, 7.5, 8.5, 6, 6.5, 3.4, 5.4, 6.8, 6, 
4.7, 5, 5.3, 5.3, 5.9, 7.6), A = c(6.4, 7.1, 11, 7.9, 7.5, 7.8, 
9.3, 9.3, 13.5, 10.5, 6.9, 6.1, 11, 8, 6.7, 5.6, 6.6, 7, 6.1, 
7.4), LH = c(306.9, 229.4, 95.7, 29.7, 48.2, 360.5, 326.8, 168, 
78.8, 59.8, 195.6, 176.7, 141.8, 36.1, 74.6, 316, 334.5, 71.8, 
85.6, 52.5), Points = c(22.209, 21.954, 19.357, 12.037, 12.282, 
22.285, 37.248, 20.66, 21.988, 15.898, 23.596, 16.927, 17.818, 
13.161, 12.166, 19.12, 19.405, 10.218, 9.316, 7.565), Salary = c(1622, 
1578, 1244, 998, 955, 1606, 1489, 1256, 989, 967, 1578, 1512, 
1212, 970, 937, 1602, 1577, 1266, 1007, 954), PPS = c(1.3692, 
1.3913, 1.556, 1.2061, 1.2861, 1.3876, 2.5015, 1.6449, 2.2233, 
1.6441, 1.4953, 1.1195, 1.4701, 1.3568, 1.2984, 1.1935, 1.2305, 
0.8071, 0.9251, 0.793)), .Names = c("Team", "Player", "K", "D", 
"A", "LH", "Points", "Salary", "PPS"), class = "data.frame", row.names = c("4", 
"2", "1", "3", "5", "11", "13", "15", "14", "12", "51", "31", 
"61", "21", "41", "8", "9", "10", "7", "6"))
```

Solution

Here's one way:

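```r
## split each lineup string back into its player names
splt.names <- strsplit(as.character(newdf$Player), ", ")
## row positions of those players in FantasyPlayers
indices <- lapply(splt.names, function(x) match(x, FantasyPlayers$Player))
## TRUE if any team contributes more than 3 players to the lineup
exclude <- lapply(indices, function(x) any(table(FantasyPlayers$Team[x]) > 3))
## keep only the lineups that were not flagged
newdf2 <- newdf[!unlist(exclude), ]
```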

First, split the `Player` column on `", "`, the separator that `toString()` inserted. Then match the player names against `FantasyPlayers$Player` to get their row indices. With those indices, the main work is done by `any(table(FantasyPlayers$Team[x]) > 3)`. This counts the players from each team and flags any lineup in which some team has more than three players, that is, 4 or more players from the same team. The flagged lineups are then dropped.
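The following variation on the same idea is offered as a sketch, not as the answerer's method. It works from the integer row numbers produced by `combn(nrow(...), 8)` and applies all three filters before any names are pasted together. `fp` is a hypothetical stand-in for `FantasyPlayers` that holds only the needed columns, and `x <- 130` is a placeholder threshold:

```r
## The question's players, keeping only the columns needed here.
fp <- data.frame(
  Team   = rep(c("ATN", "CL", "DBEARS", "GB"), each = 5),
  Player = c("ExoticDeer", "Supreme", "sasu", "eL lisasH 2", "Nisha",
             "Swiftending", "Pajkatt", "SexyBamboe", "EGM", "Saksa",
             "Ace", "HesteJoe", "Miggel", "Noia", "Ryze",
             "Keyser Soze", "Madara", "SkyLark", "MNT", "SKANKS224"),
  Points = c(22.209, 21.954, 19.357, 12.037, 12.282,
             22.285, 37.248, 20.660, 21.988, 15.898,
             23.596, 16.927, 17.818, 13.161, 12.166,
             19.120, 19.405, 10.218, 9.316, 7.565),
  Salary = c(1622, 1578, 1244, 998, 955,
             1606, 1489, 1256, 989, 967,
             1578, 1512, 1212, 970, 937,
             1602, 1577, 1266, 1007, 954),
  stringsAsFactors = FALSE
)

## Every 8-player lineup as one column of row numbers (choose(20, 8) columns).
idx <- combn(nrow(fp), 8)

## Per-lineup totals: look up values, reshape to 8 rows, sum each column.
salary <- colSums(matrix(fp$Salary[idx], nrow = 8))
points <- colSums(matrix(fp$Points[idx], nrow = 8))

## Largest number of players any one team contributes to each lineup.
team_id <- as.integer(factor(fp$Team))
max_same_team <- apply(idx, 2, function(i) max(tabulate(team_id[i], nbins = max(team_id))))

x <- 130  # placeholder points threshold
keep <- salary <= 10000 & points > x & max_same_team <= 3

## Build name strings only for the lineups that survive all three filters.
lineups <- data.frame(
  Player = apply(idx[, keep, drop = FALSE], 2, function(i) toString(fp$Player[i])),
  Points = points[keep],
  Salary = salary[keep],
  stringsAsFactors = FALSE
)
```

The team check runs on row numbers, so no player name is split or re-matched. A name containing a comma would therefore not break it.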
