I want to generate combinations of 5 names from a column in an R data frame, whose values in a different column add up to a certain number or less

Problem description

I have a data frame (UFC) with 4 columns.

Column 1 (UFC$Name) contains the names of the UFC fighters fighting this weekend.

Column 2 (UFC$Salary) is how much they "cost" in a fantasy sports contest.

Column 3 (UFC$WinPct) is how likely the fighter is to win the fight.

Column 4 (UFC$FinishPct) is how likely the fighter is to win the fight without it going to a decision.

I'd like to make a data frame containing all combinations of 5 fighters from column 1 (or, more practically, the top X of them, ranked by the criterion in the next paragraph) whose column 2 values add up to $50,000 or less.

What I'm really interested in is the combinations of 5 fighters whose column 4 sums are highest.

I'm getting pretty good at low-level tinkering with data frames, but this is a little too advanced for me; I can't wrap my head around how to approach it.

Here is about 30% of the dataframe.

              Name Salary WinPct FinishPct
    Keita Nakamura   9100  31.00     15.36
       George Roop   8900  33.00     15.76
   Teruto Ishihara   9000  33.00     17.08
    Naoyuki Kotani   8700  30.50     18.35
     Yusuke Kasuya   8500  29.60     21.16
  Katsunori Kikuno   8800  33.66     21.88

The desired output would look something like this:

Lineup                                                             SalarySum FinishPctSum
Roy Nelson,Gegard Mousasui,Yusuke Kasuya,George Roop,Diego Brandao     47900       148.99

It would return the top X of those lineups, ranked by highest FinishPctSum.

Solution

Well, this won't be terribly fast, since it enumerates every combination of 5 fighters, but here's an idea...
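The cost comes from the number of lineups: `combn()` builds all `choose(n, 5)` of them for `n` fighters before any salary filtering happens. A quick count, assuming (hypothetically) about 20 fighters in the full data frame, since the 6 shown are roughly 30% of it:

```r
# How many 5-fighter lineups combn() has to enumerate
n_sample <- 6   # fighters in the sample data shown in the question
n_full   <- 20  # hypothetical size of the full data frame (6 is ~30% of 20)

choose(n_sample, 5)  # 6
choose(n_full, 5)    # 15504
```

Every one of those lineups is held in memory at once, so the approach is fine for a single card of fighters but grows quickly with `n`.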

## make a list of all combinations of 5 of Name, Salary, and FinishPct
xx <- with(df, lapply(list(as.character(Name), Salary, FinishPct), combn, 5))
## convert the names to a string, 
## find the column sums of the others,
## set the names
yy <- setNames(
    lapply(xx, function(x) {
        if(typeof(x) == "character") apply(x, 2, toString) else colSums(x)
    }),
    names(df)[c(1, 2, 4)]
)
## coerce to data.frame
newdf <- as.data.frame(yy)

which results in

#                                                                               Name Salary FinishPct
# 1      Keita Nakamura, George Roop, Teruto Ishihara, Naoyuki Kotani, Yusuke Kasuya  44200     87.71
# 2   Keita Nakamura, George Roop, Teruto Ishihara, Naoyuki Kotani, Katsunori Kikuno  44500     88.43
# 3    Keita Nakamura, George Roop, Teruto Ishihara, Yusuke Kasuya, Katsunori Kikuno  44300     91.24
# 4     Keita Nakamura, George Roop, Naoyuki Kotani, Yusuke Kasuya, Katsunori Kikuno  44000     92.51
# 5 Keita Nakamura, Teruto Ishihara, Naoyuki Kotani, Yusuke Kasuya, Katsunori Kikuno  44100     93.83
# 6    George Roop, Teruto Ishihara, Naoyuki Kotani, Yusuke Kasuya, Katsunori Kikuno  43900     94.23
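A more compact way to build the same table is to pass a summary function as the `FUN` argument of `combn()`, which applies it to every combination and returns one value per lineup. A minimal sketch, assuming the six-fighter sample from the question (rebuilt with `data.frame()` so the block runs on its own) and a hypothetical result name `newdf2`:

```r
# Sample data from the question (6 of the fighters)
df <- data.frame(
  Name      = c("Keita Nakamura", "George Roop", "Teruto Ishihara",
                "Naoyuki Kotani", "Yusuke Kasuya", "Katsunori Kikuno"),
  Salary    = c(9100L, 8900L, 9000L, 8700L, 8500L, 8800L),
  WinPct    = c(31, 33, 33, 30.5, 29.6, 33.66),
  FinishPct = c(15.36, 15.76, 17.08, 18.35, 21.16, 21.88),
  stringsAsFactors = FALSE
)

# combn(x, m, FUN) applies FUN to each 5-element combination and simplifies
# the results to a vector, one entry per lineup, always in the same order.
newdf2 <- data.frame(
  Name      = combn(df$Name, 5, toString),   # names joined with ", "
  Salary    = combn(df$Salary, 5, sum),      # total salary per lineup
  FinishPct = combn(df$FinishPct, 5, sum),   # total FinishPct per lineup
  stringsAsFactors = FALSE
)
newdf2
```

Because each `combn()` call walks the combinations in the same order, the three vectors line up row by row and give the same lineups and sums as the output above.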

No check has been performed to determine whether the salary sums are at or below 50k. The code simply gives all combinations of 5 fighters with their respective sums. To keep only the lineups whose salaries total 50k or less, subset with

newdf[newdf$Salary <= 5e4, ]

Note that 5e4 is shorthand/scientific notation for 50,000.
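The subsetting above applies the salary cap, while the question also asks for the top X lineups ranked by FinishPctSum. One way to do both steps, sketched under assumptions: `top_lineups()` and its `budget`, `x` and `size` arguments are hypothetical names, and the output columns follow the question's desired `Lineup`/`SalarySum`/`FinishPctSum` layout. It enumerates row indices with `combn(nrow(df), 5)` so any column can be looked up per lineup:

```r
# Sample data from the question (6 of the fighters)
df <- data.frame(
  Name      = c("Keita Nakamura", "George Roop", "Teruto Ishihara",
                "Naoyuki Kotani", "Yusuke Kasuya", "Katsunori Kikuno"),
  Salary    = c(9100L, 8900L, 9000L, 8700L, 8500L, 8800L),
  WinPct    = c(31, 33, 33, 30.5, 29.6, 33.66),
  FinishPct = c(15.36, 15.76, 17.08, 18.35, 21.16, 21.88),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lineups within the budget, highest FinishPct sum first
top_lineups <- function(df, budget = 5e4, x = 10, size = 5) {
  idx <- combn(nrow(df), size)  # one column of row indices per lineup
  out <- data.frame(
    Lineup       = apply(idx, 2, function(i) toString(df$Name[i])),
    SalarySum    = apply(idx, 2, function(i) sum(df$Salary[i])),
    FinishPctSum = apply(idx, 2, function(i) sum(df$FinishPct[i])),
    stringsAsFactors = FALSE
  )
  out <- out[out$SalarySum <= budget, ]                    # enforce the salary cap
  out <- out[order(out$FinishPctSum, decreasing = TRUE), ] # best FinishPct first
  rownames(out) <- NULL
  head(out, x)                                             # keep the top x lineups
}

top_lineups(df, budget = 5e4, x = 3)
```

With the sample data every lineup is under 5e4, so the cap only matters with a tighter budget: `top_lineups(df, budget = 44000)` keeps just the two lineups costing 43900 and 44000.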

Data:

df <- structure(list(Name = structure(c(3L, 1L, 5L, 4L, 6L, 2L), .Label = c("George Roop", 
"Katsunori Kikuno", "Keita Nakamura", "Naoyuki Kotani", "Teruto Ishihara", 
"Yusuke Kasuya"), class = "factor"), Salary = c(9100L, 8900L, 
9000L, 8700L, 8500L, 8800L), WinPct = c(31, 33, 33, 30.5, 29.6, 
33.66), FinishPct = c(15.36, 15.76, 17.08, 18.35, 21.16, 21.88
)), .Names = c("Name", "Salary", "WinPct", "FinishPct"), class = "data.frame", row.names = c(NA, 
-6L))
