Sum by group with multiple logical conditions while omitting values from the sum in R data.table

Problem description

I am having trouble figuring out how to sum rows in a data.table while omitting the values of a certain group in the process.

Let's say I have a data.table of the following form:

library(data.table)
dt <- data.table(year = c(2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003), 
               name = c("Tom", "Tom", "Tom", "Tom", "Fred", "Fred", "Fred", "Fred", "Gill", "Gill", "Gill", "Gill", "Ann", "Ann", "Ann", "Ann"),
               g1 = c(1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1),
               g2 = c(1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1),
               g3 = c(1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1),
               g4 = c(0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1))

setkey(dt, name, year)

where g1-g4 are indicator variables (1 = played, 0 = did not play) for the games in which the player in name participated in the given year.

What I want to do is calculate, for each game, the number of players NPg1-NPg4 who took part in that focal game together with the player, but only if the two of them also played against each other in another game in the same year. The sum should exclude the player for whom it is being calculated.

I get close using this code, modified from "how to cumulatively add values in one vector in R", e.g. for NPg1:

dtg1 <- dt[,.SD[(g1==1) & (g2==1 | g3==1 | g4==1)][, NPg1:= sum(g1)], by=year]

This subsets dt on my conditions and creates the sum; however, the sum includes the focal player. For example, NPg1 in year == 2000 is 1 for Tom, but it should be 0 because, even though he played in g1, he did not play another player in another game that year. Once I get the sum right, I can do this for each game and combine the results back into a data.table. The main question is how to get the correct sum under these conditions.
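A minimal sketch of one way to leave the focal player out of the count, assuming the question's filter is kept as is: subset first, then subtract each row's own g1 from the per-year total. Because every remaining row has g1 == 1, this is the number of qualifying players in that year minus one. Note that this filter only requires each player to have played some other game that year, not another game against the same opponent.

```r
library(data.table)

dt <- data.table(year = rep(c(2000, 2001, 2002, 2003), times = 4),
                 name = rep(c("Tom", "Fred", "Gill", "Ann"), each = 4),
                 g1 = c(1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1),
                 g2 = c(1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1),
                 g3 = c(1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1),
                 g4 = c(0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1))
setkey(dt, name, year)

# Keep players who played g1 and at least one other game (the question's condition)
dtg1 <- dt[g1 == 1 & (g2 == 1 | g3 == 1 | g4 == 1)]

# Per-year total of g1 minus the focal player's own g1
dtg1[, NPg1 := sum(g1) - g1, by = year]
```

With the question's data this gives Tom 0 in 2000 (he is the only player left after the filter) and Ann and Tom 1 each in 2001, which agrees with the dtresult table in the later edit.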

The result for NPg1 should look like this:

dtg1$NPg1result <- c(0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3)

Any ideas would be greatly appreciated.

After @Mike.Gahan's comment:

This is the sub-result for g1; maybe that did not become very clear from my post. Once I have it right, I can easily join it back to the full data.table using:

library(plyr)
dt <- join(dt, dtg1)

or other merge/join operations, but since my question is mainly about the sub-result, I did not want to bother everyone with the rest.
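If plyr is not otherwise needed, the join back can also be done with data.table's own update-on-join. A minimal sketch, assuming dtg1 is built with the question's filter but with the focal player subtracted, and assuming that players who drop out of the subset should get 0 rather than NA:

```r
library(data.table)

dt <- data.table(year = rep(c(2000, 2001, 2002, 2003), times = 4),
                 name = rep(c("Tom", "Fred", "Gill", "Ann"), each = 4),
                 g1 = c(1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1),
                 g2 = c(1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1),
                 g3 = c(1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1),
                 g4 = c(0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1))
setkey(dt, name, year)

# Sub-result for g1: qualifying players, focal player excluded from the count
dtg1 <- dt[g1 == 1 & (g2 == 1 | g3 == 1 | g4 == 1)]
dtg1[, NPg1 := sum(g1) - g1, by = year]

# Update join: add NPg1 to dt by reference, matching rows on name and year;
# rows of dt without a match in dtg1 get NA
dt[dtg1, NPg1 := i.NPg1, on = c("name", "year")]

# Assumption: a player who did not qualify counts as 0 rather than NA
dt[is.na(NPg1), NPg1 := 0]
```

Unlike plyr's join, this modifies dt in place instead of returning a new table.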

EDIT after @Ricardo Saporta's solution

The full desired result with all the games looks as follows:

dtresult <- data.table(year = c(2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003), 
                   name = c("Ann", "Ann", "Ann", "Ann", "Fred", "Fred", "Fred", "Fred", "Gill", "Gill", "Gill", "Gill", "Tom", "Tom", "Tom", "Tom"), 
                   NPg1 = c(0, 1, 3, 3, 0, 0, 3, 3, 0, 0, 3, 3, 0, 1, 3, 3), 
                   NPg2 = c(0, 0, 2, 3, 0, 0, 2, 3, 1, 0, 0, 3, 1, 0, 2, 3), 
                   NPg3 = c(0, 0, 3, 2, 0, 2, 3, 0, 1, 2, 3, 2, 1, 2, 3, 2), 
                   NPg4 = c(0, 0, 2, 2, 0, 1, 0, 0, 0, 1, 2, 2, 0, 0, 2, 2))
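The wording "played against each other in another game" suggests a pairwise condition, but the values in dtresult follow a weaker rule: a player counts toward game g in a year if they played g and at least one other game that year, and NPg is the number of other counting players in the same game and year. For example, Tom's NPg3 for 2001 is 2 although he shares only g3 with Fred and Gill. A minimal sketch of that inferred rule; reshaping to long format with melt and back with dcast is a choice of this sketch, not something used in the question or the answer:

```r
library(data.table)

dt <- data.table(year = rep(c(2000, 2001, 2002, 2003), times = 4),
                 name = rep(c("Tom", "Fred", "Gill", "Ann"), each = 4),
                 g1 = c(1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1),
                 g2 = c(1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1),
                 g3 = c(1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1),
                 g4 = c(0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1))
games <- c("g1", "g2", "g3", "g4")

# Long format: one row per player, year and game
long <- melt(dt, id.vars = c("year", "name"), measure.vars = games,
             variable.name = "game", value.name = "played")

# How many games each player took part in that year
long[, n_games := sum(played), by = .(year, name)]

# A player counts for a game only if they played it and at least one other game
long[, qualifies := played == 1 & n_games >= 2]

# Other counting players in the same year and game; the focal player is excluded
long[, NP := qualifies * (sum(qualifies) - qualifies), by = .(year, game)]

# Back to one row per year-player, one NP column per game
res <- dcast(long, year + name ~ game, value.var = "NP")
setnames(res, games, paste0("NP", games))
setkey(res, name, year)
```

Keyed by name and year, res has the same row order as dtresult, so the two can be compared column by column.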

Solution

One approach is to do a Cartesian join on the year-g1-g2-..-gn combinations.

Then, on the new table, you can "ignore the rows" [see note at bottom] that do not qualify, namely players playing against themselves and player combinations that shared only one game.

## the game indicator columns
games <- c("g1", "g2", "g3", "g4")
setkeyv(dt, c("year", games))
dt.merged <- merge(dt, dt, all=TRUE, allow.cartesian=TRUE, suffixes=c("", ".y"))
## ignore players playing against themselves
dt.merged[name != name.y, (games) := 0 ]
## ignore player combinations that only shared one game
dt.merged[ (rowSums(dt.merged[, games, with=FALSE]) <= 1) , (games) := 0 ]
## now just sum it up
results <- dt.merged[, lapply(.SD, sum), keyby=list(year, name), .SDcols=games]
## clean up the names
setnames(results, games, paste0("NP", games))

Which yields

results

    year name g1 g2 g3 g4
 1: 2000  Ann  0  0  0  0
 2: 2000 Fred  0  0  0  0
 3: 2000 Gill  0  1  1  1
 4: 2000  Tom  1  1  1  0
 5: 2001  Ann  1  1  0  0
 6: 2001 Fred  0  0  1  1
 7: 2001 Gill  0  0  1  1
 8: 2001  Tom  1  0  1  0
 9: 2002  Ann  1  1  1  1
10: 2002 Fred  1  1  1  0
11: 2002 Gill  1  0  1  1
12: 2002  Tom  1  1  1  1
13: 2003  Ann  1  1  1  1
14: 2003 Fred  1  1  0  0
15: 2003 Gill  1  1  1  1
16: 2003  Tom  1  1  1  1
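A sketch of the pairwise reading of the Cartesian-join idea, assuming the self-join is made on year alone (the code above keys the join on year plus every game column, so it only pairs players whose game vectors are identical in that year). Self-pairs and pairs that share fewer than two games are zeroed out, as in the first option of the note that follows. The helper columns both_g1 to both_g4 and shared are hypothetical names made up for this sketch.

```r
library(data.table)

dt <- data.table(year = rep(c(2000, 2001, 2002, 2003), times = 4),
                 name = rep(c("Tom", "Fred", "Gill", "Ann"), each = 4),
                 g1 = c(1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1),
                 g2 = c(1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1),
                 g3 = c(1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1),
                 g4 = c(0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1))
games <- c("g1", "g2", "g3", "g4")
both <- paste0("both_", games)   # hypothetical helper column names

# Cartesian self-join on year only: every (player, opponent) combination per year
pairs <- merge(dt, dt, by = "year", allow.cartesian = TRUE, suffixes = c("", ".y"))

# both_gX is 1 when the player and the opponent both played game gX
for (k in seq_along(games)) {
  set(pairs, j = both[k],
      value = pairs[[games[k]]] * pairs[[paste0(games[k], ".y")]])
}
pairs[, shared := rowSums(.SD), .SDcols = both]

# Zero out self-pairs and pairs sharing fewer than two games, so that every
# year-player still shows up (with 0) in the result
pairs[name == name.y | shared < 2, (both) := 0]

# Count qualifying opponents per year, player and game
res <- pairs[, lapply(.SD, sum), keyby = .(year, name), .SDcols = both]
setnames(res, both, paste0("NP", games))
```

With the question's data, NPg1 comes out as 0 for every player in 2000 and 2001 and 3 in 2002 and 2003, in line with the NPg1 values the question expects for dtg1, while Tom's NPg3 for 2001 is 0 rather than the 2 listed in dtresult.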

Note that you have two options to "ignore the rows":

If you want to preserve the "0" count for the year-player, use

dt.merged[ <filter>,  (games) := 0 ]

If you do not care about the "0" count for the year-player, use

dt.merged <- dt.merged[ ! <filter> ]
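A toy illustration of the difference between the two options; the table and its ignore column are hypothetical, with ignore marking the rows that match the filter.

```r
library(data.table)

# Hypothetical pair-level rows: Ann has one row that counts, Tom has none
toy <- data.table(year = 2000, name = c("Ann", "Ann", "Tom"),
                  g1 = c(1, 1, 1), ignore = c(TRUE, FALSE, TRUE))

# Option 1: zero out ignored rows; Tom keeps a row with a 0 count
zeroed <- copy(toy)
zeroed[ignore == TRUE, g1 := 0]
res_zero <- zeroed[, .(g1 = sum(g1)), keyby = .(year, name)]

# Option 2: drop ignored rows; Tom disappears from the result entirely
dropped <- toy[ignore == FALSE]
res_drop <- dropped[, .(g1 = sum(g1)), keyby = .(year, name)]
```

copy() is used so that the in-place := of option 1 does not also change toy before option 2 runs.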
