Sum by group with multiple logical conditions while omitting values from sum R data.table
Question
I am having trouble figuring out how to sum rows in a data.table while omitting the values of a certain group in the process.
Let's say I have a data.table of the following form:
library(data.table)
dt <- data.table(
  year = c(2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003),
  name = c("Tom", "Tom", "Tom", "Tom", "Fred", "Fred", "Fred", "Fred", "Gill", "Gill", "Gill", "Gill", "Ann", "Ann", "Ann", "Ann"),
  g1 = c(1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1),
  g2 = c(1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1),
  g3 = c(1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1),
  g4 = c(0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1))
setkey(dt, name, year)
where g1 to g4 are indicator variables for the games in which the players in name participated at time year.
What I want to calculate, for each game, is the number of players (NPg1 to NPg4) who participated in the focal game together with the player in question, counting another player only if the two also played against each other in another game in the same year. The sum should exclude the player for whom it is being calculated.
I get close using this code, modified from "how to cumulatively add values in one vector in R", e.g. for NPg1:
dtg1 <- dt[, .SD[(g1 == 1) & (g2 == 1 | g3 == 1 | g4 == 1)][, NPg1 := sum(g1)], by = year]
This subsets dt on my conditions and creates the sum; however, the sum includes the focal player. For example, NPg1 in year == 2000 is 1 for Tom, but it should be 0: even though he played in g1, he did not play another player in another game in that year. Once I get the sum right, I can do this for each game and combine the results back into a data.table. The main question is how to get the correct sum under these conditions.
The result for NPg1 should look like this:
dtg1$NPg1result <- c(0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3)
Any ideas would be greatly appreciated.
After @Mike.Gahan's comment:
This is the sub-result for g1; maybe this does not become very clear from my post. Once I have that correct, I could easily join it back to the full data.table using:
library(plyr)
dt <- join(dt, dtg1)
or other merge/join operations, but since my question is mainly concerned with the sub-result, I did not want to bother everyone with the rest.
EDIT after @Ricardo Saporta's solution
The full desired result with all the games looks as follows:
dtresult <- data.table(
  year = c(2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003, 2000, 2001, 2002, 2003),
  name = c("Ann", "Ann", "Ann", "Ann", "Fred", "Fred", "Fred", "Fred", "Gill", "Gill", "Gill", "Gill", "Tom", "Tom", "Tom", "Tom"),
  NPg1 = c(0, 1, 3, 3, 0, 0, 3, 3, 0, 0, 3, 3, 0, 1, 3, 3),
  NPg2 = c(0, 0, 2, 3, 0, 0, 2, 3, 1, 0, 0, 3, 1, 0, 2, 3),
  NPg3 = c(0, 0, 3, 2, 0, 2, 3, 0, 1, 2, 3, 2, 1, 2, 3, 2),
  NPg4 = c(0, 0, 2, 2, 0, 1, 0, 0, 0, 1, 2, 2, 0, 0, 2, 2))
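One reading of the rule that reproduces the dtresult table is: within a year, a player qualifies for a game if they played it and at least one other game, and each qualifying player is credited with the number of qualifying players minus themselves. The sketch below is only an illustration under that assumption, not the method from the answer; the helper column n_games and the loop variables g and np_col are names introduced here.

```r
library(data.table)

# Same data as in the question, written with rep() for brevity.
dt <- data.table(
  year = rep(2000:2003, times = 4),
  name = rep(c("Tom", "Fred", "Gill", "Ann"), each = 4),
  g1 = c(1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1),
  g2 = c(1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1),
  g3 = c(1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1),
  g4 = c(0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1)
)

games <- paste0("g", 1:4)  # names of the game indicator columns

# Number of games each player took part in that year (helper column).
dt[, n_games := rowSums(.SD), .SDcols = games]

# Assumed rule: a player qualifies for a game if they played it and at
# least one other game that year. Qualifying players get the number of
# qualifying players in that year minus one (themselves); others get 0.
for (g in games) {
  np_col <- paste0("NP", g)
  dt[, (np_col) := {
    qualifies <- get(g) == 1 & n_games >= 2
    ifelse(qualifies, sum(qualifies) - 1, 0)
  }, by = year]
}

# Same row order as dtresult: by name, then year.
result <- dt[order(name, year), c("year", "name", paste0("NP", games)), with = FALSE]
```

Because the loop only subtracts the focal player from a per-year count, it never compares players pairwise. Under the stricter pairwise wording (two players must share a second game with each other), a join of players within each year would be needed instead.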
Solution
One approach is to do a cartesian join on the year-g1-g2-..-gn combinations.
Then, on the new table, you can "ignore the rows" [see note at bottom] that do not qualify: namely, players playing against themselves, and those player combinations that only played one game.
## names of the game indicator columns (assumed here; `games` is not defined in the original snippet)
games <- paste0("g", 1:4)
setkeyv(dt, c("year", games))
dt.merged <- merge(dt, dt, all = TRUE, allow.cartesian = TRUE, suffixes = c("", ".y"))
## ignore players playing against themselves
dt.merged[name != name.y, (games) := 0]
## ignore player combinations that only shared one game
dt.merged[(rowSums(dt.merged[, games, with = FALSE]) <= 1), (games) := 0]
## now just sum it up
results <- dt.merged[, lapply(.SD, sum), keyby = list(year, name), .SDcols = games]
## clean up the names
setnames(results, games, paste0("NP", games))
Which yields
results
    year name g1 g2 g3 g4
 1: 2000  Ann  0  0  0  0
 2: 2000 Fred  0  0  0  0
 3: 2000 Gill  0  1  1  1
 4: 2000  Tom  1  1  1  0
 5: 2001  Ann  1  1  0  0
 6: 2001 Fred  0  0  1  1
 7: 2001 Gill  0  0  1  1
 8: 2001  Tom  1  0  1  0
 9: 2002  Ann  1  1  1  1
10: 2002 Fred  1  1  1  0
11: 2002 Gill  1  0  1  1
12: 2002  Tom  1  1  1  1
13: 2003  Ann  1  1  1  1
14: 2003 Fred  1  1  0  0
15: 2003 Gill  1  1  1  1
16: 2003  Tom  1  1  1  1
Note that you have two options to "ignore the row":
If you want to preserve the "0" count for the year-player, then use
dt.merged[ <filter>, (games) := 0 ]
If you do not care for the "0" count for the year-player, then use
dt.merged <- dt.merged[ ! <filter> ]
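The difference between the two options can be seen on a small made-up table. This is a minimal sketch: the table pairs and the self-pair filter name == name.y are hypothetical stand-ins for dt.merged and <filter>.

```r
library(data.table)

# Hypothetical toy table of player pairs within one year.
pairs <- data.table(
  year   = c(2000, 2000, 2000),
  name   = c("Ann", "Tom", "Tom"),
  name.y = c("Ann", "Tom", "Gill"),
  g1     = c(1, 1, 1)
)

# Option 1: zero the values in place. The filtered rows stay in the table,
# so Ann still appears in the grouped sum, with a count of 0.
kept <- copy(pairs)
kept[name == name.y, g1 := 0]
sum_kept <- kept[, .(g1 = sum(g1)), keyby = .(year, name)]

# Option 2: drop the filtered rows. Ann has no rows left and
# disappears from the grouped sum.
dropped <- pairs[!(name == name.y)]
sum_dropped <- dropped[, .(g1 = sum(g1)), keyby = .(year, name)]
```

Here sum_kept has one row per player, including a zero for Ann, while sum_dropped contains only Tom. The copy() call keeps the toy table unchanged, since := modifies a data.table by reference.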