Applying function to data table subset excluding nested by value
Problem description
I have a question connected to one I asked previously: Assignment of a value from a foreach loop. Although the solutions that friendly users provided point in the right direction, they don't solve my actual problem. Here is the sample data set:
td <- data.table(date=c(rep(1,10),rep(2,10)),var=c(rep(1,4),2,rep(1,5)),id=rep(1:10,2))
It is the same as before, but it reflects my real data better. What I want to do, in words: for each id, I want the mean over all other ids within a given period (e.g. mean(td[date=="2004-01-01" & id!=1]$var), but for all periods and all ids). So it is a kind of nested operation. I tried something like this:
td[,.SD[,mean(.SD$var[-.I]),by=id],by=date]
But that doesn't give the right results.
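To make the target explicit, the description above can be sketched as a brute-force reference computation. This is only an illustration of the intended result, not an efficient solution; the names `pairs` and `loo_mean` are made up for this sketch, and `var` is written out at full length instead of relying on recycling.

```r
library(data.table)

td <- data.table(date = c(rep(1, 10), rep(2, 10)),
                 var  = rep(c(rep(1, 4), 2, rep(1, 5)), 2),
                 id   = rep(1:10, 2))

# One row per (id, date) combination present in the data.
pairs <- unique(td[, .(id, date)])

# For each pair, average var over all *other* ids on that date.
loo <- mapply(
  function(i, d) mean(td$var[td$date == d & td$id != i]),
  pairs$id, pairs$date
)
pairs[, loo_mean := loo]
```

Any faster approach should reproduce the `loo_mean` column of `pairs`.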
Solution
Update:
Josh very intelligently suggested using `.BY` instead of `.GRP` (this assumes td is keyed by id, as set up in the original answer):
td[, td[!.BY, mean(var), by=date], by=id]
Original answer:
If you key by `id`, you can use `.GRP` in the following way:
setkey(td, id)
## grab all the unique IDs. Only necessary if not all ids are
## represented in all dates
uid <- unique(td$id)
td[, td[!.(uid[.GRP]), mean(var), by=date] , by=id]
    id date       V1
 1:  1    1 1.111111
 2:  1    2 1.111111
 3:  2    1 1.111111
 4:  2    2 1.111111
 5:  3    1 1.111111
 6:  3    2 1.111111
 7:  4    1 1.111111
 8:  4    2 1.111111
 9:  5    1 1.000000
10:  5    2 1.000000
11:  6    1 1.111111
12:  6    2 1.111111
13:  7    1 1.111111
14:  7    2 1.111111
15:  8    1 1.111111
16:  8    2 1.111111
17:  9    1 1.111111
18:  9    2 1.111111
19: 10    1 1.111111
20: 10    2 1.111111
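As a possible alternative to the nested not-join, the same leave-one-out mean can be obtained arithmetically: subtract each id's own contribution from the per-date total and divide by the remaining count. This is a different technique from the one in the answer, shown only as a sketch; the column names `tot`, `n`, `own` and `own_n` are invented here for illustration.

```r
library(data.table)

td <- data.table(date = c(rep(1, 10), rep(2, 10)),
                 var  = rep(c(rep(1, 4), 2, rep(1, 5)), 2),
                 id   = rep(1:10, 2))

res <- copy(td)  # work on a copy so td itself is left unchanged

# Per-date total and row count, computed once per date.
res[, `:=`(tot = sum(var), n = .N), by = date]

# Per (date, id) total and row count, in case an id has several rows per date.
res[, `:=`(own = sum(var), own_n = .N), by = .(date, id)]

# Mean over everything on that date except the id's own rows.
loo <- unique(res[, .(id, date, V1 = (tot - own) / (n - own_n))])
setorder(loo, id, date)
```

On the sample data, `loo` should match the table above. Because it avoids re-subsetting the table once per id, it may scale better on larger data. Note that a date containing only a single id would give 0/0, i.e. NaN.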