Applying function to data table subset excluding nested by value


Problem description



This question follows up on one I asked previously: "Assignment of a value from a foreach loop". Although the solutions that friendly users provided there point in the right direction, they don't solve my actual problem. Here is the sample data set:

td <- data.table(date=c(rep(1,10),rep(2,10)),var=c(rep(1,4),2,rep(1,5)),id=rep(1:10,2))

It is the same as before, but it reflects my real data better. What I want to do, in words: for each id, I want the mean of var over all other ids within a given period (e.g. mean(td[date=="2004-01-01" & id!=1]$var), but for all periods and all ids). So it is a kind of nested operation. I tried something like this:

td[,.SD[,mean(.SD$var[-.I]),by=id],by=date]

But that doesn't give the right results.

Solution

Update:

Josh very intelligently suggested using `.BY` instead of `.GRP` (with td keyed by id, as in the original answer below):

td[, td[!.BY, mean(var), by=date], by=id]
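For anyone who wants to run this end to end, here is a minimal self-contained sketch. It only combines the sample data from the question with the `setkey()` call from the original answer; the name `res` and the brute-force check at the end are my own additions for illustration.

```r
library(data.table)

# Sample data from the question
td <- data.table(date = c(rep(1, 10), rep(2, 10)),
                 var  = c(rep(1, 4), 2, rep(1, 5)),
                 id   = rep(1:10, 2))

# Key by id so that the list .BY can be used as a (not-)join on id
setkey(td, id)

# Outer grouping by id; the inner call drops the current id with !.BY
# and averages var per date over the remaining rows
res <- td[, td[!.BY, mean(var), by = date], by = id]

# Brute-force check for a single (id, date) pair, written the way the
# question describes the operation
check <- mean(td[date == 1 & id != 1]$var)
print(all.equal(res[id == 1 & date == 1, V1], check))
```

The inner `td` deliberately refers to the full table rather than `.SD`, because `.SD` only holds the rows of the current id, which are exactly the rows that need to be excluded.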


Original answer:

If you key td by id, you can use .GRP in the following way:

setkey(td, id)

## grab all the unique IDs. Only necessary if not all ids are 
##     represented in all dates
uid <- unique(td$id)

td[, td[!.(uid[.GRP]), mean(var), by=date] , by=id]


    id date       V1
 1:  1    1 1.111111
 2:  1    2 1.111111
 3:  2    1 1.111111
 4:  2    2 1.111111
 5:  3    1 1.111111
 6:  3    2 1.111111
 7:  4    1 1.111111
 8:  4    2 1.111111
 9:  5    1 1.000000
10:  5    2 1.000000
11:  6    1 1.111111
12:  6    2 1.111111
13:  7    1 1.111111
14:  7    2 1.111111
15:  8    1 1.111111
16:  8    2 1.111111
17:  9    1 1.111111
18:  9    2 1.111111
19: 10    1 1.111111
20: 10    2 1.111111
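As a cross-check on the table above, the same numbers can be obtained without the nested call by using plain arithmetic: the mean over all other ids equals (date total minus own sum) divided by (date count minus own count). This is a different technique from the one in the answer, sketched here under the assumption that every id you care about actually has rows at each date. The names `agg`, `s`, `n`, `S`, `N` and `loo_mean` are made up for this illustration.

```r
library(data.table)

# Sample data from the question
td <- data.table(date = c(rep(1, 10), rep(2, 10)),
                 var  = c(rep(1, 4), 2, rep(1, 5)),
                 id   = rep(1:10, 2))

# Sum and row count of var for every (id, date) combination
agg <- td[, .(s = sum(var), n = .N), by = .(id, date)]

# Totals per date, added by reference to every row of that date
agg[, `:=`(S = sum(s), N = sum(n)), by = date]

# Leave-one-id-out mean: remove the id's own contribution from the totals
agg[, loo_mean := (S - s) / (N - n)]

setorder(agg, id, date)
print(agg[, .(id, date, loo_mean)])
```

Two caveats: a date with only one id gives a division by zero (NaN), and unlike the nested version this only produces rows for (id, date) pairs that exist in the data. On large tables it may be faster, since it avoids one inner subset per id, but I have not benchmarked it.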
