R动态地构建“列表”在data.table(或ddply) [英] R Dynamically build "list" in data.table (or ddply)

查看:127
本文介绍了R动态地构建“列表”在data.table(或ddply)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的聚合需求因columns / data.frames而异。我想将list参数动态传递给data.table。

My aggregation needs vary among columns / data.frames. I would like to pass the "list" argument to the data.table dynamically.

作为一个最小例子:

require(data.table)
type <- c(rep("hello", 3), rep("bye", 3), rep("ok",3))
a <- (rep(1:3, 3))
b <- runif(9)
c <- runif(9)
df <- data.frame(cbind(type, a, b, c), stringsAsFactors=F)
DT <-data.table(df)

此调用:

DT[, list(suma = sum(as.numeric(a)), meanb = mean(as.numeric(b)), minc = min(as.numeric(c))), by= type]

会有类似的结果:

    type suma     meanb      minc
1: hello    6 0.1332210 0.4265579
2:   bye    6 0.5680839 0.2993667
3:    ok    6 0.5694532 0.2069026

未来的data.frames将有更多的列,我想要不同的总结。但是为了使用这个小例子:有没有办法通过编程列表?

Future data.frames will have more columns that I will want to summarize differently. But for the sake of working with this small example: Is there a way to pass the list programatically?

我天真地尝试:

# create a different list
mylist <- "list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))"
# new call
DT[, mylist, by=type]

1: hello
2:   bye
3:    ok
mylist
1: list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))
2: list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))
3: list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))

任何提示赞赏!最好的问候!

Any hints appreciated! Best regards!

PS对不起这些 as.numeric(),我不知道为什么,我需要他们为例子运行。

PS sorry about these as.numeric(), I could not quite figure out why, but I needed them for the example to run.

轻微编辑插入列/在data.frame之前的第一句话,以澄清我的需要。

Minor edit inserted columns / before data.frame in initial sentence to clarify my needs.

推荐答案

另一种方法是使用 .SDcols 你想一起执行相同的操作。让我们假设您需要 a,d,e 的列以类型进行求和,其中as, b,g 应该有表示表示和 c,f

Another way is to use .SDcols to group the columns for which you'd like to perform the same operations together. Let's say that you require columns a,d,e to be summed by type where as, b,g should have mean taken and c,f its median, then,

# constructing an example data.table:
set.seed(45)
dt <- data.table(type=rep(c("hello","bye","ok"), each=3), a=sample(9), 
                 b = rnorm(9), c=runif(9), d=sample(9), e=sample(9), 
                 f = runif(9), g=rnorm(9))

#     type a          b         c d e         f          g
# 1: hello 6 -2.5566166 0.7485015 9 6 0.5661358 -2.2066521
# 2: hello 3  1.1773119 0.6559926 3 3 0.4586280 -0.8376586
# 3: hello 2 -0.1015588 0.2164430 1 7 0.9299597  1.7216593
# 4:   bye 8 -0.2260640 0.3924327 8 2 0.1271187  0.4360063
# 5:   bye 7 -1.0720503 0.3256450 7 8 0.5774691  0.7571990
# 6:   bye 5 -0.7131021 0.4855804 6 9 0.2687791  1.5398858
# 7:    ok 1 -0.4680549 0.8476840 2 4 0.5633317  1.5393945
# 8:    ok 4  0.4183264 0.4402595 4 1 0.7592801  2.1829996
# 9:    ok 9 -1.4817436 0.5080116 5 5 0.2357030 -0.9953758

# 1) set key
setkey(dt, "type")

# 2) group col-ids by similar operations
id1 <- which(names(dt) %in% c("a", "d", "e"))
id2 <- which(names(dt) %in% c("b","g"))
id3 <- which(names(dt) %in% c("c","f"))

# 3) now use these ids in with .SDcols parameter
dt1 <- dt[, lapply(.SD, sum), by="type", .SDcols=id1]
dt2 <- dt[, lapply(.SD, mean), by="type", .SDcols=id2]
dt3 <- dt[, lapply(.SD, median), by="type", .SDcols=id3]

# 4) merge them.
dt1[dt2[dt3]]

#     type  a  d  e          b          g         c         f
# 1:   bye 20 21 19 -0.6704055  0.9110304 0.3924327 0.2687791
# 2: hello 11 13 16 -0.4936211 -0.4408838 0.6559926 0.5661358
# 3:    ok 14 11 10 -0.5104907  0.9090061 0.5080116 0.5633317

如果/当你有许多列,使一个列表像你可能会很繁琐。

If/when you have many many column, making a list like the one you've might be cumbersome.

这篇关于R动态地构建“列表”在data.table(或ddply)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆