R: Dynamically build "list" in data.table (or ddply)

Question

My aggregation needs vary among columns / data.frames. I would like to pass the "list" argument to data.table dynamically.

As a minimal example:

library(data.table)
type <- c(rep("hello", 3), rep("bye", 3), rep("ok", 3))
a <- rep(1:3, 3)
b <- runif(9)
c <- runif(9)
df <- data.frame(cbind(type, a, b, c), stringsAsFactors = FALSE)
DT <- data.table(df)

This call:

DT[, list(suma = sum(as.numeric(a)), meanb = mean(as.numeric(b)), minc = min(as.numeric(c))), by= type]

would give a result like this:

    type suma     meanb      minc
1: hello    6 0.1332210 0.4265579
2:   bye    6 0.5680839 0.2993667
3:    ok    6 0.5694532 0.2069026

Future data.frames will have more columns that I will want to summarize differently. But for the sake of working with this small example: is there a way to pass the list programmatically?

I naively tried:

# create a different list
mylist <- "list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))"
# new call
DT[, mylist, by=type]

which returns the string itself for each group instead of evaluating it:

    type
1: hello
2:   bye
3:    ok
                                                                                          mylist
1: list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))
2: list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))
3: list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))

Any hints appreciated! Best regards!

PS: Sorry about these as.numeric() calls. I could not quite figure out why, but I needed them for the example to run.
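The as.numeric() calls are most likely needed because cbind() on a character vector and numeric vectors produces a character matrix, so every column of the resulting data.frame is character. A minimal sketch of that behavior follows; the names df_chr and df_ok are illustrative and not part of the original example:

```r
type <- c(rep("hello", 3), rep("bye", 3), rep("ok", 3))
a <- rep(1:3, 3)
b <- runif(9)

# cbind() builds a matrix, and a matrix holds a single type,
# so mixing character and numeric vectors coerces everything to character
df_chr <- data.frame(cbind(type, a, b), stringsAsFactors = FALSE)
sapply(df_chr, class)

# passing the vectors to data.frame() directly keeps each column's own type
df_ok <- data.frame(type, a, b, stringsAsFactors = FALSE)
sapply(df_ok, class)
```

With the second form, sum(a) or mean(b) can be used without any conversion.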

Minor edit: inserted "columns /" before "data.frames" in the initial sentence to clarify my needs.

Recommended answer

Another way is to use .SDcols to group together the columns on which you'd like to perform the same operation. Let's say you need columns a, d, e to be summed by type, whereas b, g should have their mean taken and c, f their median; then:

# constructing an example data.table:
set.seed(45)
dt <- data.table(type=rep(c("hello","bye","ok"), each=3), a=sample(9), 
                 b = rnorm(9), c=runif(9), d=sample(9), e=sample(9), 
                 f = runif(9), g=rnorm(9))

#     type a          b         c d e         f          g
# 1: hello 6 -2.5566166 0.7485015 9 6 0.5661358 -2.2066521
# 2: hello 3  1.1773119 0.6559926 3 3 0.4586280 -0.8376586
# 3: hello 2 -0.1015588 0.2164430 1 7 0.9299597  1.7216593
# 4:   bye 8 -0.2260640 0.3924327 8 2 0.1271187  0.4360063
# 5:   bye 7 -1.0720503 0.3256450 7 8 0.5774691  0.7571990
# 6:   bye 5 -0.7131021 0.4855804 6 9 0.2687791  1.5398858
# 7:    ok 1 -0.4680549 0.8476840 2 4 0.5633317  1.5393945
# 8:    ok 4  0.4183264 0.4402595 4 1 0.7592801  2.1829996
# 9:    ok 9 -1.4817436 0.5080116 5 5 0.2357030 -0.9953758

# 1) set key
setkey(dt, "type")

# 2) group col-ids by similar operations
id1 <- which(names(dt) %in% c("a", "d", "e"))
id2 <- which(names(dt) %in% c("b","g"))
id3 <- which(names(dt) %in% c("c","f"))

# 3) now use these ids with the .SDcols parameter
dt1 <- dt[, lapply(.SD, sum), by="type", .SDcols=id1]
dt2 <- dt[, lapply(.SD, mean), by="type", .SDcols=id2]
dt3 <- dt[, lapply(.SD, median), by="type", .SDcols=id3]

# 4) merge them.
dt1[dt2[dt3]]

#     type  a  d  e          b          g         c         f
# 1:   bye 20 21 19 -0.6704055  0.9110304 0.3924327 0.2687791
# 2: hello 11 13 16 -0.4936211 -0.4408838 0.6559926 0.5661358
# 3:    ok 14 11 10 -0.5104907  0.9090061 0.5080116 0.5633317
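The same grouping idea can be driven by a specification object, so the set of operations does not have to be written out call by call. The following is only a sketch under assumptions: the `spec` list (function name mapped to column names) and the helper names `parts` and `res` are hypothetical, and merge() is used in place of the keyed join above.

```r
library(data.table)

set.seed(45)
dt <- data.table(type = rep(c("hello", "bye", "ok"), each = 3), a = sample(9),
                 b = rnorm(9), c = runif(9), d = sample(9), e = sample(9),
                 f = runif(9), g = rnorm(9))

# hypothetical specification: function name -> columns it applies to
spec <- list(sum = c("a", "d", "e"), mean = c("b", "g"), median = c("c", "f"))

# one grouped aggregation per entry of the specification
parts <- lapply(names(spec), function(fn) {
  f <- match.fun(fn)  # resolve the function from its name
  dt[, lapply(.SD, f), by = "type", .SDcols = spec[[fn]]]
})

# combine the partial results on the grouping column
res <- Reduce(function(x, y) merge(x, y, by = "type"), parts)
res
```

Note that .SDcols accepts column names directly, so the which(names(dt) %in% ...) step is optional.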

If/when you have many columns, writing out a list like the one you have might be cumbersome.
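When only a handful of columns are involved, the j expression itself can also be passed programmatically. A minimal sketch, assuming the columns are already numeric (so no as.numeric() is needed) and using the illustrative names `myexpr`, `res1` and `res2`: store the expression unevaluated with quote(), or parse it from a string, and wrap it in eval() inside j.

```r
library(data.table)

DT <- data.table(type = rep(c("hello", "bye", "ok"), each = 3),
                 a = rep(1:3, 3), b = runif(9), c = runif(9))

# keep the j expression unevaluated, then evaluate it inside the data.table call
myexpr <- quote(list(lengtha = length(a), maxb = max(b), meanc = mean(c)))
res1 <- DT[, eval(myexpr), by = type]

# if the expression arrives as a string, parse it first
mylist <- "list(lengtha = length(a), maxb = max(b), meanc = mean(c))"
res2 <- DT[, eval(parse(text = mylist)), by = type]

res1
```

The quote() form is usually the safer of the two, since the expression is checked for syntax when it is written rather than when it is parsed at run time.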
