Column in the j-expression of a data.table (with/without a by statement)

Published: 2017/3/12 12:03:51 · Tags: r, data.table, plyr

Problem description

Here are two artificial but I hope pedagogical examples of my problem.

1) When I run this code:

> dat0 <- data.frame(A=c("a","a","b"), B="")
> data.table(dat0)[, lapply(.SD, function(x) length(A)) , by = "A"]
   A B
1: a 1
2: b 1

I expected the output:

   A B
1: a 2
2: b 1

(similar to plyr::ddply(dat0, .(A), nrow)).

Let me give a less artificial example. Consider the following dataframe:

> dat0 <- data.frame(A=c("a","a","b"), x=c(1,2,3), y=c(9,8,7))
> dat0
  A x y
1 a 1 9
2 a 2 8
3 b 3 7

Using plyr package, I get the means of x and y by each value of A as follows:

> ddply(dat0, .(A), summarise, x=mean(x), y=mean(y))
  A   x   y
1 a 1.5 8.5
2 b 3.0 7.0

Very nice. Now imagine another variable H and the following calculations:

> dat0 <- data.frame(A=c("a","a","b"), H=c(0,1,-1), x=c(1,2,3), y=c(9,8,7))
> ddply(dat0, .(A), summarise, x=mean(x)^mean(H), y=mean(y)^mean(H))
  A         x         y
1 a 1.2247449 2.9154759
2 b 0.3333333 0.1428571

Very nice too. But now, imagine there's a huge number of variables x for which you want to calculate mean(x)^mean(H). Then I don't want to type:

ddply(dat0, .(A), summarise, a=mean(a)^mean(H), b=mean(b)^mean(H), c=mean(c)^mean(H), d=mean(d)^mean(H), ...........)

So my idea was to try:

flipcols <- my_selected_columns # c("a", "b", "c", "d", ....)
data.table(dat0)[, lapply(.SD, function(x) mean(x)^mean(H)), by = "A", .SDcols = flipcols]

But that doesn't work, because H inside function(x) mean(x)^mean(H) is not handled as I expected! I have not been able to make it work with plyr::colwise either.
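For contrast, the column-wise pattern itself behaves like the ddply/summarise call above as long as the function only uses its own argument and refers to no other column such as H. The following is a minimal sketch and assumes the data.table package is installed. The column names x and y come from the example above.

```r
library(data.table)

dt <- data.table(A = c("a", "a", "b"), H = c(0, 1, -1),
                 x = c(1, 2, 3), y = c(9, 8, 7))

# Per-group means of x and y: a data.table counterpart of
# ddply(dat0, .(A), summarise, x = mean(x), y = mean(y))
res <- dt[, lapply(.SD, mean), by = A, .SDcols = c("x", "y")]

print(res)
```

The problem only appears once the function body also needs a column that is not among the .SDcols columns.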

2) When I run this code:

> dat0 <- data.frame(A=c("a","a","b"), B=1:3, c=0)
> data.table(dat0)[, lapply(.SD, function(x) B), .SDcols="c"]
Error in ..FUN(c) : object 'B' not found

I expected it to work and generate:

So is there a way to use the columns of the original data.table in a transformation?

Recommended answer

1) Use .N. Inside j, the grouping variable A has length 1 in every group, because each group contains exactly one value of A (that is what grouping means by definition):

dt <- data.table(A=c("a","a","b"), B="")
dt[, .N, by = A]
#   A N
#1: a 2
#2: b 1
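The following small sketch makes the difference concrete. It contrasts length(A), .N and length(x) on the question's first example and assumes the data.table package is installed:

```r
library(data.table)

dt <- data.table(A = c("a", "a", "b"), B = "")

# length(A) inside j sees only the single value of A for the current group
len_A <- dt[, length(A), by = A]

# .N is the number of rows in the current group
counts <- dt[, .N, by = A]

# Inside lapply(.SD, ...), x is the current group's slice of column B,
# so length(x) also returns the group size
counts2 <- dt[, lapply(.SD, function(x) length(x)), by = A]

print(len_A)
print(counts)
print(counts2)
```

Only the grouping column is collapsed to length 1. Every other column in .SD keeps all the rows of its group.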

(updated 1) This is the same issue as 2). A workaround is to not use .SDcols:

dt = data.table(A=c("a","a","b"), H=c(0,1,-1), x=c(1,2,3), y=c(9,8,7))
dt[, lapply(.SD[, !"H", with = FALSE], function(x) mean(x) ^ mean(H)), by = A]
#   A         x         y
#1: a 1.2247449 2.9154759
#2: b 0.3333333 0.1428571
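The question needs a chosen subset of columns (flipcols) rather than "everything except H". The same workaround can probably be combined with an explicit column list, as in this minimal sketch. It assumes the data.table package is installed. The value flipcols = c("x", "y") and the helper name mH are illustrative choices and are not part of the original answer.

```r
library(data.table)

dt <- data.table(A = c("a", "a", "b"), H = c(0, 1, -1),
                 x = c(1, 2, 3), y = c(9, 8, 7))

# Hypothetical column selection, standing in for the question's flipcols
flipcols <- c("x", "y")

# No .SDcols: .SD contains every non-grouping column (H, x and y), and H can
# be used directly in j. mean(H) is computed once per group (mH) and reused.
res <- dt[, {
  mH <- mean(H)
  lapply(.SD[, flipcols, with = FALSE], function(v) mean(v)^mH)
}, by = A]

print(res)
```

The output should match the ddply result in the question (1.2247449 / 2.9154759 for a, 0.3333333 / 0.1428571 for b).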

2) This is a bug that's been reported before here: https://r-forge.r-project.org/tracker/index.php?func=detail&aid=5222&group_id=240&atid=975
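Assuming the same workaround applies to the second example, it would look as follows: leave out .SDcols and select the column inside j instead. This is a minimal sketch that assumes the data.table package is installed. Newer data.table releases may behave differently with .SDcols.

```r
library(data.table)

dt <- data.table(A = c("a", "a", "b"), B = 1:3, c = 0)

# Without .SDcols, .SD contains all columns and B is bound in j, so the
# anonymous function can refer to it; only column c is passed to lapply.
res <- dt[, lapply(.SD[, "c", with = FALSE], function(x) B)]

print(res)
```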
