data.table - group by all except one column

Question

Can I group by all columns except one using data.table? I have a lot of columns, so I'd rather avoid writing out all the column names.

The reason is that I'd like to collapse duplicates in a table where I know one column is irrelevant.

library(data.table)

# Example table: rows 2 and 3 are identical except for the 'collapse' column
DT <- data.table(N = c(1, 2, 2), val = c(50, 60, 60), collapse = c("A", "B", "C"))

> DT
   N val collapse
1: 1  50        A
2: 2  60        B
3: 2  60        C

That is, given DT, is there something like DT[, print(.SD), by = !collapse] which gives:

> DT[, print(.SD), .(N, val)]
   collapse
1:        A
   collapse
1:        B
2:        C

without actually having to specify .(N, val)? I realise I could do this by copying and pasting the column names, but I thought there might be a more elegant way.

Recommended answer

To group by all columns except one, you can use:

by = setdiff(names(DT), "collapse")

Explanation: setdiff takes the general form setdiff(x, y) and returns all values of x that are not in y. Here that means every column name of DT is returned except "collapse", and data.table accepts the resulting character vector as the grouping columns in by.
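As a rough illustration of the approach above, the following sketch rebuilds the question's example table and groups by every column except collapse. The variable names group_cols, collapsed and deduped are introduced here for illustration only and are not from the original answer; pasting the values together and calling unique() are just two possible ways to collapse the duplicates.

```r
library(data.table)

# Example table from the question
DT <- data.table(N = c(1, 2, 2), val = c(50, 60, 60), collapse = c("A", "B", "C"))

# All column names except "collapse" (hypothetical helper variable)
group_cols <- setdiff(names(DT), "collapse")

# Option 1: keep one row per group and paste the ignored column's values together.
# Inside paste(), the first 'collapse' is the column; 'collapse = ","' is paste's argument.
collapsed <- DT[, .(collapse = paste(collapse, collapse = ",")), by = group_cols]

# Option 2: keep only the first row of each group
deduped <- unique(DT, by = group_cols)

print(collapsed)
print(deduped)
```

If the ignored column really carries no information, the unique() variant is probably the simpler choice, since it does not need a j expression at all.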

Two alternatives that return the same column names:

# with '%in%'
names(DT)[!names(DT) %in% 'collapse']

# with 'is.element'
names(DT)[!is.element(names(DT), 'collapse')]
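A quick check in base R, assuming the column names of the question's example table, suggests that the three expressions are interchangeable here. The vector cols is a stand-in for names(DT) and is introduced only for this sketch.

```r
# Stand-in for names(DT) from the question's example (assumed column names)
cols <- c("N", "val", "collapse")

via_setdiff    <- setdiff(cols, "collapse")
via_in         <- cols[!cols %in% "collapse"]
via_is_element <- cols[!is.element(cols, "collapse")]

# All three keep the remaining names in their original order
print(identical(via_setdiff, via_in))
print(identical(via_in, via_is_element))
```

One difference worth keeping in mind: setdiff also drops duplicated values from its result, while the two subsetting forms keep them. With unique column names this does not matter.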
