data.table - group by all except one column
Question
Can I group by all columns except one using data.table? I have a lot of columns, so I'd rather avoid writing out all the column names.
The reason is that I'd like to collapse duplicates in a table where I know one column is irrelevant.
library(data.table)
DT <- structure(list(N = c(1, 2, 2), val = c(50, 60, 60), collapse = c("A",
"B", "C")), .Names = c("N", "val", "collapse"), row.names = c(NA,
-3L), class = c("data.table", "data.frame"))
> DT
N val collapse
1: 1 50 A
2: 2 60 B
3: 2 60 C
That is, given DT, is there something like DT[, print(.SD), by = !collapse] which gives:
> DT[, print(.SD), .(N, val)]
collapse
1: A
collapse
1: B
2: C
without actually having to specify .(N, val)? I realise I can do this by copying and pasting the column names, but I thought there might be a more elegant way to do it.
Recommended answer
To group by all columns except one, you can use:
by = setdiff(names(DT), "collapse")
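As a minimal sketch of how this might look on the example table from the question (the table is rebuilt with data.table() for brevity, and the count column n and the variable names res and dedup are illustrative, not part of the original answer):

```r
library(data.table)

# Same data as the DT in the question
DT <- data.table(N = c(1, 2, 2), val = c(50, 60, 60),
                 collapse = c("A", "B", "C"))

# Group by every column except "collapse" and count the rows per group
res <- DT[, .(n = .N), by = setdiff(names(DT), "collapse")]
print(res)

# If the goal is only to drop duplicates, unique() accepts the same
# character vector through its by argument and keeps the first row per group
dedup <- unique(DT, by = setdiff(names(DT), "collapse"))
print(dedup)
```

Here res has one row per distinct (N, val) pair, and dedup keeps the rows with collapse equal to "A" and "B".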
Explanation: setdiff takes the general form setdiff(x, y) and returns all values of x that are not in y. In this case, that means all column names are returned except collapse.
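The behaviour of setdiff can be checked on a plain character vector, without data.table. This is a small sketch; the vector cols simply mirrors the column names of the example table, and "not_a_column" is a made-up value used to show that entries of y missing from x are ignored:

```r
cols <- c("N", "val", "collapse")

# Values of x that are not in y, in their original order
kept <- setdiff(cols, "collapse")
print(kept)

# Entries of y that do not occur in x have no effect
kept_multi <- setdiff(cols, c("val", "not_a_column"))
print(kept_multi)
```

Because y can hold several names, the same pattern should also cover excluding more than one column from the grouping.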
Two alternatives:
# with '%in%'
names(DT)[!names(DT) %in% 'collapse']
# with 'is.element'
names(DT)[!is.element(names(DT), 'collapse')]
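For unique column names, these alternatives should select the same names as setdiff. A quick base R check, assuming a hypothetical vector cols that stands in for the table's column names:

```r
cols <- c("N", "val", "collapse")  # stand-in for names() of the table

via_setdiff    <- setdiff(cols, "collapse")
via_in         <- cols[!cols %in% "collapse"]
via_is_element <- cols[!is.element(cols, "collapse")]

# All three approaches drop "collapse" and keep the remaining order
print(identical(via_setdiff, via_in))
print(identical(via_in, via_is_element))
```

One difference worth keeping in mind: setdiff also removes duplicated values from its first argument, while the logical-index versions do not, which only matters if column names are repeated.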