Is it possible to rename a "by" grouping variable in data.table in R en passant?


Question

I have noticed that when aggregating values in data.table with the by option, the groups appear in the order in which they first occur in the dataset, akin to SQL, I believe. So if 2 precedes 1 in the data, the output lists the aggregate for level 2 before the one for level 1. In most cases, I don't want this. I noticed that one can call sort on the by variable, but the output column is then labelled sort. Is it possible to keep its previous name (or give it a completely different one)? Example:

mydt <- data.table(nums=1:5, lets=letters[5:1])
mydt[, .(is2=nums==2), by=sort(lets)]

gives

   sort is2
1:    a   F
2:    b   T
3:    c   F
4:    d   F
5:    e   F

but I would like:

   lets is2
1:    a   F
2:    b   T
3:    c   F
4:    d   F
5:    e   F


Recommended answer

The question is titled 'Is it possible to rename a "by" grouping variable in data.table in R en passant?', but the actual problem is how to sort the result of an aggregation by the grouping variable. So, there are two questions in one: whether the grouping variable can be renamed, and how to sort the result.

Yes, the grouping variable can be renamed in passing, for example:

mydt[, .(is2 = nums == 2), by = .(lets = paste(lets, toupper(lets), sep = "-"))]




   lets   is2
1:  e-E FALSE
2:  d-D  TRUE
3:  c-C FALSE
4:  b-B FALSE
5:  a-A FALSE


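For illustration, a completely different function is used here (paste() instead of sort()); the name given inside by = .() becomes the column name in the result.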
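The renaming idea can be sketched with a plain rename as well. The example below is only an illustration under stated assumptions: the column name letter is hypothetical and not part of the question. It also shows why by = sort(lets) is risky: by evaluates the expression to one value per row, so sorting that vector detaches the labels from their rows.

```r
library(data.table)

mydt <- data.table(nums = 1:5, lets = letters[5:1])

# Rename the grouping column in passing with a named element in by = .().
# "letter" is a hypothetical name chosen for this sketch.
renamed <- mydt[, .(is2 = nums == 2), by = .(letter = lets)]
names(renamed)                  # "letter" "is2"

# by = sort(lets) groups row i by the i-th *sorted* value, so the labels
# no longer belong to their rows.
misaligned <- mydt[, .(is2 = nums == 2), by = .(lets = sort(lets))]
misaligned[is2 == TRUE, lets]   # "b", although nums == 2 sits in the row with lets == "d"
renamed[is2 == TRUE, letter]    # "d"
```

In other words, the output shown in the question looks sorted only because the labels were reshuffled, which is why sorting the result (rather than the by expression) is the safer route.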

To sort the result by the grouping variable, the simplest way is to use keyby =, as already mentioned by Frank.

mydt[, .(is2 = nums == 2), keyby = lets]




   lets   is2
1:    a FALSE
2:    b FALSE
3:    c FALSE
4:    d  TRUE
5:    e FALSE


help("data.table") describes keyby as follows:

Same as by, but with an additional setkey() run on the by columns of the result, for convenience. It is common practice to use 'keyby=' routinely when you wish the result to be sorted.
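A small sketch of what the quoted help text means in practice: the result of keyby is sorted and carries a key, while the result of by does neither. The variable names by_res, keyby_res and keyby_named, and the column name letter, are assumptions made for this illustration.

```r
library(data.table)

mydt <- data.table(nums = 1:5, lets = letters[5:1])

by_res    <- mydt[, .(is2 = nums == 2), by = lets]
keyby_res <- mydt[, .(is2 = nums == 2), keyby = lets]

by_res$lets      # order of first appearance: "e" "d" "c" "b" "a"
keyby_res$lets   # sorted: "a" "b" "c" "d" "e"

key(by_res)      # NULL, no key is set
key(keyby_res)   # "lets"

# Renaming and sorting can be combined; "letter" is a hypothetical name.
keyby_named <- mydt[, .(is2 = nums == 2), keyby = .(letter = lets)]
key(keyby_named) # "letter"
```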

Alternatively, the result can be ordered afterwards:

mydt[, .(is2 = nums == 2), by = lets][order(lets)]




   lets   is2
1:    a FALSE
2:    b FALSE
3:    c FALSE
4:    d  TRUE
5:    e FALSE
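As a rough comparison of the ordering options, the sketch below contrasts chaining [order(lets)], which returns a sorted copy, with setorder(), which reorders the table by reference. Neither sets a key, unlike keyby. The names agg and sorted_copy are assumptions for this example.

```r
library(data.table)

mydt <- data.table(nums = 1:5, lets = letters[5:1])

agg <- mydt[, .(is2 = nums == 2), by = lets]

# Chaining [order(lets)] returns a new, sorted table; agg itself is unchanged.
sorted_copy <- agg[order(lets)]
sorted_copy$lets   # "a" "b" "c" "d" "e"
agg$lets           # still "e" "d" "c" "b" "a"
key(sorted_copy)   # NULL, ordering alone does not set a key

# setorder() sorts agg in place (by reference), without making a copy.
setorder(agg, lets)
agg$lets           # "a" "b" "c" "d" "e"
```

Which variant fits best depends on whether a key on the result is wanted later, for example for joins; if not, ordering afterwards gives the same rows as keyby.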

