How to number/label data-table by group-number from group_by?
Question
I have a tbl_df where I want to group_by(u,v) for each distinct integer combination observed with (u,v).
EDIT: this was resolved by adding group_indices() back in dplyr 0.4.0.
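As a minimal sketch of what the edit refers to, assuming a recent dplyr where group_indices() is called on an already grouped data frame (the small df below is made-up illustration data, not the data from the question):

```r
library(dplyr)

# Hypothetical toy data: six rows, four distinct (u, v) combinations
df <- data.frame(u = c(2, 1, 1, 2, 1, 3),
                 v = c(3, 3, 2, 3, 2, 3))

# group_indices() returns one integer per row identifying that row's group
df$label <- df %>% group_by(u, v) %>% group_indices()

print(df)
```

The numbering here appears to follow the sorted order of the group keys rather than the order of first appearance, which is fine when any arbitrary distinct number will do.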
a) I then want to assign each distinct group some arbitrary distinct number label=1,2,3...
e.g. the combination (u,v)==(2,3) could get label 1, (1,3) could get 2, and so on.
How can I do this with one mutate(), without a three-step summarize-and-self-join?
dplyr has a neat function n(), but that gives the number of elements within its group, not the overall number of the group. In data.table this would simply be called .GRP.
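For comparison, the data.table behaviour mentioned above can be sketched as follows (the dt table is made-up illustration data; this assumes the data.table package is installed):

```r
library(data.table)

# Hypothetical toy data: six rows, four distinct (u, v) combinations
dt <- data.table(u = c(2, 1, 1, 2, 1, 3),
                 v = c(3, 3, 2, 3, 2, 3))

# .GRP is the running group counter; with `by`, groups are numbered
# in order of first appearance
dt[, label := .GRP, by = .(u, v)]

print(dt)
```

With by (rather than keyby), (2,3) gets label 1 and (1,3) gets label 2, matching the example numbering in the question.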
b) Actually, what I really want is to assign a string/character label ('A','B',...).
But numbering groups by integers is good enough, because I can then use integer_to_label(i) as below. Unless there's a clever way to merge these two? But don't sweat this part.
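One possible way to merge the two steps, sketched under the assumption that dplyr >= 1.0.0 is available (cur_group_id() was introduced there) and that there are at most 26 groups, is to index base R's built-in LETTERS constant by the group id inside a single mutate(). The df below is made-up illustration data:

```r
library(dplyr)

# Hypothetical toy data: six rows, four distinct (u, v) combinations
df <- data.frame(u = c(2, 1, 1, 2, 1, 3),
                 v = c(3, 3, 2, 3, 2, 3))

labelled <- df %>%
  group_by(u, v) %>%
  # cur_group_id() is the integer id of the current group;
  # LETTERS maps 1..26 to "A".."Z"
  mutate(label = LETTERS[cur_group_id()]) %>%
  ungroup()

print(labelled)
```

With more than 26 groups the index would run off the end of LETTERS and give NA, so this only covers the same range as integer_to_label().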
library(dplyr)
set.seed(1234)
# Helper fn for mapping integer 1..26 to character label
integer_to_label <- function(i) { substr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", i, i) }
df <- tbl_df(data.frame(u = sample.int(3, 10, replace = TRUE), v = sample.int(4, 10, replace = TRUE)))
# Want to label/number each distinct group of unique (u,v) combinations
df %>% group_by(u, v) %>% mutate(label = n()) # WRONG: n() is the number of elements within its group, not the overall number of the group
u v
1 2 3
2 1 3
3 1 2
4 2 3
5 1 2
6 3 3
7 1 3
8 1 2
9 3 1
10 3 4
KLUDGE 1: could do df %>% group_by(u,v) %>% summarize(label = n()) , then self-join
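A rough sketch of what such a multi-step workaround could look like. This is a variant that uses distinct() and row_number() instead of summarize(label = n()), since n() would give group sizes rather than group numbers; the df and the intermediate name groups are made up for illustration:

```r
library(dplyr)

# Hypothetical toy data: six rows, four distinct (u, v) combinations
df <- data.frame(u = c(2, 1, 1, 2, 1, 3),
                 v = c(3, 3, 2, 3, 2, 3))

# Step 1: one row per distinct (u, v), numbered in order of first appearance
groups <- df %>%
  distinct(u, v) %>%
  mutate(label = row_number())

# Step 2: join the labels back onto the original rows
labelled <- df %>% left_join(groups, by = c("u", "v"))

print(labelled)
```

This is the kind of detour the question is trying to avoid; it is shown only to make the contrast with a one-step solution concrete.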
Recommended answer
Updated answer
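# Returns a counter function that remembers how many times it has been called
get_group_number = function(){
  i = 0
  function(){
    i <<- i+1  # update the counter stored in the enclosing environment
    i
  }
}
group_number = get_group_number()
# The expression is evaluated once per group, so each group gets the next number
df %>% group_by(u,v) %>% mutate(label = group_number())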
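To see why this works, it may help to look at the counter on its own. The sketch below uses only base R; make_counter is a hypothetical name chosen for illustration:

```r
# A closure: the inner function keeps access to `i` between calls
make_counter <- function() {
  i <- 0
  function() {
    i <<- i + 1  # modify `i` in the enclosing environment, not a local copy
    i
  }
}

counter <- make_counter()
first  <- counter()  # first call
second <- counter()  # second call, one higher than the first
print(c(first, second))
```

The grouped mutate() relies on the expression being evaluated once per group, so each group picks up the next value from the counter. Note that the counter is stateful: running the pipeline a second time without recreating group_number continues from where it left off.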
You can also consider the following slightly unreadable version
group_number = (function(){i = 0; function() i <<- i+1 })()
df %>% group_by(u,v) %>% mutate(label = group_number())
Using the iterators package
library(iterators)
counter = icount()
df %>% group_by(u,v) %>% mutate(label = nextElem(counter))