How to number/label data-table by group-number from group_by?

Question

I have a tbl_df that I want to group_by(u, v), with one group for each distinct integer combination of (u, v) observed in the data.

Note: this was subsequently resolved by adding the (now-deprecated) group_indices() back in dplyr 0.4.0.

a) I then want to assign each distinct group some arbitrary distinct number label=1,2,3... For example, the combination (u,v)==(2,3) could get label 1, (1,3) could get 2, and so on. How can I do this with one mutate(), without a three-step summarize-and-self-join?

dplyr has a neat function n(), but that gives the number of elements within the current group, not the index of the group among all groups. In data.table this would simply be called .GRP.
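For comparison, here is a minimal sketch of the data.table idiom mentioned above. It assumes the data.table package is installed and uses the (u, v) pairs printed in the question; the column name label is only illustrative.

```r
library(data.table)

# Same (u, v) pairs as the example data printed in the question
dt <- data.table(u = c(2, 1, 1, 2, 1, 3, 1, 1, 3, 3),
                 v = c(3, 3, 2, 3, 2, 3, 3, 2, 1, 4))

# .GRP is a running group counter; with `by`, groups are numbered
# in order of first appearance
dt[, label := .GRP, by = .(u, v)]
print(dt)
```

With by, (2,3) gets 1 and (1,3) gets 2, matching the numbering the question asks for. Using keyby instead would number the groups in sorted key order.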

b) Actually, what I really want is to assign a string/character label ('A','B',...). But numbering groups by integers is good enough, because I can then use integer_to_label(i) as below. Unless there's a clever way to merge these two? But don't sweat this part.
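One possible way to merge the two steps, sketched in base R only. It assumes groups are numbered in order of first appearance and that there are at most 26 of them; the names key, group_id and label are illustrative and not part of the original post.

```r
# (u, v) pairs copied from the example data printed in the question
df <- data.frame(u = c(2, 1, 1, 2, 1, 3, 1, 1, 3, 3),
                 v = c(3, 3, 2, 3, 2, 3, 3, 2, 1, 4))

# Build one key per row, then number the keys by order of first appearance
key <- paste(df$u, df$v, sep = "_")
df$group_id <- match(key, unique(key))

# LETTERS is base R's built-in A..Z vector, so this only covers up to 26 groups
df$label <- LETTERS[df$group_id]
print(df)
```

Indexing LETTERS is vectorised, so it can stand in for the integer_to_label() helper when labelling a whole column at once.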

library(dplyr)  # provides %>%, group_by(), mutate(), n()

set.seed(1234)

# Helper fn for mapping integer 1..26 to character label
integer_to_label <- function(i) { substr("ABCDEFGHIJKLMNOPQRSTUVWXYZ",i,i) }

df <- tibble::as_tibble(data.frame(u=sample.int(3,10,replace=TRUE), v=sample.int(4,10,replace=TRUE)))

# Want to label/number each distinct group of unique (u,v) combinations
df %>% group_by(u,v) %>% mutate(label = n()) # WRONG: n() is the number of elements within its group, not the index of the group

   u v
1  2 3
2  1 3
3  1 2
4  2 3
5  1 2
6  3 3
7  1 3
8  1 2
9  3 1
10 3 4

KLUDGE 1: could do df %>% group_by(u,v) %>% summarize(label = n()), then self-join

Recommended answer

Updated answer

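# Factory that returns a counter function; i lives in the enclosing environment
get_group_number = function(){
    i = 0
    function(){
        i <<- i+1   # increment the shared counter on each call
        i
    }
}
group_number = get_group_number()
# mutate() evaluates group_number() once per group, so each group gets the next integer
df %>% group_by(u,v) %>% mutate(label = group_number())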

You can also consider the following, slightly less readable version:

group_number = (function(){i = 0; function() i <<- i+1 })()
df %>% group_by(u,v) %>% mutate(label = group_number())

Using iterators

library(iterators)

counter = icount()
df %>% group_by(u,v) %>% mutate(label = nextElem(counter))
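
The question notes that dplyr later gained group_indices() for this purpose. As a hedged sketch of the built-in route, assuming dplyr 1.0.0 or later (where cur_group_id() is available) and using the (u, v) pairs printed in the question:

```r
library(dplyr)

# (u, v) pairs copied from the example data printed in the question
df <- data.frame(u = c(2L, 1L, 1L, 2L, 1L, 3L, 1L, 1L, 3L, 3L),
                 v = c(3L, 3L, 2L, 3L, 2L, 3L, 3L, 2L, 1L, 4L))

# cur_group_id() returns the integer id of the current group inside mutate()
labelled <- df %>%
  group_by(u, v) %>%
  mutate(label = cur_group_id()) %>%
  ungroup()

# group_indices() on an already-grouped data frame returns one id per row
ids <- df %>% group_by(u, v) %>% group_indices()

print(labelled)
```

Both calls number the groups in sorted key order, so (1,2) gets 1 here rather than (2,3). Unlike a closure-based counter, they carry no state between calls, so rerunning the pipeline gives the same labels.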
