How to do range grouping on a column using dplyr?

Views: 161  Published: 2020/10/26 2:32:56  Tags: r, dplyr, grouping


Problem description

I want to group a data.table based on a column's range value, how can I do this with the dplyr library?

For example, my data table is like below:
library(data.table)
library(dplyr)
DT <- data.table(A=1:100, B=runif(100), Amount=runif(100, 0, 100))
Now I want to split DT into 20 groups at intervals of 0.05 on column B and count how many rows fall in each group. For example, all rows with a column B value in the range [0, 0.05) form one group, all rows with a column B value in the range [0.05, 0.1) form another group, and so on. Is there an efficient way of doing this grouping?

Thank you very much.

----------------------------- Follow-up question on akrun's answer.
Thanks, akrun, for your answer. I have a new question about the "cut" function. If my DT is like below:
DT <- data.table(A=1:10, B=c(0.01, 0.04, 0.06, 0.09, 0.1, 0.13, 0.14, 0.15, 0.17, 0.71)) 
by using the following code: 
DT %>% 
  group_by(gr=cut(B, breaks= seq(0, 1, by = 0.05), right=F) ) %>% 
  summarise(n= n()) %>%
  arrange(as.numeric(gr))
I expect to see results like this: 
          gr n
1   [0,0.05) 2
2 [0.05,0.1) 2
3 [0.1,0.15) 3
4 [0.15,0.2) 2
5 [0.7,0.75) 1
but the result I got is like this:  
          gr n
1   [0,0.05) 2
2 [0.05,0.1) 2
3 [0.1,0.15) 4
4 [0.15,0.2) 1
5 [0.7,0.75) 1 
Looks like the value 0.15 is not correctly allocated. Any thoughts on this?
Solution
We can use cut to do the grouping.  We create the 'gr' column within the group_by, use summarise to create the number of elements in each group (n()), and order the output (arrange) based on 'gr'.
library(dplyr)
 DT %>% 
     group_by(gr=cut(B, breaks= seq(0, 1, by = 0.05)) ) %>% 
     summarise(n= n()) %>%
     arrange(as.numeric(gr))
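
Note that cut() closes intervals on the right by default, so the call above builds groups like (0,0.05] rather than the [0,0.05) asked for in the question. Below is a minimal sketch of a left-closed variant. It assumes a reasonably recent dplyr in which count() accepts .drop; the set.seed() call, the plain data frame DF, and the name res are there only for illustration and are not from the original post. Passing .drop = FALSE keeps empty bins, so all 20 groups are reported.

```r
library(dplyr)

set.seed(42)  # illustrative seed, not from the original post
DF <- data.frame(A = 1:100, B = runif(100))

res <- DF %>%
  # right = FALSE gives left-closed bins such as [0,0.05)
  count(gr = cut(B, breaks = seq(0, 1, by = 0.05), right = FALSE),
        .drop = FALSE)  # keep bins that contain no rows

res
```

The bin factor keeps its level order, so the rows come out sorted by interval without a separate arrange() step.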




As the initial object is a data.table, this can also be done using data.table methods (including @Frank's suggestion to use keyby):
library(data.table)
DT[,.N , keyby = .(gr=cut(B, breaks=seq(0, 1, by=0.05)))]
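
For the left-closed intervals described in the question, the same data.table call can take right = FALSE. This is a sketch under the assumption that the data look like the example in the question; the seed and the name res are illustrative only.

```r
library(data.table)

set.seed(42)  # illustrative seed, not from the original post
DT <- data.table(A = 1:100, B = runif(100))

# .N counts the rows in each group; keyby also sorts the result by the bin factor
res <- DT[, .N, keyby = .(gr = cut(B, breaks = seq(0, 1, by = 0.05), right = FALSE))]
res
```

Unlike the dplyr sketch with .drop = FALSE, this only lists bins that actually contain rows.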
EDIT:

Based on the update in the OP's post, we could subtract a small number from the seq breaks:
lvls <- levels(cut(DT$B, seq(0, 1, by =0.05)))
DT %>%
   group_by(gr=cut(B, breaks= seq(0, 1, by = 0.05) -
                 .Machine$double.eps, right=FALSE, labels=lvls)) %>% 
   summarise(n=n()) %>% 
   arrange(as.numeric(gr))
#          gr n
#1   (0,0.05] 2
#2 (0.05,0.1] 2
#3 (0.1,0.15] 3
#4 (0.15,0.2] 2
#5 (0.7,0.75] 1
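
The misallocation in the question comes from floating-point arithmetic: the break that seq(0, 1, by = 0.05) produces for 0.15 is slightly larger than the literal 0.15, so with right = FALSE the value 0.15 falls just below that break. Also note that in the EDIT above, labels = lvls reuses the labels of a right-closed cut(), so the output prints (a,b] even though the bins are left-closed. As an alternative sketch (not part of the original answer), the breaks can be built by division, (0:20) / 20, which on typical IEEE-754 setups should give the same doubles as the decimal literals. This assumes the data were entered as such literals; values produced by arithmetic may still land on the other side of a boundary. The names DF, brks, and res are illustrative.

```r
library(dplyr)

DF <- data.frame(A = 1:10,
                 B = c(0.01, 0.04, 0.06, 0.09, 0.1, 0.13, 0.14, 0.15, 0.17, 0.71))

# The break generated by seq() sits just above the literal 0.15
seq(0, 1, by = 0.05)[4] > 0.15

# Breaks built by division are intended to match the decimal literals
brks <- (0:20) / 20

res <- DF %>%
  # right = FALSE keeps the left-closed bins and their [a,b) labels
  count(gr = cut(B, breaks = brks, right = FALSE))
res
```

With these breaks the interval labels come straight from cut(), so no separate labels argument is needed.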


                        