Find missing month after grouping with dplyr


Problem description


I have a data frame with two columns that I am grouping by with dplyr, a column of months (as numerics, e.g. 1 through 12), and several columns with statistical data following that (values unimportant). An example:

ID_1   ID_2   month  st1    st2
1      1      1      0.5    0.2
1      1      2      0.7    0.9
1      1      3      1.1    1.7
1      1      4      2.6    0.8
1      1      5      1.8    1.3
1      1      6      2.1    2.2
1      1      7      0.5    0.2
1      1      8      0.7    0.9
1      1      9      1.1    1.7
1      1      10     2.6    0.8
1      1      11     1.8    1.3
1      1      12     2.1    2.2
1      2      1      0.5    0.2
1      2      2      0.7    0.9
1      2      3      1.1    1.7
1      2      4      2.6    0.8
1      2      5      1.8    1.3
1      2      6      2.1    2.2
1      2      7      0.5    0.2
1      2      9      1.1    1.7
1      2      10     2.6    0.8
1      2      11     1.8    1.3
1      2      12     2.1    2.2

For the second grouping (ID_1 = 1 and ID_2 = 2), there is a month missing from the data (month = 8). Is there a way I can find this month and insert a row with the correct ID_1 and ID_2 values, the missing month value, and NA values for the rest of the columns? I've been playing around with this using dplyr functions and can't seem to figure it out. Perhaps there is a non-dplyr solution out there as well.

PS: If it helps, each unique grouping of ID_1 and ID_2 will have no more than 1 month missing.

Solution

This can be done via tidyr::complete:

library(dplyr)
library(tidyr)

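# Within each (ID_1, ID_2) group, expand month to the full range 1:12.
# Rows created for months that were absent get NA in st1 and st2.
dat %>%
    group_by(ID_1, ID_2) %>%
    complete(month = 1:12)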

Tail of dataset:

Source: local data frame [6 x 5]
Groups: ID_1, ID_2 [1]

   ID_1  ID_2 month   st1   st2
  <int> <int> <int> <dbl> <dbl>
1     1     2     7   0.5   0.2
2     1     2     8    NA    NA
3     1     2     9   1.1   1.7
4     1     2    10   2.6   0.8
5     1     2    11   1.8   1.3
6     1     2    12   2.1   2.2
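
The answer fills the gap but does not list which months were missing. A minimal sketch of one way to do both follows, assuming the example data from the question is rebuilt as `dat`. The names `filled` and `missing` are illustrative and not from the original answer, and using `anti_join` to report the added rows is a suggestion rather than part of the accepted solution.

```r
library(dplyr)
library(tidyr)

# Rebuild the question's example: group (1, 1) has months 1..12,
# group (1, 2) lacks month 8. The six stat values repeat every 6 months.
dat <- data.frame(
  ID_1  = 1L,
  ID_2  = rep(1:2, each = 12),
  month = rep(1:12, times = 2),
  st1   = rep(c(0.5, 0.7, 1.1, 2.6, 1.8, 2.1), times = 4),
  st2   = rep(c(0.2, 0.9, 1.7, 0.8, 1.3, 2.2), times = 4)
)
dat <- dat[!(dat$ID_2 == 2 & dat$month == 8), ]

# Fill in absent months per group; new rows carry NA in st1 and st2
filled <- dat %>%
  group_by(ID_1, ID_2) %>%
  complete(month = 1:12) %>%
  ungroup()

# Rows present in `filled` but not in `dat` are exactly the months that were missing
missing <- filled %>%
  anti_join(dat, by = c("ID_1", "ID_2", "month")) %>%
  select(ID_1, ID_2, month)
```

Here `missing` should hold a single row (ID_1 = 1, ID_2 = 2, month = 8), and `filled` has 24 rows. Comparing against the original keys with `anti_join` is safer than filtering on `is.na(st1)`, because it still works if the statistics columns already contain NA values of their own.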
