Find missing month after grouping with dplyr
Problem description
I have a data frame with two columns that I am grouping by with dplyr, a column of months (as numerics, e.g. 1 through 12), and several columns with statistical data following that (values unimportant). An example:
ID_1 ID_2 month st1 st2
1 1 1 0.5 0.2
1 1 2 0.7 0.9
1 1 3 1.1 1.7
1 1 4 2.6 0.8
1 1 5 1.8 1.3
1 1 6 2.1 2.2
1 1 7 0.5 0.2
1 1 8 0.7 0.9
1 1 9 1.1 1.7
1 1 10 2.6 0.8
1 1 11 1.8 1.3
1 1 12 2.1 2.2
1 2 1 0.5 0.2
1 2 2 0.7 0.9
1 2 3 1.1 1.7
1 2 4 2.6 0.8
1 2 5 1.8 1.3
1 2 6 2.1 2.2
1 2 7 0.5 0.2
1 2 9 1.1 1.7
1 2 10 2.6 0.8
1 2 11 1.8 1.3
1 2 12 2.1 2.2
For the second grouping (ID_1 = 1 and ID_2 = 2), there is a month missing from the data (month = 8). Is there a way I can find this month and insert a row with the correct ID_1 and ID_2 values, the missing month value, and NA values for the rest of the columns? I've been playing around with this using dplyr functions and can't seem to figure it out; perhaps there is a non-dplyr solution out there as well.
PS: If it helps, each unique grouping of ID_1 and ID_2 will have no more than 1 month missing.
This can be done via tidyr::complete:
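library(dplyr)
library(tidyr)
# Within each (ID_1, ID_2) group, add a row for every month in 1:12
# that is not present; the remaining columns are filled with NA.
dat %>%
  group_by(ID_1, ID_2) %>%
  complete(month = 1:12)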
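To try this end to end, dat has to exist first. Here is a minimal sketch, assuming dat looks exactly like the example table in the question. The way dat is built and the names filled and missing_rows are my own, not from the original post. It also uses anti_join() to list only the rows that complete() added, which covers the "find this month" part of the question:

```r
library(dplyr)
library(tidyr)

# Rebuild the example table from the question (assumed layout):
# the six st1/st2 value pairs repeat every six months in both groups.
st1 <- c(0.5, 0.7, 1.1, 2.6, 1.8, 2.1)
st2 <- c(0.2, 0.9, 1.7, 0.8, 1.3, 2.2)
dat <- data.frame(
  ID_1  = 1L,
  ID_2  = rep(1:2, each = 12),
  month = rep(1:12, times = 2),
  st1   = rep(st1, times = 4),
  st2   = rep(st2, times = 4)
)
# Drop month 8 from the second group, as in the question.
dat <- dat[!(dat$ID_2 == 2 & dat$month == 8), ]

# Fill in the missing months per group; new rows get NA in st1 and st2.
filled <- dat %>%
  group_by(ID_1, ID_2) %>%
  complete(month = 1:12)

# Rows present after completion but absent from the original data,
# i.e. the months that were missing.
missing_rows <- anti_join(
  ungroup(filled), dat,
  by = c("ID_1", "ID_2", "month")
)

tail(filled)
```

Comparing against the original with anti_join() rather than filtering on is.na(st1) avoids flagging rows whose statistics were already NA in the source data.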
Tail of dataset:
Source: local data frame [6 x 5]
Groups: ID_1, ID_2 [1]

   ID_1  ID_2 month   st1   st2
  <int> <int> <int> <dbl> <dbl>
1     1     2     7   0.5   0.2
2     1     2     8    NA    NA
3     1     2     9   1.1   1.7
4     1     2    10   2.6   0.8
5     1     2    11   1.8   1.3
6     1     2    12   2.1   2.2
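The question also asks whether a non-dplyr route exists. One possible sketch in base R, again assuming dat matches the example table in the question: build every (ID_1, ID_2, month) combination with merge(), then left-join the data onto it. The present flag column and the names full_grid, filled and missing_rows are hypothetical helpers for illustration only:

```r
# Rebuild the example table from the question (assumed layout).
st1 <- c(0.5, 0.7, 1.1, 2.6, 1.8, 2.1)
st2 <- c(0.2, 0.9, 1.7, 0.8, 1.3, 2.2)
dat <- data.frame(
  ID_1  = 1L,
  ID_2  = rep(1:2, each = 12),
  month = rep(1:12, times = 2),
  st1   = rep(st1, times = 4),
  st2   = rep(st2, times = 4)
)
dat <- dat[!(dat$ID_2 == 2 & dat$month == 8), ]

# All distinct ID pairs crossed with months 1..12. The two inputs share
# no column names, so merge() returns their Cartesian product.
ids <- unique(dat[c("ID_1", "ID_2")])
full_grid <- merge(ids, data.frame(month = 1:12))

# Mark original rows so the inserted ones can be told apart afterwards.
dat$present <- TRUE

# Left join on the shared columns (ID_1, ID_2, month);
# combinations without a match get NA in every other column.
filled <- merge(full_grid, dat, all.x = TRUE)
filled <- filled[order(filled$ID_1, filled$ID_2, filled$month), ]

# The months that were missing from the original data.
missing_rows <- filled[is.na(filled$present), c("ID_1", "ID_2", "month")]

# Remove the helper column from the final result.
filled$present <- NULL
```

The explicit flag does the same job as anti_join() in a dplyr pipeline: it identifies inserted rows without relying on NA values in the statistics columns.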