将数据帧的互补行与 R 合并 [英] Merging complementary rows of a dataframe with R

查看:47
本文介绍了将数据帧的互补行与 R 合并的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有这样一个数据框

0     weekday day month year hour basal bolus carb period.h
1    Tuesday  01    03 2016  0.0  0.25    NA   NA        0
2    Tuesday  01    03 2016 10.9    NA    NA   67       10
3    Tuesday  01    03 2016 10.9    NA  4.15   NA       10
4    Tuesday  01    03 2016 12.0  0.30    NA   NA       12
5    Tuesday  01    03 2016 17.0  0.50    NA   NA       17
6    Tuesday  01    03 2016 17.6    NA    NA   33       17
7    Tuesday  01    03 2016 17.6    NA  1.35   NA       17
8    Tuesday  01    03 2016 18.6    NA    NA   44       18
9    Tuesday  01    03 2016 18.6    NA  1.80   NA       18
10   Tuesday  01    03 2016 18.9    NA    NA   17       18
11   Tuesday  01    03 2016 18.9    NA  0.70   NA       18
12   Tuesday  01    03 2016 22.0  0.40    NA   NA       22
13 Wednesday  02    03 2016  0.0  0.25    NA   NA        0
14 Wednesday  02    03 2016  9.7    NA    NA   39        9
15 Wednesday  02    03 2016  9.7    NA  2.65   NA        9
16 Wednesday  02    03 2016 11.2    NA    NA   13       11
17 Wednesday  02    03 2016 11.2    NA  0.30   NA       11
18 Wednesday  02    03 2016 12.0  0.30    NA   NA       12
19 Wednesday  02    03 2016 12.0    NA    NA   16       12
20 Wednesday  02    03 2016 12.0    NA  0.65   NA       12

如果您查看第 2 行和第 3 行,您会注意到它们完全对应于同一天 &时间:对于第 2 行,carb"不是 NA,bolus"不是 NA(这些是关于糖尿病的数据).

If you look at the lines 2 and 3, you notice that they correspond exactly to the same day & time: just for the line #2 the "carb" is not NA, and the "bolus" is not NA (These are data about diabete).

我想将这些行合并为一条:

I want to merge such lines into a single one:

2    Tuesday  01    03 2016 10.9    NA    NA   67       10
3    Tuesday  01    03 2016 10.9    NA  4.15   NA       10

->

2    Tuesday  01    03 2016 10.9    NA    4.15   67       10

我当然可以在每一行上做一个残酷的双循环,但我正在寻找一种更聪明、更快的方法.

I could of course do a brutal double loop over each line, but I look for a cleverer and faster way.

推荐答案

您可以在此处按公共标识符列 weekday、day、month、year、hour、period.h 对数据框进行分组然后排序并从要合并的剩余列中取出第一个元素,默认情况下 sort() 函数将删除要排序的向量中的 NAs,因此您最终将得到每组中每一列的非 NA 元素;如果列中的所有元素都是 NA,则 sort(col)[1] 返回 NA:

You can group your data frame by the common identifier columns weekday, day, month, year, hour, period.h here and then sort and take the first element from the remaining columns which you would like to merge, sort() function by default will remove NAs in the vector to be sorted and thus you will end up with non-NA elements for each column within each group; if all elements in a column are NA, sort(col)[1] returns NA:

library(dplyr)
df %>% 
       group_by(weekday, day, month, year, hour, period.h) %>% 
       summarise_all(funs(sort(.)[1]))

#      weekday   day month  year  hour period.h basal bolus  carb
#       <fctr> <int> <int> <int> <dbl>    <int> <dbl> <dbl> <int>
# 1    Tuesday     1     3  2016   0.0        0  0.25    NA    NA
# 2    Tuesday     1     3  2016  10.9       10    NA  4.15    67
# 3    Tuesday     1     3  2016  12.0       12  0.30    NA    NA
# 4    Tuesday     1     3  2016  17.0       17  0.50    NA    NA
# 5    Tuesday     1     3  2016  17.6       17    NA  1.35    33
# 6    Tuesday     1     3  2016  18.6       18    NA  1.80    44
# 7    Tuesday     1     3  2016  18.9       18    NA  0.70    17
# 8    Tuesday     1     3  2016  22.0       22  0.40    NA    NA
# 9  Wednesday     2     3  2016   0.0        0  0.25    NA    NA
# 10 Wednesday     2     3  2016   9.7        9    NA  2.65    39
# 11 Wednesday     2     3  2016  11.2       11    NA  0.30    13
# 12 Wednesday     2     3  2016  12.0       12  0.30  0.65    16

代替sort(),也许这里使用更合适的函数是na.omit():

Instead of sort(), maybe a more appropriate function to use here is na.omit():

df %>% group_by(weekday, day, month, year, hour, period.h) %>% 
       summarise_all(funs(na.omit(.)[1]))

这篇关于将数据帧的互补行与 R 合并的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆