R: Updating a data frame with another data frame

Published: 2017/3/26 2:10:55. Tags: r, join, replace, dataframe, dplyr


Problem description

Let's say our initial data frame looks like this:

df1 = data.frame(Index=c(1:6),A=c(1:6),B=c(1,2,3,NA,NA,NA),C=c(1,2,3,NA,NA,NA))

> df1
  Index A  B  C
1     1 1  1  1
2     2 2  2  2
3     3 3  3  3
4     4 4 NA NA
5     5 5 NA NA
6     6 6 NA NA

Another data frame contains new values for columns B and C:

df2 = data.frame(Index=c(4,5,6),B=c(4,4,4),C=c(5,5,5))

> df2
  Index B C
1     4 4 5
2     5 4 5
3     6 4 5

How can I update the missing values in df1 so that it looks like this?

  Index A B C
1     1 1 1 1
2     2 2 2 2
3     3 3 3 3
4     4 4 4 5
5     5 5 4 5
6     6 6 4 5

My attempt:

library(dplyr)

> full_join(df1,df2)
Joining by: c("Index", "B", "C")
  Index  A  B  C
1     1  1  1  1
2     2  2  2  2
3     3  3  3  3
4     4  4 NA NA
5     5  5 NA NA
6     6  6 NA NA
7     4 NA  4  5
8     5 NA  4  5
9     6 NA  4  5

As you can see, this creates duplicate rows for Index 4, 5 and 6 instead of replacing the NA values.

Any help would be greatly appreciated!

Solution

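Merge with all = TRUE, which stacks the rows the same way full_join does, then aggregate by Index and drop the NA values in each column:

aggregate(. ~ Index, data = merge(df1, df2, all = TRUE), na.omit, na.action = na.pass)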

#  Index B C A
#1     1 1 1 1
#2     2 2 2 2
#3     3 3 3 3
#4     4 4 5 4
#5     5 4 5 5
#6     6 4 5 6
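
If the goal is only to fill the gaps in df1, a base R alternative that avoids the merge entirely is to look up each Index with match() and assign into the missing cells. This is a minimal sketch rather than part of the original answer; it assumes Index is unique in df2, and the names pos, fill and updated are made up for illustration.

```r
# Same example data as in the question
df1 <- data.frame(Index = 1:6, A = 1:6,
                  B = c(1, 2, 3, NA, NA, NA),
                  C = c(1, 2, 3, NA, NA, NA))
df2 <- data.frame(Index = c(4, 5, 6), B = c(4, 4, 4), C = c(5, 5, 5))

# For each row of df1, the row of df2 with the same Index (NA if none)
pos <- match(df1$Index, df2$Index)

updated <- df1
for (col in setdiff(names(df2), "Index")) {
  # Fill only cells that are NA in df1 and have a matching row in df2
  fill <- is.na(updated[[col]]) & !is.na(pos)
  updated[[col]][fill] <- df2[[col]][pos[fill]]
}

updated
```

Unlike the merge/aggregate version, this keeps the original column order (Index, A, B, C) and never overwrites a value that was already present in df1.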

Or in dplyr speak:

df1 %>% 
    full_join(df2) %>%
    group_by(Index) %>%
    summarise_each(funs(na.omit))

#Joining by: c("Index", "B", "C")
#Source: local data frame [6 x 4]
#
#  Index     A     B     C
#  (dbl) (int) (dbl) (dbl)
#1     1     1     1     1
#2     2     2     2     2
#3     3     3     3     3
#4     4     4     4     5
#5     5     5     4     5
#6     6     6     4     5
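
Note that summarise_each() and funs() are deprecated in current dplyr releases. The duplicate rows in the question also come from joining on every shared column (Index, B and C), as the "Joining by" message shows. A sketch of a different route, not taken from the original answer, is to join on Index alone and combine each pair of columns with coalesce(); the B.x/B.y names below are simply dplyr's default suffixes, and updated is an illustrative name.

```r
library(dplyr)

# Same example data as in the question
df1 <- data.frame(Index = 1:6, A = 1:6,
                  B = c(1, 2, 3, NA, NA, NA),
                  C = c(1, 2, 3, NA, NA, NA))
df2 <- data.frame(Index = c(4, 5, 6), B = c(4, 4, 4), C = c(5, 5, 5))

updated <- df1 %>%
  # Join on Index only, so B and C arrive as B.x/B.y and C.x/C.y
  left_join(df2, by = "Index") %>%
  # coalesce() keeps the df1 value and falls back to df2 where it is NA
  mutate(B = coalesce(B.x, B.y),
         C = coalesce(C.x, C.y)) %>%
  select(Index, A, B, C)

updated
```

Recent dplyr versions also provide rows_patch(), which is intended for this kind of "fill only the NA values" update; whether it fits depends on the installed version and on the key columns having compatible types.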
