How to match/merge data from two different files in R?


Question

I have two files (file1.csv and file2.csv). As shown below, file1 contains two columns, date and variable x1, with 365 observations (a whole year). file2 contains the same date column as file1 plus many other variables. I'm only interested in variable x45, which has just 24 observations (2 per month).

file1

date        x1
1/01/2005   33
2/01/2005   24
3/01/2005   72
31/12/2005  52

file2

date        x2   x3   x45
1/01/2005             115
5/02/2005             125
13/04/2005            127
31/12/2005            138

So I'd like to add column x45 to file1.csv so that it looks like this:

date        x1   x45
1/01/2005   33   115
2/01/2005   24    NA
3/01/2005   72    NA
31/12/2005  52   138

I tried using

file1 <- read.csv("D:/file1.csv")
file2 <- read.csv("D:/file2.csv")
file3 <- merge(file1, file2)

However, file3 has only 24 rows (observations); the rest of the observations in file1 are dropped.

Any help to get the result as described above would be much appreciated.

Recommended answer

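You could try left_join from dplyr (here df1 and df2 are the data frames read from file1.csv and file2.csv; sample data is given under Data at the end):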

library(dplyr)
left_join(df1, df2[c('date', 'x45')], by='date')
#         date x1 x45
#1  1/01/2005 33 115
#2  2/01/2005 24  NA
#3  3/01/2005 72  NA
#4 31/12/2005 52 138

Or use base R's merge with all.x = TRUE, which keeps every row of df1 (a left join):

merge(df1, df2[c('date', 'x45')], all.x=TRUE)
#       date x1 x45
#1  1/01/2005 33 115
#2  2/01/2005 24  NA
#3  3/01/2005 72  NA
#4 31/12/2005 52 138
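The key difference from the question's merge(file1, file2) call is the all.x argument: merge() defaults to all = FALSE, an inner join that keeps only dates present in both tables, which is why file3 kept only the matching rows. A small sketch with made-up data frames a and b (hypothetical, not from the question) showing both behaviours:

```r
# Hypothetical toy data: 'a' stands in for file1, 'b' for file2
a <- data.frame(date = c("1/01/2005", "2/01/2005", "3/01/2005"),
                x1 = c(33, 24, 72), stringsAsFactors = FALSE)
b <- data.frame(date = c("1/01/2005", "5/02/2005"),
                x45 = c(115, 125), stringsAsFactors = FALSE)

inner <- merge(a, b, by = "date")                # all = FALSE (default): inner join
left  <- merge(a, b, by = "date", all.x = TRUE)  # keep every row of 'a'

nrow(inner)  # 1: only "1/01/2005" occurs in both
nrow(left)   # 3: rows of 'a' without a match get NA in x45
```

Dates that exist only in b ("5/02/2005" here) are dropped by all.x = TRUE; all = TRUE would keep them as well.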

Update

left_join from dplyr and join from plyr keep the rows in df1's original order, whereas merge sorts the result by the key column (its sort argument defaults to TRUE). If you need to keep the original order with merge, one option is to add an index column "indx" to df1 before merging and then sort the merged result by it:

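df1$indx <- seq_len(nrow(df1))
res <- merge(df1, df2[c('date', 'x45')], all.x=TRUE)
# sort the merged rows by indx (df1's original order), then drop the helper column
res[order(res$indx), setdiff(names(res), 'indx')]
#         date x1 x45
#1  1/01/2005 33 115
#2  2/01/2005 24  NA
#3  3/01/2005 72  NA
#4 31/12/2005 52 138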
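Another base-R way to keep df1's row order (an alternative, not part of the original answer) is a lookup with match(), which returns, for each date in the first table, its row position in the second. The sketch below uses hypothetical data frames d1 and d2, with d1 deliberately out of date order:

```r
# Hypothetical data; d1 is intentionally not sorted by date
d1 <- data.frame(date = c("31/12/2005", "1/01/2005", "3/01/2005"),
                 x1 = c(52, 33, 72), stringsAsFactors = FALSE)
d2 <- data.frame(date = c("1/01/2005", "5/02/2005", "31/12/2005"),
                 x45 = c(115, 125, 138), stringsAsFactors = FALSE)

# match() gives the position of each d1$date in d2$date (NA when absent),
# so indexing d2$x45 with it yields x45 values aligned to d1's rows
d1$x45 <- d2$x45[match(d1$date, d2$date)]
d1
```

Unlike merge, this never adds or reorders rows, but if d2 contained the same date more than once, match() would silently use only the first occurrence.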

Or use plyr:

library(plyr)
join(df1, df2[c('date', 'x45')], by='date', type='left')
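Whichever function does the join, the question's goal was to get x45 into file1.csv. A minimal base-R round trip (read both CSVs, left-join, write the result out) is sketched below; temporary files stand in for the asker's D:/ paths, and the output location is hypothetical:

```r
# Simulate the two input files; in practice use e.g. "D:/file1.csv" and "D:/file2.csv"
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(data.frame(date = c("1/01/2005", "2/01/2005", "31/12/2005"),
                     x1 = c(33, 24, 52)), f1, row.names = FALSE)
write.csv(data.frame(date = c("1/01/2005", "31/12/2005"),
                     x2 = NA, x3 = NA, x45 = c(115, 138)), f2, row.names = FALSE)

file1 <- read.csv(f1, stringsAsFactors = FALSE)
file2 <- read.csv(f2, stringsAsFactors = FALSE)

# Left join: every row of file1 is kept; only the x45 column is taken from file2
file3 <- merge(file1, file2[c("date", "x45")], by = "date", all.x = TRUE)

# Write the combined table (hypothetical output path; pass the file1 path to overwrite it)
out <- tempfile(fileext = ".csv")
write.csv(file3, out, row.names = FALSE)
```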

Data

df1 <- structure(list(date = c("1/01/2005", "2/01/2005", "3/01/2005", 
"31/12/2005"), x1 = c(33L, 24L, 72L, 52L)), .Names = c("date", 
"x1"), class = "data.frame", row.names = c(NA, -4L))

df2 <- structure(list(date = c("1/01/2005", "5/02/2005", "13/04/2005", 
"31/12/2005"), x2 = c(NA, NA, NA, NA), x3 = c(NA, NA, NA, NA), 
x45 = c(115L, 125L, 127L, 138L)), .Names = c("date", "x2", 
 "x3", "x45"), class = "data.frame", row.names = c(NA, -4L))
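In this sample data the date column is plain text, so merge and left_join match dates only when the strings are identical: "1/01/2005" and "01/01/2005" would not match. If the two files might format dates differently, one option (an addition, not part of the original answer) is to convert both columns with as.Date() before joining, which also makes the result sort chronologically instead of alphabetically. A sketch with hypothetical data frames a and b:

```r
# Hypothetical example: the same days written with and without leading zeros
a <- data.frame(date = c("1/01/2005", "2/01/2005"), x1 = c(33, 24),
                stringsAsFactors = FALSE)
b <- data.frame(date = c("01/01/2005", "02/01/2005"), x45 = c(115, 120),
                stringsAsFactors = FALSE)

# Compared as plain strings the keys differ, so an inner join finds no rows
n_before <- nrow(merge(a, b, by = "date"))

# Parse the day/month/year strings into Date objects, then join on them
a$date <- as.Date(a$date, format = "%d/%m/%Y")
b$date <- as.Date(b$date, format = "%d/%m/%Y")
res <- merge(a, b, by = "date", all.x = TRUE)
res
```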
