How to merge two data.frames together in R, referencing a lookup table


Question

I am trying to merge two data.frames based on a column they have in common, called series_id. Here is my merge statement:

merge(test_growth_series_LUT,  test_growth_series, by = intersect(series_id, series_id))

The error I get is:

Error in as.vector(y) : object 'series_id' not found

The help gives this description, but I can't see why it can't find series_id. Example data is below.

### S3 method for class 'data.frame':
   #merge(x, y, by = intersect(names(x), names(y)),
   #      by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
   #      sort = TRUE, suffixes = c(".x",".y"), ...)



# Create a long data.frame to store data...
test_growth_series = data.frame ("read_day" = c(0, 3, 9, 0, 3, 9, 0, 2, 8), 
"series_id" = c("p1s1", "p1s1", "p1s1", "p1s2", "p1s2", "p1s2", "p3s4", "p3s4", "p3s4"),
"mean_od" = c(0.6, 0.9, 1.3, 0.3, 0.6, 1.0, 0.2, 0.5, 1.2),
"sd_od" = c(0.1, 0.2, 0.2, 0.1, 0.1, 0.3, 0.04, 0.1, 0.3),
"n_in_stat" = c(8, 8, 8, 8, 7, 5, 8, 7, 2)
)

# Create a name LUT
test_growth_series_LUT = data.frame ("series_id" = c("p1s1", "p1s2", "p3s4", "p4s2", "p5s2", "p6s2", "p7s4", "p8s4", "p9s4"),
"description" = c("blah1", "blah2", "blah3", "blah4", "blah5", "blah6", "blah7", "blah8", "blah9")
)

> test_growth_series
  read_day series_id mean_od sd_od n_in_stat
1        0      p1s1     0.6  0.10         8
2        3      p1s1     0.9  0.20         8
3        9      p1s1     1.3  0.20         8
4        0      p1s2     0.3  0.10         8
5        3      p1s2     0.6  0.10         7
6        9      p1s2     1.0  0.30         5
7        0      p3s4     0.2  0.04         8
8        2      p3s4     0.5  0.10         7
9        8      p3s4     1.2  0.30         2
> test_growth_series_LUT
  series_id description
1      p1s1       blah1
2      p1s2       blah2
3      p3s4       blah3
4      p4s2       blah4
5      p5s2       blah5
6      p6s2       blah6
7      p7s4       blah7
8      p8s4       blah8
9      p9s4       blah9
> 



This is what I'm trying to achieve:
> new_test_growth_series
  read_day series_id mean_od sd_od n_in_stat description
1        0      p1s1     0.6  0.10         8       blah1
2        3      p1s1     0.9  0.20         8       blah1
3        9      p1s1     1.3  0.20         8       blah1
4        0      p1s2     0.3  0.10         8       blah2
5        3      p1s2     0.6  0.10         7       blah2
6        9      p1s2     1.0  0.30         5       blah2
7        0      p3s4     0.2  0.04         8       blah3
8        2      p3s4     0.5  0.10         7       blah3
9        8      p3s4     1.2  0.30         2       blah3


Recommended answer

You can do it like this:

merge(test_growth_series_LUT, test_growth_series)

It will automatically match on the column names the two data frames share. If you need to specify the column, do it like this:

merge(test_growth_series_LUT, test_growth_series, by = "series_id")

Or this way if you need to specify the column on both sides (only needed when the columns you want to match on have different names):

merge(test_growth_series_LUT, test_growth_series, by.x = "series_id", by.y = "series_id")
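
A quick way to check that these three calls are equivalent is sketched below. It uses a trimmed copy of the example data from the question. The default for by is intersect(names(x), names(y)), which for these two data frames is just "series_id". The names common, m1, m2, m3 and the final column reordering are illustrative additions, not part of the original answer.

```r
# Example data from the question (sd_od and n_in_stat omitted, LUT shortened)
test_growth_series <- data.frame(
  read_day  = c(0, 3, 9, 0, 3, 9, 0, 2, 8),
  series_id = c("p1s1", "p1s1", "p1s1", "p1s2", "p1s2", "p1s2", "p3s4", "p3s4", "p3s4"),
  mean_od   = c(0.6, 0.9, 1.3, 0.3, 0.6, 1.0, 0.2, 0.5, 1.2)
)
test_growth_series_LUT <- data.frame(
  series_id   = c("p1s1", "p1s2", "p3s4", "p4s2"),
  description = c("blah1", "blah2", "blah3", "blah4")
)

# The default `by`: the column names the two data frames have in common
common <- intersect(names(test_growth_series_LUT), names(test_growth_series))
print(common)

m1 <- merge(test_growth_series_LUT, test_growth_series)
m2 <- merge(test_growth_series_LUT, test_growth_series, by = "series_id")
m3 <- merge(test_growth_series_LUT, test_growth_series,
            by.x = "series_id", by.y = "series_id")

# Compare the three results
print(identical(m1, m2) && identical(m2, m3))

# merge() puts the key column first; reorder to match the layout in the question
new_test_growth_series <- m2[, c(names(test_growth_series), "description")]
print(new_test_growth_series)
```

Because all defaults to FALSE, lookup rows with no matching series (p4s2 here) are dropped from the result.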

I recommend looking at the examples (and walking through them) in the help for merge (?merge), or calling example("merge", "base") (less useful than actually walking through them yourself).

Two notes:


  1. You would never need to use the intersect function here. To match on several columns, name them explicitly with c(). To control what kind of join you get, use the all, all.x, and all.y parameters.
  2. Column names passed to by need to be quoted. An unquoted series_id is looked up as an R object on the search path, and because no object of that name exists there, R complains that it cannot find it. Even with the data attached, the unquoted name would evaluate to the column's values rather than to its name, so quoting is still what you want.
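
Both notes can be illustrated with a small sketch. The data frames od and info, and their plate and well columns, are hypothetical and made up for this example. Only merge(), c(), the all.x argument, and tryCatch() come from base R.

```r
# Hypothetical data, keyed on two columns
od <- data.frame(
  plate   = c("p1", "p1", "p3"),
  well    = c("s1", "s2", "s4"),
  mean_od = c(0.6, 0.3, 0.2)
)
info <- data.frame(
  plate       = c("p1", "p1"),
  well        = c("s1", "s2"),
  description = c("blah1", "blah2")
)

# Note 1: name several key columns with c(); no intersect() needed
inner <- merge(od, info, by = c("plate", "well"))
print(inner)

# all.x = TRUE keeps every row of od (a left join); unmatched rows get NA
left <- merge(od, info, by = c("plate", "well"), all.x = TRUE)
print(left)

# Note 2: an unquoted column name is evaluated as an R object, which fails
# here because no object called `plate` exists outside the data frames
msg <- tryCatch(
  merge(od, info, by = plate),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Setting all.y = TRUE instead keeps every row of the second data frame, and all = TRUE keeps the rows of both.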
