Looking up values from another dataframe in R

Question

I have a large data frame called df with an ID column.

I have another data frame (id_list) that contains the matching IDs along with the features associated with each ID. The IDs are not in sequential order in either data frame.

In effect, I would like to look up each ID from the larger data frame df in id_list and add two columns, Display and Type, to df.

There are numerous confusing examples out there. What is the most effective way to do this? I tried using match() and %in% and failed miserably.
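
Since match() and %in% came up: a minimal base-R sketch of how match() can do this lookup. The data frames below are small made-up stand-ins (the column x and all values are illustrative assumptions, not the asker's data), and ID 4 deliberately has no entry in the lookup table.

```r
# Made-up stand-ins for df and id_list (not the asker's data);
# ID 4 deliberately has no entry in the lookup table
df <- data.frame(x = c(0.5, -1.2, 3.1, 0.7), ID = c(2, 1, 4, 3))
id_list <- data.frame(ID = c(3, 1, 2),
                      Display = c("blur", "clear", "clear"),
                      Type = c("red", "blue", "green"),
                      stringsAsFactors = FALSE)

# For each df$ID, match() gives the row number of that ID in id_list
# (NA when the ID is not found); here idx is 3, 2, NA, 1
idx <- match(df$ID, id_list$ID)

# Pull the lookup columns in df's row order; unmatched rows get NA
df$Display <- id_list$Display[idx]
df$Type <- id_list$Type[idx]

# %in% only answers "is this ID in the lookup table at all?"
print(df$ID %in% id_list$ID)  # TRUE, TRUE, FALSE, TRUE
print(df)
```

match() returns only the first matching position, so this sketch assumes id_list has exactly one row per ID.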

Here is a reproducible example.

# 20 rows x 5 feature columns; the 10 IDs are recycled to fill the 20 rows
df <- data.frame(Feats = matrix(rnorm(20), nrow = 20, ncol = 5), ID = sample.int(10, 10))

# Lookup table with one row per ID
id_list <- data.frame(ID = sample.int(10, 10),
                      Display = sample(c('clear', 'blur'), 10, replace = TRUE),
                      Type = sample(c('red', 'green', 'blue', 'indigo', 'yellow'), 10, replace = TRUE))

           Feats.1     Feats.2     Feats.3     Feats.4     Feats.5 ID
1   3.14944573 -0.52285062  3.14944573 -0.52285062  3.14944573  2
2  -0.41096007  0.38256691 -0.41096007  0.38256691 -0.41096007  1
3   0.03629351 -0.02514005  0.03629351 -0.02514005  0.03629351  7
4   0.91257290  1.35590761  0.91257290  1.35590761  0.91257290  5
5  -0.26927311 -2.10213773 -0.26927311 -2.10213773 -0.26927311  3
6   3.14944573 -0.52285062  3.14944573 -0.52285062  3.14944573  4
7  -0.41096007  0.38256691 -0.41096007  0.38256691 -0.41096007 10
8   0.03629351 -0.02514005  0.03629351 -0.02514005  0.03629351  6
9   0.91257290  1.35590761  0.91257290  1.35590761  0.91257290  8
10 -0.26927311 -2.10213773 -0.26927311 -2.10213773 -0.26927311  9

  ID Display   Type
1   6   clear indigo
2   1    blur   blue
3   7   clear    red
4   4   clear    red
5   3    blur    red
6  10   clear yellow
7   2   clear   blue
8   8    blur  green
9   5   clear   blue
10  9   clear  green

The resulting df should be of size [20 x 8], i.e. the six original columns (Feats.1 to Feats.5 and ID) plus Display and Type.

Thanks for your help.

Answer

You can do this fairly easily with merge from base R or left_join from dplyr. (data.table also provides a merge method for data.table objects; maybe someone else can give an answer using it.) You probably want to make sure you don't lose any data if an entry in your data frame has no corresponding ID in the lookup table. If you don't need to keep such entries, you can set all.x = FALSE (the default) in merge, or switch from left_join to inner_join. To illustrate, I added a dummy row to the data with an ID that doesn't exist in the lookup table.

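# 10 rows: the 5-row matrix is recycled to match the 10 IDs
df <- data.frame(Feats = matrix(rnorm(10), nrow = 5, ncol = 5), ID = sample.int(10, 10))

# Dummy row whose ID (12) does not appear in the lookup table
dummy <- df[1, ]
dummy$ID <- 12
df <- rbind(dummy, df)

# Lookup table with one row per ID (1 to 10)
id_list <- data.frame(ID = sample.int(10,10),
                      Display = sample(c('clear', 'blur'), 10, replace = TRUE),
                      Type = sample(c('red', 'green', 'blue', 'indigo', 'yellow'), 10, replace = TRUE))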

With merge, set by to the name of the key column shared by both data frames, or use by.x and by.y if the key column has a different name in each. all.x = TRUE keeps every row of the first data frame even if it has no match in the second; unmatched rows get NA in the new columns.
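
For instance, if the lookup table's key column had a different name, by.x and by.y map one to the other. This is a small sketch with made-up data; the lookup table name `lookup` and its lowercase `id` column are hypothetical.

```r
# Hypothetical lookup table whose key column is called "id" instead of "ID"
df <- data.frame(ID = c(2, 1, 4), x = c(0.5, -1.2, 3.1))
lookup <- data.frame(id = c(1, 2, 3),
                     Display = c("clear", "blur", "clear"),
                     stringsAsFactors = FALSE)

# by.x names the key in the first data frame, by.y the key in the second;
# all.x = TRUE keeps ID 4 even though it has no match (its Display is NA).
# The key column in the result takes the by.x name, "ID".
merged <- merge(df, lookup, by.x = "ID", by.y = "id", all.x = TRUE)
print(merged)
```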

# sort = FALSE skips sorting by ID, so the row order of the result is unspecified
merged1 <- merge(df, id_list, by = "ID", sort = FALSE, all.x = TRUE)
merged1
#>    ID     Feats.1    Feats.2     Feats.3    Feats.4     Feats.5 Display
#> 1  10 -1.44053344  1.0086988 -1.44053344  1.0086988 -1.44053344   clear
#> 2   5  0.99220217 -0.3125813  0.99220217 -0.3125813  0.99220217   clear
#> 3   2  1.03881289  1.1277627  1.03881289  1.1277627  1.03881289   clear
#> 4   7 -0.01678186 -0.1519029 -0.01678186 -0.1519029 -0.01678186   clear
#> 5   4  0.07130125  1.1715833  0.07130125  1.1715833  0.07130125   clear
#> 6   6 -1.44053344  1.0086988 -1.44053344  1.0086988 -1.44053344   clear
#> 7   8  0.99220217 -0.3125813  0.99220217 -0.3125813  0.99220217    blur
#> 8   3  1.03881289  1.1277627  1.03881289  1.1277627  1.03881289   clear
#> 9   1 -0.01678186 -0.1519029 -0.01678186 -0.1519029 -0.01678186   clear
#> 10  9  0.07130125  1.1715833  0.07130125  1.1715833  0.07130125   clear
#> 11 12 -1.44053344  1.0086988 -1.44053344  1.0086988 -1.44053344    <NA>
#>      Type
#> 1  indigo
#> 2  yellow
#> 3    blue
#> 4  indigo
#> 5  yellow
#> 6  indigo
#> 7   green
#> 8     red
#> 9     red
#> 10   blue
#> 11   <NA>
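
To see what all.x controls, here is a small sketch with made-up data (not the random data above) in which ID 12 has no entry in the lookup table. With all.x = TRUE the unmatched row is kept with NA; with all.x = FALSE, merge behaves as an inner join and drops it, keeping the same rows that dplyr::inner_join would.

```r
# Small made-up data: ID 12 has no entry in the lookup table
df <- data.frame(ID = c(12, 10, 5), x = c(1.1, 2.2, 3.3))
id_list <- data.frame(ID = c(5, 10, 7),
                      Display = c("clear", "blur", "clear"),
                      stringsAsFactors = FALSE)

# all.x = TRUE: left join, the unmatched ID 12 is kept with NA in Display
left <- merge(df, id_list, by = "ID", all.x = TRUE)

# all.x = FALSE (the default): inner join, the unmatched row is dropped
inner <- merge(df, id_list, by = "ID", all.x = FALSE)

print(nrow(left))   # [1] 3
print(nrow(inner))  # [1] 2
```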

dplyr::left_join keeps all rows of the first data frame, in their original order, and adds the columns from any matching rows in the second.

joined <- dplyr::left_join(df, id_list, by = "ID")
head(joined)
#>       Feats.1    Feats.2     Feats.3    Feats.4     Feats.5 ID Display
#> 1 -1.44053344  1.0086988 -1.44053344  1.0086988 -1.44053344 12    <NA>
#> 2 -1.44053344  1.0086988 -1.44053344  1.0086988 -1.44053344 10   clear
#> 3  0.99220217 -0.3125813  0.99220217 -0.3125813  0.99220217  5   clear
#> 4  1.03881289  1.1277627  1.03881289  1.1277627  1.03881289  2   clear
#> 5 -0.01678186 -0.1519029 -0.01678186 -0.1519029 -0.01678186  7   clear
#> 6  0.07130125  1.1715833  0.07130125  1.1715833  0.07130125  4   clear
#>     Type
#> 1   <NA>
#> 2 indigo
#> 3 yellow
#> 4   blue
#> 5 indigo
#> 6 yellow
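
One thing worth checking before either join: the lookup table should have one row per ID. If an ID appears more than once, both merge and left_join return one output row per matching pair, so the result grows instead of staying the same length as df. A minimal base-R sketch with made-up data; keeping the first row per ID is an assumption about which duplicate is correct, not a general rule.

```r
# Made-up data: ID 2 appears twice in the lookup table
df <- data.frame(ID = c(1, 2, 3), x = c(10, 20, 30))
id_list <- data.frame(ID = c(1, 2, 2, 3),
                      Type = c("red", "blue", "green", "red"),
                      stringsAsFactors = FALSE)

# anyDuplicated() returns the index of the first duplicated element, or 0 if none
print(anyDuplicated(id_list$ID))  # [1] 3

# The df row with ID 2 matches two lookup rows, so the join returns 4 rows, not 3
joined <- merge(df, id_list, by = "ID", all.x = TRUE)
print(nrow(joined))  # [1] 4

# One option (assumes the first lookup row per ID is the one you want):
# drop later duplicates before joining
dedup <- id_list[!duplicated(id_list$ID), ]
joined_dedup <- merge(df, dedup, by = "ID", all.x = TRUE)
print(nrow(joined_dedup))  # [1] 3
```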

Created on 2018-07-13 by the reprex package (v0.2.0).