Looking up values from another dataframe in R
Problem description
I have a large data frame called df with some IDs.
I have another data frame (id_list) with a set of matching IDs and the features associated with each ID. The IDs are not sequentially ordered in either data frame.
Effectively, I would like to look up each ID from the larger data frame df in id_list and add two columns, namely Display and Type, to the current data frame df.
There are numerous confusing examples. What would be the most effective way of doing this? I tried using match() and %in% and failed miserably.
Here is a reproducible example.
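# 20 rows: the 10 sampled IDs are recycled, so each ID appears twice in df
df <- data.frame(Feats = matrix(rnorm(20), nrow = 20, ncol = 5), ID = sample.int(10, 10))
# One row per ID in the lookup table (sampling 20 values here would recycle ID and duplicate every key)
id_list <- data.frame(ID = sample.int(10, 10),
                      Display = sample(c('clear', 'blur'), 10, replace = TRUE),
                      Type = sample(c('red', 'green', 'blue', 'indigo', 'yellow'), 10, replace = TRUE))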
Feats.1 Feats.2 Feats.3 Feats.4 Feats.5 ID
1 3.14944573 -0.52285062 3.14944573 -0.52285062 3.14944573 2
2 -0.41096007 0.38256691 -0.41096007 0.38256691 -0.41096007 1
3 0.03629351 -0.02514005 0.03629351 -0.02514005 0.03629351 7
4 0.91257290 1.35590761 0.91257290 1.35590761 0.91257290 5
5 -0.26927311 -2.10213773 -0.26927311 -2.10213773 -0.26927311 3
6 3.14944573 -0.52285062 3.14944573 -0.52285062 3.14944573 4
7 -0.41096007 0.38256691 -0.41096007 0.38256691 -0.41096007 10
8 0.03629351 -0.02514005 0.03629351 -0.02514005 0.03629351 6
9 0.91257290 1.35590761 0.91257290 1.35590761 0.91257290 8
10 -0.26927311 -2.10213773 -0.26927311 -2.10213773 -0.26927311 9
ID Display Type
1 6 clear indigo
2 1 blur blue
3 7 clear red
4 4 clear red
5 3 blur red
6 10 clear yellow
7 2 clear blue
8 8 blur green
9 5 clear blue
10 9 clear green
The resulting df should be of size [20 x 8].
Thanks for your help.
Recommended answer
You can use merge from base R or left_join from dplyr to do this pretty easily. (There's also data.table::merge, which maybe someone else can give an answer with.) You probably want to take steps to ensure that you don't lose any data if there's an entry in your data frame that doesn't have a corresponding ID in the lookup. If you would rather drop such unmatched rows, you can set all.x = FALSE in merge (this is the default), or switch from left_join to inner_join. To illustrate, I added a dummy row to the data with an ID that doesn't exist in the lookup table.
# The 5 x 5 matrix is recycled to match the 10 IDs, giving 10 rows
df <- data.frame(Feats = matrix(rnorm(10), nrow = 5, ncol = 5), ID = sample.int(10, 10))
# Dummy row whose ID (12) does not exist in the lookup table
dummy <- df[1, ]
dummy$ID <- 12
df <- rbind(dummy, df)
id_list <- data.frame(ID = sample.int(10, 10),
                      Display = sample(c('clear', 'blur'), 10, replace = TRUE),
                      Type = sample(c('red', 'green', 'blue', 'indigo', 'yellow'), 10, replace = TRUE))
With merge, you either set by to the column name shared by both data frames, or set by.x and by.y if the key columns have different names. all.x = TRUE will keep all observations in the first data frame even if they don't match an observation in the second data frame.
merged1 <- merge(df, id_list, by = "ID", sort = FALSE, all.x = TRUE)
merged1
#> ID Feats.1 Feats.2 Feats.3 Feats.4 Feats.5 Display
#> 1 10 -1.44053344 1.0086988 -1.44053344 1.0086988 -1.44053344 clear
#> 2 5 0.99220217 -0.3125813 0.99220217 -0.3125813 0.99220217 clear
#> 3 2 1.03881289 1.1277627 1.03881289 1.1277627 1.03881289 clear
#> 4 7 -0.01678186 -0.1519029 -0.01678186 -0.1519029 -0.01678186 clear
#> 5 4 0.07130125 1.1715833 0.07130125 1.1715833 0.07130125 clear
#> 6 6 -1.44053344 1.0086988 -1.44053344 1.0086988 -1.44053344 clear
#> 7 8 0.99220217 -0.3125813 0.99220217 -0.3125813 0.99220217 blur
#> 8 3 1.03881289 1.1277627 1.03881289 1.1277627 1.03881289 clear
#> 9 1 -0.01678186 -0.1519029 -0.01678186 -0.1519029 -0.01678186 clear
#> 10 9 0.07130125 1.1715833 0.07130125 1.1715833 0.07130125 clear
#> 11 12 -1.44053344 1.0086988 -1.44053344 1.0086988 -1.44053344 <NA>
#> Type
#> 1 indigo
#> 2 yellow
#> 3 blue
#> 4 indigo
#> 5 yellow
#> 6 indigo
#> 7 green
#> 8 red
#> 9 red
#> 10 blue
#> 11 <NA>
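As a rough illustration of the two merge options described above, here is a minimal sketch on made-up data. The data frames left and right and their columns are hypothetical and not part of the question; the only difference from the example above is that the key column has a different name in each table.

```r
# Hypothetical toy data: the key is called ID in one table and id in the other.
left  <- data.frame(ID = c(3, 1, 12), value = c(0.5, 1.2, -0.7))
right <- data.frame(id = c(1, 2, 3), Display = c("clear", "blur", "clear"))

# all.x = TRUE keeps every row of `left`; ID 12 has no match, so Display is NA.
kept <- merge(left, right, by.x = "ID", by.y = "id", all.x = TRUE)

# all.x = FALSE (the default) drops rows of `left` that have no match.
dropped <- merge(left, right, by.x = "ID", by.y = "id", all.x = FALSE)

nrow(kept)     # 3
nrow(dropped)  # 2
```

The key column in the result takes its name from by.x, so both results have an ID column and no id column.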
dplyr::left_join keeps all observations from the first data frame and merges in any matching ones from the second.
joined <- dplyr::left_join(df, id_list, by = "ID")
head(joined)
#> Feats.1 Feats.2 Feats.3 Feats.4 Feats.5 ID Display
#> 1 -1.44053344 1.0086988 -1.44053344 1.0086988 -1.44053344 12 <NA>
#> 2 -1.44053344 1.0086988 -1.44053344 1.0086988 -1.44053344 10 clear
#> 3 0.99220217 -0.3125813 0.99220217 -0.3125813 0.99220217 5 clear
#> 4 1.03881289 1.1277627 1.03881289 1.1277627 1.03881289 2 clear
#> 5 -0.01678186 -0.1519029 -0.01678186 -0.1519029 -0.01678186 7 clear
#> 6 0.07130125 1.1715833 0.07130125 1.1715833 0.07130125 4 clear
#> Type
#> 1 <NA>
#> 2 indigo
#> 3 yellow
#> 4 blue
#> 5 indigo
#> 6 yellow
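Since unmatched IDs show up as NA after a left join, it can help to check for them before joining. The following is only a sketch in base R on hypothetical toy data (df_check and lookup_check are illustrative names, not objects from the answer):

```r
# Hypothetical toy data with the same key column name as above.
df_check     <- data.frame(ID = c(12, 10, 5), Feats.1 = c(-1.4, 0.9, 1.0))
lookup_check <- data.frame(ID = c(5, 10), Display = c("clear", "blur"))

# IDs that occur in df_check but are absent from the lookup table.
missing_ids <- setdiff(df_check$ID, lookup_check$ID)

# One logical flag per row: TRUE where the lookup table has a matching ID.
has_match <- df_check$ID %in% lookup_check$ID
```

If missing_ids is empty, a left join and an inner join return the same rows.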
Created on 2018-07-13 by the reprex package (v0.2.0).
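The question mentions trying match() without success. For comparison with the join-based approaches, here is a minimal sketch of a match()-based lookup in base R. It is not part of the original answer, and df_small and lookup are hypothetical toy data. It assumes each ID appears at most once in the lookup table, because match() only returns the first matching position.

```r
# Hypothetical toy data; ID 12 has no entry in the lookup table.
df_small <- data.frame(ID = c(2, 12, 7, 2), Feats.1 = c(3.1, -0.4, 0.04, 0.9))
lookup   <- data.frame(ID = c(7, 2),
                       Display = c("clear", "blur"),
                       Type = c("red", "blue"))

# For each ID in df_small, the row position of that ID in lookup (NA if absent).
idx <- match(df_small$ID, lookup$ID)

# Index the lookup columns by position; the row order of df_small is unchanged.
df_small$Display <- lookup$Display[idx]
df_small$Type    <- lookup$Type[idx]
```

Unlike merge, this leaves the rows of df_small in their original order, and rows without a match get NA in the new columns.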