Compare values from two dataframes and merge
Problem description
I'm working with two dataframes in R:
df1 = data.frame(c("A", "B"), c(1, 21), c(17, 29))
colnames(df1) = c("location", "start", "stop")
df1
location start stop
A 1 17
B 21 29
df2 = data.frame(c("A", "A", "A", "A", "B"), c(1, 10, 20, 40, 20), c(10, 20, 30, 50, 30), c("x1", "x2", "x4", "x5", "x3"))
colnames(df2) = c("location", "start", "stop", "out")
df2
location start stop out
A 1 10 x1
A 10 20 x2
A 20 30 x4
A 40 50 x5
B 20 30 x3
Now I want to check, for each row of df1:
- whether its 'location' matches a 'location' in df2
- whether its 'start' value or its 'stop' value lies within the range from 'start' to 'stop' of that df2 row; if so, the corresponding 'out' value from df2 should be pasted into a new column in df1
This is how the output should look for this example:
df1_new
location start stop out
A 1 17 x1,x2
B 21 29 x3
I've started in R, but I'm stuck at the point where I need to look in the complete dataframe of df2
for (i in nrow(df1)) {
if(df1$location[i] == df2$location # it needs to look for a match in the complete dataframe of df2. I don't know how to do this
& if (df1$start[i] %in% # it needs to check if the start value lies in the range between df2$start & df2$end
}
Solution
Here's a data.table way, using foverlaps:
library(data.table)
setkey(setDT(df1))
setDT(df2, key = names(df1))
foverlaps(df1, df2)[, .(out = toString(out)), by=location]
# location out
# 1: A x1, x2
# 2: B x3
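The call above returns only location and out, while the question asks for df1 with out as an extra column. A minimal sketch of one way to get that layout, assuming the same example data: the names ov and df1_new are illustrative, and by.x, by.y and type are written out explicitly (they are the same settings the keyed call above picks up by default). The grouping uses i.start and i.stop, which are the start and stop columns that came from df1.

```r
library(data.table)

# Example data from the question (stringsAsFactors = FALSE keeps 'out' as character)
df1 <- data.frame(location = c("A", "B"), start = c(1, 21), stop = c(17, 29),
                  stringsAsFactors = FALSE)
df2 <- data.frame(location = c("A", "A", "A", "A", "B"),
                  start = c(1, 10, 20, 40, 20),
                  stop = c(10, 20, 30, 50, 30),
                  out = c("x1", "x2", "x4", "x5", "x3"),
                  stringsAsFactors = FALSE)

setDT(df1)
setDT(df2)
# foverlaps() requires the second table (y) to be keyed;
# the last two key columns are treated as the interval
setkey(df2, location, start, stop)

# Overlap join: same location and any overlap between the closed intervals
ov <- foverlaps(df1, df2,
                by.x = c("location", "start", "stop"),
                by.y = c("location", "start", "stop"),
                type = "any")

# Collapse the matched 'out' values per df1 row, keeping df1's own start/stop
df1_new <- ov[, .(out = toString(out)),
              by = .(location, start = i.start, stop = i.stop)]
df1_new
```

Note that type = "any" matches on any overlap, which is slightly broader than the condition in the question: a df1 interval that completely contains a df2 interval also matches, even though neither of its end points lies inside the df2 range.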
You can get the other columns out of the foverlaps results if desired:
foverlaps(df1, df2)
# location start stop out i.start i.stop
# 1: A 1 10 x1 1 17
# 2: A 10 20 x2 1 17
# 3: B 20 30 x3 21 29
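For comparison, the loop from the question can also be finished in base R. This is only a sketch, assuming the condition is taken literally (same location, and df1's start or stop lies inside a df2 range with both end points included); the helper names same_loc, start_in and stop_in are made up for illustration. It compares one df1 row against all rows of df2 at once, which is the step the original loop was missing.

```r
# Example data from the question
df1 <- data.frame(location = c("A", "B"), start = c(1, 21), stop = c(17, 29),
                  stringsAsFactors = FALSE)
df2 <- data.frame(location = c("A", "A", "A", "A", "B"),
                  start = c(1, 10, 20, 40, 20),
                  stop = c(10, 20, 30, 50, 30),
                  out = c("x1", "x2", "x4", "x5", "x3"),
                  stringsAsFactors = FALSE)

# For each row i of df1, build logical vectors over ALL rows of df2
df1$out <- vapply(seq_len(nrow(df1)), function(i) {
  same_loc <- df2$location == df1$location[i]
  start_in <- df1$start[i] >= df2$start & df1$start[i] <= df2$stop
  stop_in  <- df1$stop[i]  >= df2$start & df1$stop[i]  <= df2$stop
  # Paste together the 'out' values of every matching df2 row
  toString(df2$out[same_loc & (start_in | stop_in)])
}, character(1))

df1
```

Note the use of seq_len(nrow(df1)) rather than nrow(df1): for (i in nrow(df1)) would visit only the last row. On large tables the foverlaps approach is likely to scale better, since this version scans all of df2 for every row of df1.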