Compare values from two dataframes and merge


Problem description

I'm working with two dataframes in R:

df1 = data.frame(c("A", "B"), c(1, 21), c(17, 29))
colnames(df1) = c("location", "start", "stop")

df1
location    start    stop
A           1        17
B           21       29

df2 = data.frame(c("A", "A", "A", "A", "B"), c(1, 10, 20, 40, 20), c(10, 20, 30, 50, 30), c("x1", "x2","x4", "x5", "x3"))
colnames(df2) = c("location", "start", "stop", "out")

df2
location    start    stop   out
A           1        10     x1
A           10       20     x2
A           20       30     x4
A           40       50     x5  
B           20       30     x3

Now I want to check for each row of df1:

  • is there a match between its 'location' and a 'location' in df2?
  • if so, and its 'start' value or its 'stop' value lies within the start-stop range of that df2 row, then the corresponding 'out' value from df2 should be pasted into a new column in df1

This is how the output should look for this example:

df1_new

location    start    stop    out
A           1        17      x1,x2
B           21       29      x3

I've started in R, but I'm stuck at the point where I need to look through the complete df2 dataframe:

for (i in nrow(df1)) {
   if(df1$location[i] == df2$location # it needs to look for a match in the complete dataframe of df2. I don't know how to do this
   & if (df1$start[i] %in% # it needs to check if the start value lies in the range between df2$start & df2$end
}

Solution

Here's a data.table way, using foverlaps:

library(data.table)
setkey(setDT(df1))
setDT(df2, key = names(df1))

foverlaps(df1, df2)[, .(out = toString(out)), by=location]

#    location    out
# 1:        A x1, x2
# 2:        B     x3
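The result above has only location and out, while the question asks for df1 with an extra out column. One possible way to get that shape, sketched here under the assumption that df1 and df2 are built exactly as in the question, is to group by df1's own interval columns, which foverlaps returns as i.start and i.stop. The name df1_new is taken from the question; the grouping expression is an illustration, not part of the original answer.

```r
library(data.table)

# Same data as in the question, built directly as data.tables
df1 <- data.table(location = c("A", "B"), start = c(1, 21), stop = c(17, 29))
df2 <- data.table(
  location = c("A", "A", "A", "A", "B"),
  start    = c(1, 10, 20, 40, 20),
  stop     = c(10, 20, 30, 50, 30),
  out      = c("x1", "x2", "x4", "x5", "x3")
)

# foverlaps treats the last two key columns as the interval bounds
setkey(df1, location, start, stop)
setkey(df2, location, start, stop)

# i.start / i.stop are df1's columns in the join result; grouping on them
# keeps one row per df1 row and restores the original column names
df1_new <- foverlaps(df1, df2)[
  , .(out = toString(out)),
  by = .(location, start = i.start, stop = i.stop)
]
df1_new
```

toString() joins the values with ", ". If the exact "x1,x2" form from the question is needed, paste(out, collapse = ",") could be used in its place.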

You can get other columns out of the foverlaps result if desired:

foverlaps(df1, df2)
#    location start stop out i.start i.stop
# 1:        A     1   10  x1       1     17
# 2:        A    10   20  x2       1     17
# 3:        B    20   30  x3      21     29
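For comparison, here is a minimal base R sketch that completes the loop from the question and applies its condition literally: same location, and df1's start or stop falls inside the df2 range. This is an illustration, not part of the original answer. Two things to note: `for (i in nrow(df1))` visits only the last row, so the sketch uses `seq_len(nrow(df1))`; and the literal condition is narrower than foverlaps' default overlap test, which would also match a df2 interval lying entirely inside a df1 interval.

```r
# Same data as in the question
df1 <- data.frame(location = c("A", "B"), start = c(1, 21), stop = c(17, 29))
df2 <- data.frame(
  location = c("A", "A", "A", "A", "B"),
  start    = c(1, 10, 20, 40, 20),
  stop     = c(10, 20, 30, 50, 30),
  out      = c("x1", "x2", "x4", "x5", "x3")
)

df1_new <- df1
df1_new$out <- NA_character_

for (i in seq_len(nrow(df1))) {
  # Each comparison is vectorised over all rows of df2
  same_loc <- df2$location == df1$location[i]
  start_in <- df1$start[i] >= df2$start & df1$start[i] <= df2$stop
  stop_in  <- df1$stop[i]  >= df2$start & df1$stop[i]  <= df2$stop

  # Collapse the matching 'out' values into one comma-separated string
  df1_new$out[i] <- paste(df2$out[same_loc & (start_in | stop_in)], collapse = ",")
}

df1_new
```

A df1 row with no matching df2 row ends up with an empty string in out. For large tables the foverlaps approach is likely to scale better, since this loop scans all of df2 once per df1 row.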
