Data frame: look up a value within a range and return a different column

Question

I have two data frames and want to use a value in one (DF1$pos) to search two columns in the other (DF2$start, DF2$end). If the value falls within that range, I want to return the matching DF2$annot as DF1$name.

DF1

ID   pos  name
chr   12
chr  542
chr  674

DF2

ID   start   end   annot
chr      1   200      a1
chr    201   432      a2
chr    540  1002      a3
chr   2000  2004      a4

So in this example I would like DF1 to become

ID   pos  name
chr   12    a1
chr  542    a3
chr  674    a3

I have tried using merge and intersect but do not know how to use an if statement with a logical expression in them.

The data frames should be coded as follows,

DF1  <- data.frame(ID=c("chr","chr","chr"),
               pos=c(12,542,674),
               name=c(NA,NA,NA))

DF2  <- data.frame(ID=c("chr","chr","chr","chr"),
               start=c(1,201,540,2000),
               end=c(200,432,1002,2004),
               annot=c("a1","a2","a3","a4"))

Answer

Perhaps you can use foverlaps from the "data.table" package.

library(data.table)
DT1 <- data.table(DF1)
DT2 <- data.table(DF2)
setkey(DT2, ID, start, end)
DT1[, c("start", "end") := pos]  ## I don't know if there's a way around this step...
foverlaps(DT1, DT2)
#     ID start  end annot pos i.start i.end
# 1: chr     1  200    a1  12      12    12
# 2: chr   540 1002    a3 542     542   542
# 3: chr   540 1002    a3 674     674   674
foverlaps(DT1, DT2)[, c("ID", "pos", "annot"), with = FALSE]
#     ID pos annot
# 1: chr  12    a1
# 2: chr 542    a3
# 3: chr 674    a3
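
On the code comment above about whether the dummy start/end step can be avoided: one possible way around it is an update join with inequality conditions in `on=`. This is only a sketch, assuming data.table 1.9.8 or later (where, as far as I know, non-equi joins were introduced). The data is rebuilt from the example tables in the question, and the target column `name` is taken from the desired output there.

```r
library(data.table)

# Example data rebuilt from the tables in the question
DT1 <- data.table(ID  = c("chr", "chr", "chr"),
                  pos = c(12, 542, 674))
DT2 <- data.table(ID    = c("chr", "chr", "chr", "chr"),
                  start = c(1, 201, 540, 2000),
                  end   = c(200, 432, 1002, 2004),
                  annot = c("a1", "a2", "a3", "a4"))

# Update join: for every DT2 interval, find the DT1 rows with the same ID
# whose pos lies in [start, end], and copy annot into DT1$name by reference.
DT1[DT2, on = .(ID, pos >= start, pos <= end), name := i.annot]

print(DT1)
```

Positions that fall in no interval are left as NA. If the intervals in DT2 overlap, a position can match more than one row and the annotation it ends up with depends on row order, so this is best suited to non-overlapping ranges like the ones in the question.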


As mentioned by @Arun in the comments, you can also use which = TRUE in foverlaps to extract the relevant values:

foverlaps(DT1, DT2, which = TRUE)
#    xid yid
# 1:   1   1
# 2:   2   3
# 3:   3   3
DT2$annot[foverlaps(DT1, DT2, which = TRUE)$yid]
# [1] "a1" "a3" "a3"
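
To get from those indices back to the data frame asked for in the question, the lookup result can be written into DF1$name. A minimal sketch under a few assumptions: the data is rebuilt from the example tables, `mult = "first"` is used so that each row of DT1 gets at most one match (with `which = TRUE` this should return a plain index vector rather than the xid/yid table), and `idx` is just an illustrative variable name.

```r
library(data.table)

# Example data rebuilt from the tables in the question
DF1 <- data.frame(ID = c("chr", "chr", "chr"),
                  pos = c(12, 542, 674),
                  name = NA_character_,
                  stringsAsFactors = FALSE)
DF2 <- data.frame(ID = c("chr", "chr", "chr", "chr"),
                  start = c(1, 201, 540, 2000),
                  end = c(200, 432, 1002, 2004),
                  annot = c("a1", "a2", "a3", "a4"),
                  stringsAsFactors = FALSE)

DT1 <- data.table(DF1)
DT2 <- data.table(DF2)
setkey(DT2, ID, start, end)            # sorts DT2; indices below refer to this order
DT1[, `:=`(start = pos, end = pos)]    # each position becomes a zero-width interval

# One index into DT2 per row of DT1; NA where no interval contains pos
idx <- foverlaps(DT1, DT2, by.x = c("ID", "start", "end"),
                 which = TRUE, mult = "first")

# Index DT2 (not DF2), since setkey may have reordered its rows
DF1$name <- DT2$annot[idx]
print(DF1)
```

DT1 is never keyed here, so its row order still matches DF1 and the indices line up row by row.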
