Checking if a value in a vector is in the range of values in a different-length vector
Question
I'm working in R with a large data frame that contains a column of genome positions, like this:
2655180
2657176
2658869
I also have a second data frame that holds ranges of positions and a gene for each range, like this:
chr1 100088228 100162167 AGL
chr1 107599438 107600565 PRMT6
chr1 115215635 115238091 AMPD1
chr1 11850637 11863073 MTHFR
chr1 119958143 119965343 HSD3B2
chr1 144124628 144128902 HFE2
chr1 150769175 150779181 CTSK
chr1 154245300 154248277 HAX1
chr1 155204686 155210803 GBA
chr1 156084810 156108997 LMNA
The second and third columns are the start and end of each gene, respectively. I want to check whether each position in the first data frame falls within any of the ranges in the second data frame and, if so, add the matching gene (column 4 of the second data frame) to the first data frame.
My current implementation uses nested for loops to check each entry in the first data frame against every entry in the second. Are there any R functions that could help me with this task?
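In short: I need to check whether the value in a row of the first vector falls within a range specified by a second vector of a different size, and if so, extract a value from that second vector.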
Answer
Using dplyr:
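library(dplyr)

# Return the gene name(s) in V4 whose [V2, V3] range contains position x
# (character(0) if no range matches)
getValue <- function(x, data) {
  tmp <- data %>%
    filter(V2 <= x, x <= V3)
  return(tmp$V4)
}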
x <- c(107599440, 150769180, 155204690)
sapply(x, getValue, data=df)
This returns:
[1] "PRMT6" "CTSK" "GBA"
Note: I copied your data into a data frame df that has column names V1, V2, V3, and V4. The columns V2 and V3 are the lower and upper bounds of each range.
df <- read.table(text="chr1 100088228 100162167 AGL
chr1 107599438 107600565 PRMT6
chr1 115215635 115238091 AMPD1
chr1 11850637 11863073 MTHFR
chr1 119958143 119965343 HSD3B2
chr1 144124628 144128902 HFE2
chr1 150769175 150779181 CTSK
chr1 154245300 154248277 HAX1
chr1 155204686 155210803 GBA
chr1 156084810 156108997 LMNA", stringsAsFactors=FALSE)
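The answer returns the gene names as a separate vector, while the question asks to add them to the first data frame. Below is a minimal base-R sketch of that last step (no dplyr needed). It assumes the first data frame is a hypothetical positions with a numeric column pos (these names are not from the question) and that the ranges are in df as above, redefined here so the block runs on its own. Positions that fall in no range get NA, and multiple matches are joined with ";", which is only an illustrative choice.

```r
# Gene ranges from the question: V1 = chromosome, V2 = start, V3 = end, V4 = gene
df <- read.table(text = "chr1 100088228 100162167 AGL
chr1 107599438 107600565 PRMT6
chr1 115215635 115238091 AMPD1
chr1 11850637 11863073 MTHFR
chr1 119958143 119965343 HSD3B2
chr1 144124628 144128902 HFE2
chr1 150769175 150779181 CTSK
chr1 154245300 154248277 HAX1
chr1 155204686 155210803 GBA
chr1 156084810 156108997 LMNA", stringsAsFactors = FALSE)

# Hypothetical first data frame: one genome position per row
positions <- data.frame(pos = c(107599440, 150769180, 155204690, 2655180))

# Gene(s) whose [V2, V3] range contains x, or NA if none does;
# multiple hits are collapsed into a single ";"-separated string
genes_for_position <- function(x, ranges) {
  hits <- ranges$V4[ranges$V2 <= x & x <= ranges$V3]
  if (length(hits) == 0) NA_character_ else paste(hits, collapse = ";")
}

# vapply() guarantees exactly one string per position, so the result
# can be stored directly as a new column of the first data frame
positions$gene <- vapply(positions$pos, genes_for_position,
                         character(1), ranges = df)
positions
```

Using vapply() with character(1) rather than sapply() matters here: with getValue() above, a position that matches no range returns character(0), and sapply() then returns a list instead of a plain character vector.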
Update:
If there are multiple matches, this version returns only the first one:
# Same lookup, but keep only the first matching row (in row order)
getValue <- function(x, data) {
  tmp <- data %>%
    filter(V2 <= x, x <= V3) %>%
    filter(row_number() == 1)
  return(tmp$V4)
}
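The same first-match behaviour can be sketched in base R without dplyr. which() returns the indices of matching rows in row order, so taking [1] keeps the first one and yields NA when nothing matches; vapply() then always returns one string per position. The function name first_gene is hypothetical, and only three rows of the question's data are used here.

```r
# Three of the question's ranges are enough to illustrate the lookup
df <- read.table(text = "chr1 107599438 107600565 PRMT6
chr1 150769175 150779181 CTSK
chr1 155204686 155210803 GBA", stringsAsFactors = FALSE)

# Gene of the first range (in row order) containing x, or NA if none does
first_gene <- function(x, data) {
  idx <- which(data$V2 <= x & x <= data$V3)[1]  # NA_integer_ when no match
  data$V4[idx]                                  # indexing with NA gives NA
}

x <- c(107599440, 150769180, 1)
result <- vapply(x, first_gene, character(1), data = df)
result
```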
dplyr provides several ranking functions; see ?row_number for details.
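For a large data frame, calling a function once per position still compares every position against every range. If the ranges are on a single chromosome and do not overlap (an assumption: the sample ranges satisfy it, but real gene annotations often overlap), base R's findInterval() can do the whole lookup in one vectorized call after sorting the ranges by start. This is a swapped-in technique, not the answer's method; overlapping ranges or multiple chromosomes call for a real interval-overlap tool such as data.table's foverlaps() or findOverlaps() from the Bioconductor IRanges/GenomicRanges packages.

```r
df <- read.table(text = "chr1 100088228 100162167 AGL
chr1 107599438 107600565 PRMT6
chr1 115215635 115238091 AMPD1
chr1 11850637 11863073 MTHFR
chr1 119958143 119965343 HSD3B2
chr1 144124628 144128902 HFE2
chr1 150769175 150779181 CTSK
chr1 154245300 154248277 HAX1
chr1 155204686 155210803 GBA
chr1 156084810 156108997 LMNA", stringsAsFactors = FALSE)

# Sort ranges by start; findInterval() needs non-decreasing breakpoints
ranges <- df[order(df$V2), ]

# Assumption check: one chromosome and no overlapping ranges
stopifnot(length(unique(ranges$V1)) == 1,
          all(head(ranges$V3, -1) < tail(ranges$V2, -1)))

pos <- c(107599440, 150769180, 155204690, 2655180, 100000000)

# Index of the last range whose start is <= pos (0 if pos precedes all starts)
i <- findInterval(pos, ranges$V2)

# A hit only if pos is also <= that range's end
j <- pmax(i, 1)  # avoid indexing with 0; those rows fail the i > 0 test anyway
hit <- i > 0 & pos <= ranges$V3[j]

gene <- ifelse(hit, ranges$V4[j], NA_character_)
gene
```

Sorting is done once, and each position is then located among the sorted starts rather than checked against every range, which is what makes this approach scale to large inputs.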