Find which interval row in a data frame each element of a vector belongs to
Problem description
I have a vector of numeric elements, and a dataframe with two columns that define the start and end points of intervals. Each row in the dataframe is one interval. I want to find out which interval each element in the vector belongs to.
Here's some example data:
# Find which interval each element of the vector belongs to
library(tidyverse)
elements <- c(0.1, 0.2, 0.5, 0.9, 1.1, 1.9, 2.1)
# tribble() is the current name for the deprecated frame_data()
intervals <- tribble(~phase, ~start, ~end,
                     "a",    0,      0.5,
                     "b",    1,      1.9,
                     "c",    2,      2.5)
The same example data for those who object to the tidyverse:
elements <- c(0.1, 0.2, 0.5, 0.9, 1.1, 1.9, 2.1)
intervals <- structure(list(phase = c("a", "b", "c"),
start = c(0, 1, 2),
end = c(0.5, 1.9, 2.5)),
.Names = c("phase", "start", "end"),
row.names = c(NA, -3L),
class = "data.frame")
Here's one way to do it:
library(intrval)
phases_for_elements <-
map(elements, ~.x %[]% data.frame(intervals[, c('start', 'end')])) %>%
map(., ~unlist(intervals[.x, 'phase']))
Here's the output:
[[1]]
phase
"a"
[[2]]
phase
"a"
[[3]]
phase
"a"
[[4]]
character(0)
[[5]]
phase
"b"
[[6]]
phase
"b"
[[7]]
phase
"c"
But I'm looking for a simpler method with less typing. I've seen findInterval in related questions, but I'm not sure how I can use it in this situation.
Here's a possible solution using the new "non-equi" joins in data.table (v>=1.9.8). While I doubt you'll like the syntax, it should be a very efficient solution.
Also, regarding findInterval, this function assumes continuity in your intervals, which isn't the case here, so I doubt there is a straightforward solution using it.
library(data.table) #v1.10.0
setDT(intervals)[data.table(elements), on = .(start <= elements, end >= elements)]
# phase start end
# 1: a 0.1 0.1
# 2: a 0.2 0.2
# 3: a 0.5 0.5
# 4: NA 0.9 0.9
# 5: b 1.1 1.1
# 6: b 1.9 1.9
# 7: c 2.1 2.1
Regarding the above code, I find it pretty self-explanatory: join intervals and elements by the condition specified in the on argument. That's pretty much it.
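For readers who want to see what that join condition computes without data.table, here is a minimal base R sketch. The helper name match_phase is hypothetical (it does not come from any package), and unlike the join, which returns every matching row, it keeps only the first matching interval. That is enough here because the example intervals do not overlap.

```r
elements <- c(0.1, 0.2, 0.5, 0.9, 1.1, 1.9, 2.1)
intervals <- data.frame(phase = c("a", "b", "c"),
                        start = c(0, 1, 2),
                        end   = c(0.5, 1.9, 2.5),
                        stringsAsFactors = FALSE)

# Hypothetical helper: phase of the first interval containing x, or NA if none
match_phase <- function(x, intervals) {
  # Same condition as the join: start <= x and end >= x
  hit <- which(intervals$start <= x & x <= intervals$end)
  if (length(hit) == 0) NA_character_ else intervals$phase[hit[1]]
}

# One phase (or NA) per element, returned as a plain character vector
phases_for_elements <- vapply(elements, match_phase, character(1),
                              intervals = intervals)
phases_for_elements
# "a" "a" "a" NA "b" "b" "c"
```

This loops over the elements in R, so it will likely be slower than the join on large inputs. It is meant only to make the matching rule explicit.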
There is one caveat here, though: start, end and elements should all be of the same type, so if one of them is integer, it should be converted to numeric first.
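On the findInterval point above: one possible workaround, sketched here under the assumption that the intervals are sorted by start and do not overlap, is to let findInterval pick a candidate interval from the start values alone and then check the element against that interval's end. This is an illustration, not part of the original answer, and it does not apply if the intervals can overlap.

```r
elements <- c(0.1, 0.2, 0.5, 0.9, 1.1, 1.9, 2.1)
intervals <- data.frame(phase = c("a", "b", "c"),
                        start = c(0, 1, 2),
                        end   = c(0.5, 1.9, 2.5),
                        stringsAsFactors = FALSE)

# Assumption: intervals$start is sorted ascending and intervals do not overlap.
# Index of the last interval whose start is <= each element (0 if there is none)
idx <- findInterval(elements, intervals$start)
idx[idx == 0] <- NA  # element lies before the first interval

# Keep the candidate only if the element is also within its end point
inside <- !is.na(idx) & elements <= intervals$end[idx]
phases_for_elements <- ifelse(inside, intervals$phase[idx], NA_character_)
phases_for_elements
# "a" "a" "a" NA "b" "b" "c"
```

Elements that fall in a gap between intervals (such as 0.9 here) come back as NA, matching the NA row in the join output.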