Find which interval row in a data frame each element of a vector belongs in


Problem description

I have a vector of numeric elements, and a dataframe with two columns that define the start and end points of intervals. Each row in the dataframe is one interval. I want to find out which interval each element in the vector belongs to.

Here's some example data:

    # Find which interval each element of the vector belongs in
    library(tidyverse)
    elements <- c(0.1, 0.2, 0.5, 0.9, 1.1, 1.9, 2.1)

    # tribble() is the current name for the deprecated frame_data()
    intervals <- tribble(~phase, ~start, ~end,
                         "a",     0,     0.5,
                         "b",     1,     1.9,
                         "c",     2,     2.5)

The same example data for those who object to the tidyverse:

    elements <- c(0.1, 0.2, 0.5, 0.9, 1.1, 1.9, 2.1)

    intervals <- structure(list(phase = c("a", "b", "c"),
                                start = c(0, 1, 2),
                                end = c(0.5, 1.9, 2.5)),
                           .Names = c("phase", "start", "end"),
                           row.names = c(NA, -3L),
                           class = "data.frame")

Here's one way to do it:

    library(intrval) 
    phases_for_elements <- 
    map(elements, ~.x %[]% data.frame(intervals[, c('start', 'end')])) %>% 
      map(., ~unlist(intervals[.x, 'phase'])) 

Here's the output:

    [[1]]
    phase 
      "a" 

    [[2]]
    phase 
      "a" 

    [[3]]
    phase 
      "a" 

    [[4]]
    character(0)

    [[5]]
    phase 
      "b" 

    [[6]]
    phase 
      "b" 

    [[7]]
    phase 
      "c" 

But I'm looking for a simpler method with less typing. I've seen findInterval in related questions, but I'm not sure how I can use it in this situation.

Solution

Here's a possible solution using the new "non-equi" joins in data.table (v>=1.9.8). While I doubt you'll like the syntax, it should be a very efficient solution.

Also, regarding findInterval: this function assumes your intervals are contiguous (each one ends where the next begins), which isn't the case here, so I doubt there is a straightforward solution using it.
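For sorted, non-overlapping intervals like the ones in the example, one possible workaround is to let findInterval pick a candidate row from the start points and then check the end point separately. This is only a sketch in base R and is not part of the original answer; the names idx and phases are made up for illustration.

```r
elements <- c(0.1, 0.2, 0.5, 0.9, 1.1, 1.9, 2.1)
intervals <- data.frame(phase = c("a", "b", "c"),
                        start = c(0, 1, 2),
                        end   = c(0.5, 1.9, 2.5),
                        stringsAsFactors = FALSE)

# Assumption: intervals are sorted by start and do not overlap.
# findInterval() returns, for each element, the index of the last start <= element
# (0 when the element lies before the first start).
idx <- findInterval(elements, intervals$start)
idx[idx == 0] <- NA

# An element that falls past the end of its candidate interval sits in a gap.
idx[!is.na(idx) & elements > intervals$end[idx]] <- NA

phases <- intervals$phase[idx]
print(phases)
```

Elements that fall in a gap between intervals, such as 0.9, come back as NA.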

    library(data.table) #v1.10.0
    setDT(intervals)[data.table(elements), on = .(start <= elements, end >= elements)]
    #    phase start end
    # 1:     a   0.1 0.1
    # 2:     a   0.2 0.2
    # 3:     a   0.5 0.5
    # 4:    NA   0.9 0.9
    # 5:     b   1.1 1.1
    # 6:     b   1.9 1.9
    # 7:     c   2.1 2.1

Regarding the above code, I find it pretty self-explanatory: join intervals and elements by the condition specified in the on argument. That's pretty much it.
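To make the join condition concrete, the same test (start <= element and end >= element) can be spelled out in plain base R. This is a rough, much less efficient sketch meant only to illustrate what the join matches; match_phase is a hypothetical helper, not a data.table function.

```r
elements <- c(0.1, 0.2, 0.5, 0.9, 1.1, 1.9, 2.1)
intervals <- data.frame(phase = c("a", "b", "c"),
                        start = c(0, 1, 2),
                        end   = c(0.5, 1.9, 2.5),
                        stringsAsFactors = FALSE)

# Hypothetical helper: return the phase of the first interval containing e,
# or NA when no interval contains it.
match_phase <- function(e, intervals) {
  hit <- which(intervals$start <= e & intervals$end >= e)
  if (length(hit) == 0) NA_character_ else intervals$phase[hit[1]]
}

phases <- vapply(elements, match_phase, character(1), intervals = intervals)
print(phases)
```

Unlike the join, this scans every interval for every element, so it is only reasonable for small inputs.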

There is one caveat, though: start, end and elements should all be of the same type, so if one of them is integer, it should be converted to numeric first.
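A minimal sketch of that conversion in base R, assuming a variant of the example data in which start happens to be stored as integer (that variant, and the names key_cols and types, are made up for illustration):

```r
elements <- c(0.1, 0.2, 0.5, 0.9, 1.1, 1.9, 2.1)
intervals <- data.frame(phase = c("a", "b", "c"),
                        start = c(0L, 1L, 2L),    # integer on purpose
                        end   = c(0.5, 1.9, 2.5),
                        stringsAsFactors = FALSE)

# Coerce both boundary columns and the lookup vector to double before joining.
key_cols <- c("start", "end")
intervals[key_cols] <- lapply(intervals[key_cols], as.numeric)
elements <- as.numeric(elements)

# Check the storage types that the join will compare.
types <- vapply(intervals[key_cols], typeof, character(1))
print(types)
```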
