How to trim an R vector?

Problem description

I have the following sorted vector:

> v
 [1] -1  0  1  2  4  5  2  3  4  5  7  8  5  6  7  8 10 11

How can I remove the -1, 0, and 11 entries without looping over the whole vector, either with a user loop or implicitly with a language keyword? That is, I want to trim the vector at each edge and only at each edge, such that the sorted sequence is within my min,max parameters 1 and 10. The solution should assume that the vector is sorted to avoid checking every element.

This kind of solution can come in handy in vectorized operations on very large vectors, when we want to use the items in the vector as indexes into another object. For one application see this thread.

Recommended answer

All of the previous solutions implicitly check every element of the vector. As @Robert Kubrick points out, this does not take advantage of the fact that the vector is already sorted.

To take advantage of the sorted nature of the vector, you can use binary search (through findInterval) to find the start and end indexes without looking at every element:

n <- 1e9
v <- -3:(n + 3)  # sorted vector with about 1e9 elements
system.time(a <- v[v >= 1 & v <= n])  # 68 s: compares every element
system.time(b <- v[do.call(seq, as.list(findInterval(c(1, n), v)))])  # 15 s
identical(a, b)  # TRUE
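
To see what the two findInterval calls return, here is a small sketch on a toy vector. The vector `small` and the bounds 1 and 10 are made up for illustration; both bounds are assumed to occur in the vector.

```r
# Toy sorted vector; the values are illustrative only.
small <- c(-1, 0, 1, 2, 4, 5, 7, 8, 10, 11)

# For each bound, findInterval returns the index of the last element
# of the sorted vector that is <= that bound (0 if there is none).
idx <- findInterval(c(1, 10), small)
idx      # 3 9

# Both bounds occur in the vector, so the two indexes delimit the trimmed range.
trimmed <- small[idx[1]:idx[2]]
trimmed  # 1 2 4 5 7 8 10
```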

It is a little clumsy, and there is some discussion that the binary search in findInterval may not be entirely efficient, but the general idea is there.
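
One concern that tends to come up is that findInterval validates its vector argument before searching (as far as I know it checks that the vector is sorted), which is itself a pass over every element. A hand-written binary search skips that check. The following is only a sketch: `count.before` is a hypothetical helper, not part of base R, and it trusts the caller that the input is sorted.

```r
# Hypothetical helper: counts the elements of the sorted vector x that come
# before `value`, using O(log n) comparisons and no validation of x.
#   strict = TRUE  counts elements <  value
#   strict = FALSE counts elements <= value
count.before <- function(x, value, strict = TRUE) {
  lo <- 0L          # every index <= lo is known to come before value
  hi <- length(x)   # every index >  hi is known not to
  while (lo < hi) {
    mid <- lo + (hi - lo + 1L) %/% 2L   # upper midpoint, so lo < mid <= hi
    before <- if (strict) x[mid] < value else x[mid] <= value
    if (before) lo <- mid else hi <- mid - 1L
  }
  lo
}

small <- c(-1, 0, 1, 2, 4, 5, 7, 8, 10, 11)
first <- count.before(small, 1, strict = TRUE) + 1L   # first index with value >= 1
last  <- count.before(small, 10, strict = FALSE)      # last index with value <= 10
small[first:last]  # 1 2 4 5 7 8 10
```

A loop written in R is slow per iteration, but only about log2(n) iterations are needed, so for a vector of 1e9 elements that is roughly 30 comparisons per bound.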

As was pointed out in the comments, the above only works when the index is in the vector. Here is a function that I think will work:

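in.range <- function(x, lo = -Inf, hi = +Inf) {
   # Index of the last element <= lo, clamped to 1..(length(x) - 1)
   lo.idx <- findInterval(lo, x, all.inside = TRUE)
   # Index of the last element <= hi
   hi.idx <- findInterval(hi, x)
   # Step forward by one if that element is still below lo; the parentheses
   # matter, since + binds tighter than the comparison
   lo.idx <- lo.idx + (x[lo.idx] < lo)
   x[seq(lo.idx, hi.idx)]
}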

system.time(b <- in.range(v, 1, n))  # 15 s
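
An approach built on seq(lo.idx, hi.idx) has two edge cases worth guarding against: seq counts downward when the range is empty, and findInterval points at the last of several tied values, so repeated values equal to the lower bound can be dropped. A minimal sketch that handles both, assuming an R version whose findInterval accepts the left.open argument; the name `trim.sorted` and the vector `v.small` are hypothetical:

```r
# Hypothetical variant of in.range; x is assumed to be sorted ascending.
trim.sorted <- function(x, lo = -Inf, hi = +Inf) {
  # Number of elements strictly below lo. With left.open = TRUE,
  # elements equal to lo are not counted, so ties at lo are kept.
  n.below <- findInterval(lo, x, left.open = TRUE)
  # Index of the last element <= hi (0 if there is none).
  hi.idx <- findInterval(hi, x)
  # Nothing lies inside [lo, hi]: return an empty vector of the same type.
  if (hi.idx <= n.below) return(x[0])
  x[(n.below + 1L):hi.idx]
}

v.small <- c(-1, 0, 1, 1, 2, 4, 5, 7, 8, 10, 10, 11)
trim.sorted(v.small, 1, 10)    # 1 1 2 4 5 7 8 10 10
trim.sorted(v.small, 20, 30)   # numeric(0)
trim.sorted(v.small, 0.5, 3)   # 1 1 2
```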
