Searching a vector/data table backwards in R

Question

I have a very large data frame/data table, and I would like to search a column for the closest NA value whose index is less than my current index position.

For example, suppose I have a data frame DF as follows:

INDEX | KEY   |   ITEM
----------------------
 1    |  10   |    AAA
 2    |  12   |    AAA
 3    |  NA   |    AAA
 4    |  18   |    AAA
 5    |  NA   |    AAA
 6    |  24   |    AAA
 7    |  29   |    AAA
 8    |  31   |    AAA
 9    |  34   |    AAA

This data frame has an NA value at index 3 and at index 5. Now, let's say we start at index 8 (which has a KEY of 31). I would like to search the column KEY backwards so that the search stops the moment it finds the first NA, and the index of that NA value is returned. Starting from index 8, that would be index 5.

I know there are ways to find all NA values in a vector/column (for example, which(is.na(x)) returns the indexes that hold NA), but given the sheer size of the data frame I am working with and the large number of iterations that need to be performed, this is a very inefficient way of doing it. One method I thought of is a kind of "do while" loop. It does seem to work, but it again seems quite inefficient, since it has to redo the search on every iteration (and given that I need over 100,000 iterations, this does not look like a good idea).

Is there a fast way of searching a column backwards from a particular index so that I can find the index of the closest NA value?

Recommended answer

Why not forward-fill the NA indexes once, so that you can then look up the most recent NA for any row later:

library(dplyr)
library(tidyr)

# Record INDEX on rows where KEY is NA, then carry that value down to later rows.
df <- df %>%
    mutate(last_missing = if_else(is.na(KEY), INDEX, NA_integer_)) %>%
    fill(last_missing)

Output:

> df
  INDEX KEY ITEM last_missing
1     1  10  AAA           NA
2     2  12  AAA           NA
3     3  NA  AAA            3
4     4  18  AAA            3
5     5  NA  AAA            5
6     6  24  AAA            5
7     7  29  AAA            5
8     8  31  AAA            5
9     9  34  AAA            5

Now there's no need to recalculate every time you need the answer for a given row. There may be more efficient ways to do the forward fill, but I think exploring those is easier than figuring out how to optimise the backward search.
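As one possible alternative for the forward fill, here is a minimal base R sketch that avoids the package dependency by using cummax(). It assumes the row position equals INDEX, as in the example data, and the names na_pos and prev_missing are made up for illustration. It also shows a strictly-before variant, since a row that is itself NA otherwise reports its own index.

```r
# Example data from the question (INDEX is an integer column).
df <- data.frame(
  INDEX = 1:9,
  KEY   = c(10, 12, NA, 18, NA, 24, 29, 31, 34),
  ITEM  = "AAA"
)

# Row position where KEY is NA, 0 elsewhere (assumes row position == INDEX).
na_pos <- ifelse(is.na(df$KEY), seq_along(df$KEY), 0L)

# cummax() carries the most recent NA position forward in a single pass.
last_missing <- cummax(na_pos)
last_missing[last_missing == 0L] <- NA_integer_  # no NA seen yet

# Strictly-before variant: shift down one row so that a row which is
# itself NA reports the previous NA rather than its own position.
prev_missing <- c(NA_integer_, head(last_missing, -1))

last_missing[8]  # 5: closest NA at or before row 8
prev_missing[5]  # 3: closest NA strictly before row 5
```

Once the vector is built, each lookup is a plain index operation, so 100,000 lookups need no repeated searching. Whether this is faster than fill() on your data is something to measure rather than assume.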
