na.locf fill NAs up to maxgap even if gap > maxgap, with groups
Problem description
I've seen a solution to this (Fill NA in a time series only to a limited number), but can't get it to work for groups, and thought there must be a neater way to do it as well.
Say I have the following dt:
library(data.table)
dt <- data.table(ID = c(rep("A", 10), rep("B", 10)), Price = c(seq(1, 10, 1), seq(11, 20, 1)))
dt[c(1:2, 5:10), 2] <- NA
dt[c(11:13, 15:19) ,2] <- NA
dt
ID Price
1: A NA
2: A NA
3: A 3
4: A 4
5: A NA
6: A NA
7: A NA
8: A NA
9: A NA
10: A NA
11: B NA
12: B NA
13: B NA
14: B 14
15: B NA
16: B NA
17: B NA
18: B NA
19: B NA
20: B 20
What I would like to do is fill NAs both forward and back from the nearest non-NA value, but only up to a maximum of two rows forward or back.
I also need this to be done by group (ID).
I have tried using na.locf / na.approx with maxgap = x etc., but these do not fill NAs where the gap between non-NA values is greater than maxgap. I want to fill forward and back even if the gap between non-NA values is greater than maxgap, but only by two rows.
The end result should look something like this:
ID Price Price_Fill
1: A NA 3
2: A NA 3
3: A 3 3
4: A 4 4
5: A NA 4
6: A NA 4
7: A NA NA
8: A NA NA
9: A NA NA
10: A NA NA
11: B NA NA
12: B NA 14
13: B NA 14
14: B 14 14
15: B NA 14
16: B NA 14
17: B NA NA
18: B NA 20
19: B NA 20
20: B 20 20
In reality, my data set is massive, and I want to be able to fill NAs forward and back for up to 672 rows, but no more, by group.
Recommended answer
For the example shown, we group by 'ID', take the shift of 'Price' with n = 0:2 and type = "lead" to create three temporary columns, and get the pmax across them. We then apply shift to that output with the same n and type = "lag" (the default), take the pmin, and assign the result as 'Price_Fill'.
dt[, Price_Fill := do.call(pmin, c(shift(do.call(pmax, c(shift(Price, n = 0:2,
type = "lead"), na.rm=TRUE)), n= 0:2), na.rm = TRUE)) , by = ID]
dt
# ID Price Price_Fill
#1: A NA 3
#2: A NA 3
#3: A 3 3
#4: A 4 4
#5: A NA 4
#6: A NA 4
#7: A NA NA
#8: A NA NA
#9: A NA NA
#10: A NA NA
#11: B NA NA
#12: B NA 14
#13: B NA 14
#14: B 14 14
#15: B NA 14
#16: B NA 14
#17: B NA NA
#18: B NA 20
#19: B NA 20
#20: B 20 20
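To make the nested call easier to follow, here is a minimal sketch of the same two steps applied to group A's values alone. The variable names (price, leads, back, lags, filled) are illustrative and not part of the original answer.

```r
library(data.table)

# Group A's Price column from the example
price <- c(NA, NA, 3, 4, NA, NA, NA, NA, NA, NA)

# Step 1: look 0, 1 and 2 rows ahead and keep the largest non-NA value.
# This back-fills each observed value by up to two rows.
leads <- shift(price, n = 0:2, type = "lead")
back  <- do.call(pmax, c(leads, na.rm = TRUE))
back
# [1]  3  4  4  4 NA NA NA NA NA NA

# Step 2: look 0, 1 and 2 rows behind in that result and keep the smallest.
# This forward-fills by up to two rows and undoes the overshoot from step 1.
lags   <- shift(back, n = 0:2, type = "lag")
filled <- do.call(pmin, c(lags, na.rm = TRUE))
filled
# [1]  3  3  3  4  4  4 NA NA NA NA
```

Note that taking pmax and then pmin only recovers the right values here because 'Price' increases with the row number.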
A more general approach is to do the pmin/pmax on .I (the row index) rather than on 'Price' itself, since 'Price' can hold arbitrary values and need not match the row number as it does in the OP's post.
i1 <- dt[, do.call(pmin, c(shift(do.call(pmax, c(shift(NA^(is.na(Price))*
.I, n = 0:2, type = "lead"), na.rm = TRUE)), n = 0:2), na.rm = TRUE)), ID]$V1
dt$Price_Fill <- dt$Price[i1]
dt$Price_Fill
#[1] 3 3 3 4 4 4 NA NA NA NA NA 14 14 14 14 14 NA 20 20 20
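The NA^(is.na(Price)) * .I expression may look cryptic. A small base R sketch of what it computes, using illustrative names (price, mask, idx) and seq_along() standing in for .I:

```r
price <- c(NA, 3, 4, NA)

# NA^1 is NA and NA^0 is 1, so this yields NA where price is missing
# and 1 where it is observed.
mask <- NA^is.na(price)
mask
# [1] NA  1  1 NA

# Multiplying by the row index keeps the index only for observed rows.
idx <- mask * seq_along(price)
idx
# [1] NA  2  3 NA
```

The pmax/pmin steps then operate on these row indices, which always increase, and the filled indices are used to look up the actual prices.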
i.e., suppose we change some 'Price' values; the filled result follows the new values:
dt$Price[3] <- 10
dt$Price[14] <- 7
dt$Price_Fill <- dt$Price[i1]
dt$Price_Fill
#[1] 10 10 10 4 4 4 NA NA NA NA NA 7 7 7 7 7 NA 20 20 20
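For the 672-row window mentioned in the question, shift(..., n = 0:672) would build several hundred temporary columns per step. As far as I can tell, the shift-based approach can also replace an observed value at the start of a group with a later one (for instance, c(1, 2, 3) comes back as 3 3 3). Below is a hedged alternative sketch, not the original answer's method. It uses data.table's nafill() on row positions; fill_limited and its argument k are hypothetical names, and it assumes that when both neighbours are within range the earlier value wins.

```r
library(data.table)

# Hypothetical helper: fill NAs from a non-NA neighbour at most k rows away.
# Assumption: the preceding value is preferred when both sides are in range.
fill_limited <- function(x, k) {
  pos <- seq_along(x)
  obs <- pos
  obs[is.na(x)] <- NA                   # positions of observed values only
  prev <- nafill(obs, type = "locf")    # last observed position at or before each row
  nxt  <- nafill(obs, type = "nocb")    # next observed position at or after each row
  src <- ifelse(!is.na(prev) & pos - prev <= k, prev,
         ifelse(!is.na(nxt) & nxt - pos <= k, nxt, NA_integer_))
  x[src]                                # NA positions in src yield NA
}

fill_limited(c(NA, NA, 3, 4, NA, NA, NA, NA, NA, NA), 2)
# [1]  3  3  3  4  4  4 NA NA NA NA
```

Applied by group, this would be dt[, Price_Fill := fill_limited(Price, 2), by = ID], with 672 in place of 2 for the full data set.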