Transform NA values based on first registration and nearest values
Question
I already asked a similar question, but now I only want to restrict the new values of the NAs.
I have some data like this:
Date 1 Date 2 Date 3 Date 4 Date 5 Date 6
A NA 0.1 0.2 NA 0.3 0.2
B 0.1 NA NA 0.3 0.2 0.1
C NA NA NA NA 0.3 NA
D 0.1 0.2 0.3 NA 0.1 NA
E NA NA 0.1 0.2 0.1 0.3
I would like to change the NA values of my data based on the first date a value is registered. For example, for A the first registration is Date 2. Before that registration, the NA values in A should become 0, and after the first registration each NA should become the mean of the nearest non-NA values (here, the mean of Date 3 and Date 5).
If the last value is NA, it should become the last registered value (as in C and D). For E, all NA values become 0, since they all come before the first registration.
To get something like this:
Date 1 Date 2 Date 3 Date 4 Date 5 Date 6
A 0 0.1 0.2 0.25 0.3 0.2
B 0.1 0.2 0.2 0.3 0.2 0.1
C 0 0 0 0 0.3 0.3
D 0.1 0.2 0.3 0.2 0.1 0.1
E 0 0 0.1 0.2 0.1 0.3
Can you help me? I'm not sure how to do it in R.
Answer
Here is a way using na.approx from the zoo package and apply with MARGIN = 1 (so this is probably not very efficient, but it gets the job done).
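library(zoo)
# Row-wise fill: interior NAs -> mean of the nearest non-NA neighbours; leading/trailing NAs are kept
df1 <- as.data.frame(t(apply(dat, 1, na.approx, method = "constant", f = .5, na.rm = FALSE)))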
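Why the t(): with MARGIN = 1, apply() stacks each row's result as a column of its output, so the result has to be transposed back to one row per input row. A toy sketch on a made-up 2 x 3 matrix (m and res are illustrative names, not from the answer):

```r
# Hypothetical 2 x 3 matrix with rows A and B
m <- matrix(1:6, nrow = 2, dimnames = list(c("A", "B"), NULL))
# apply() over rows (MARGIN = 1) returns each row's result as a *column*
res <- apply(m, 1, function(row) row * 10)
dim(res)  # 3 2
t(res)    # one row per input row again: A = 10 30 50, B = 20 40 60
```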
The result is
df1
# V1 V2 V3 V4 V5
#A NA 0.1 0.2 0.25 0.3
#B 0.1 0.2 0.2 0.30 0.2
#C NA NA NA NA 0.3
#E NA NA 0.1 0.20 0.1
Replace the remaining NAs with 0 and rename the columns.
df1[is.na(df1)] <- 0
names(df1) <- names(dat)
df1
# Date_1 Date_2 Date_3 Date_4 Date_5
#A 0.0 0.1 0.2 0.25 0.3
#B 0.1 0.2 0.2 0.30 0.2
#C 0.0 0.0 0.0 0.00 0.3
#E 0.0 0.0 0.1 0.20 0.1
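For comparison, a base-R sketch that does not need zoo. It swaps in stats::approx directly (the function whose method and f arguments na.approx relies on). Assumptions: the dates are evenly spaced, so the position index stands in for the date; rule = 1:2 returns NA before the first registration and the last registered value after the last one, so the trailing-NA rule is handled inside the interpolation; a row with no registered value at all becomes all zeros. fill_row and dat_full are hypothetical names, and dat_full is typed in from the question's table (all six dates, rows A to E).

```r
# Hypothetical helper: apply the question's three rules to one row
fill_row <- function(y) {
  n <- length(y)
  known <- which(!is.na(y))
  if (length(known) == 0) return(rep(0, n))  # nothing registered: all zeros
  out <- approx(known, y[known], xout = seq_len(n),
                method = "constant", f = 0.5,  # interior NA -> mean of nearest neighbours
                rule = 1:2)$y                  # left of first value: NA; right of last: last value
  out[is.na(out)] <- 0                         # before the first registration -> 0
  out
}

# Full data from the question, typed in from the table
dat_full <- data.frame(
  Date_1 = c(NA, 0.1, NA, 0.1, NA),
  Date_2 = c(0.1, NA, NA, 0.2, NA),
  Date_3 = c(0.2, NA, NA, 0.3, 0.1),
  Date_4 = c(NA, 0.3, NA, NA, 0.2),
  Date_5 = c(0.3, 0.2, 0.3, 0.1, 0.1),
  Date_6 = c(0.2, 0.1, NA, NA, 0.3),
  row.names = c("A", "B", "C", "D", "E")
)

# Same apply()/t() pattern as the answer, then restore the column names
res <- as.data.frame(t(apply(dat_full, 1, fill_row)))
names(res) <- names(dat_full)
res
```

Note that approx refuses rows with zero non-NA points, which is why fill_row short-circuits that case; a row with a single registered value (like C) works because method = "constant" accepts one point.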
Explanation
Given a vector
x <- c(0.1, NA, NA, 0.3, 0.2)
na.approx(x)
returns x with linearly interpolated values
#[1] 0.1000000 0.1666667 0.2333333 0.3000000 0.2000000
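The same linear fill can be reproduced with base R's approx, the function na.approx relies on for its interpolation. A minimal sketch, assuming evenly spaced dates so that the position index can stand in for the date (known and lin are illustrative names):

```r
x <- c(0.1, NA, NA, 0.3, 0.2)
# Positions of the registered (non-NA) values
known <- which(!is.na(x))
# Linear interpolation (approx's default method) at every position
lin <- approx(known, x[known], xout = seq_along(x))$y
lin  # 0.1000000 0.1666667 0.2333333 0.3000000 0.2000000
```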
But the OP asked for constant values, so we need the argument method = "constant" from the approx function.
na.approx(x, method = "constant")
# [1] 0.1 0.1 0.1 0.3 0.2
But this is still not what the OP asked for: it carries the last observation forward, whereas you want the mean of the closest non-NA values. Therefore we need the argument f (also from approx):
na.approx(x, method = "constant", f = .5)
# [1] 0.1 0.2 0.2 0.3 0.2 # looks good
From ?approx:
f : for method = "constant" a number between 0 and 1 inclusive, indicating a compromise between left- and right-continuous step functions. If y0 and y1 are the values to the left and right of the point then the value is y0 if f == 0, y1 if f == 1, and y0*(1-f)+y1*f for intermediate values. In this way the result is right-continuous for f == 0 and left-continuous for f == 1, even for non-finite y values.
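To see the formula y0*(1-f)+y1*f at work, a sketch using base approx on the same x (fill_const is a hypothetical helper, not part of zoo or stats): f = 0 gives y0 (last observation carried forward), f = 1 gives y1 (next observation carried backward), and f = .5 gives their average, which is exactly the mean of the nearest non-NA values.

```r
x <- c(0.1, NA, NA, 0.3, 0.2)
known <- which(!is.na(x))
# Hypothetical helper: constant interpolation of x with a given f
fill_const <- function(f) {
  approx(known, x[known], xout = seq_along(x), method = "constant", f = f)$y
}
locf <- fill_const(0)    # y0:                0.1 0.1 0.1 0.3 0.2
nocb <- fill_const(1)    # y1:                0.1 0.3 0.3 0.3 0.2
half <- fill_const(0.5)  # y0*0.5 + y1*0.5:   0.1 0.2 0.2 0.3 0.2
all.equal(half, (locf + nocb) / 2)  # TRUE
```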
Lastly, to keep (rather than drop) the NAs at the beginning and end of each row, we need na.rm = FALSE; otherwise those NAs would be removed and the rows would no longer all have the same length.
From ?na.approx:
na.rm : logical. If the result of the (spline) interpolation still results in NAs, should these be removed?
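Why equal lengths matter inside apply(): when the per-row results differ in length, apply() cannot stack them into a matrix and returns a list instead, which the t() step cannot turn back into rows. A toy sketch that mimics dropping NAs with r[!is.na(r)] (an illustration, not na.approx itself; m, kept and dropped are made-up names):

```r
m <- rbind(A = c(NA, 0.1, 0.2),
           B = c(0.1, 0.2, 0.3))
# Every row returns 3 values, so apply() builds a 3 x 2 matrix
kept <- apply(m, 1, function(r) r)
# Dropping the NA: row A returns 2 values, row B returns 3, so apply() returns a list
dropped <- apply(m, 1, function(r) r[!is.na(r)])
is.matrix(kept)   # TRUE
is.list(dropped)  # TRUE
```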
Data
dat <- structure(list(Date_1 = c(NA, 0.1, NA, NA), Date_2 = c(0.1, NA,
NA, NA), Date_3 = c(0.2, NA, NA, 0.1), Date_4 = c(NA, 0.3, NA,
0.2), Date_5 = c(0.3, 0.2, 0.3, 0.1)), .Names = c("Date_1", "Date_2",
"Date_3", "Date_4", "Date_5"), class = "data.frame", row.names = c("A",
"B", "C", "E"))
Edit
If there are NAs in the last column, we can replace them with each row's last non-NA value before applying na.approx as shown above.
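# Last non-NA value of each row: max.col() finds the column of the last TRUE in !is.na(dat)
last_obs <- dat[cbind(seq_len(nrow(dat)), max.col(!is.na(dat), ties.method = "last"))]
dat$Date_6[is.na(dat$Date_6)] <- last_obs[is.na(dat$Date_6)]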
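How the max.col() lookup works, on a small matrix holding rows C and D from the question (m, last_col and last_vals are illustrative names). !is.na(m) is TRUE wherever a value was registered; every TRUE ties for the row maximum, and ties.method = "last" picks the rightmost one. A two-column matrix of (row, column) indices then pulls those values out.

```r
m <- rbind(C = c(NA, NA, NA, NA, 0.3, NA),
           D = c(0.1, 0.2, 0.3, NA, 0.1, NA))
# Column of the last registered value in each row
last_col <- max.col(!is.na(m), ties.method = "last")
last_col   # 5 5
# Matrix indexing with (row, column) pairs
last_vals <- m[cbind(seq_len(nrow(m)), last_col)]
last_vals  # 0.3 0.1
```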