Transform NA values based on first registration and nearest values

Question

I already asked a similar question, but now I just want to restrict the new values of the NAs.

I have some data like this:

   Date 1   Date 2   Date 3   Date 4   Date 5   Date 6
A  NA       0.1      0.2      NA       0.3      0.2
B  0.1      NA       NA       0.3      0.2      0.1
C  NA       NA       NA       NA       0.3      NA
D  0.1      0.2      0.3      NA       0.1      NA
E  NA       NA       0.1      0.2      0.1      0.3

I would like to change the NA values in my data based on the first date on which a value is registered. For example, for A the first registration is Date 2. Before that registration I want the NA values in A to become 0, and after the first registration I want each NA to become the mean of the nearest values (here, the mean of Date 3 and Date 5).

If the last value is NA, it should be replaced by the last registered value (as in C and D). In the case of E, all NA values become 0.

The goal is to get something like this:

   Date 1   Date 2   Date 3   Date 4   Date 5   Date 6
A  0        0.1      0.2      0.25     0.3      0.2
B  0.1      0.2      0.2      0.3      0.2      0.1
C  0        0        0        0        0.3      0.3
D  0.1      0.2      0.3      0.2      0.1      0.1
E  0        0        0.1      0.2      0.1      0.3

Can you help me? I'm not sure how to do this in R.

Recommended answer

Here is a way using na.approx from the zoo package and apply with MARGIN = 1 (so this is probably not very efficient, but it gets the job done).

library(zoo)
df1 <- as.data.frame(t(apply(dat, 1, na.approx, method = "constant", f = .5, na.rm = FALSE)))

The result is

df1
#   V1  V2  V3   V4  V5
#A  NA 0.1 0.2 0.25 0.3
#B 0.1 0.2 0.2 0.30 0.2
#C  NA  NA  NA   NA 0.3
#E  NA  NA 0.1 0.20 0.1

Replace the remaining NAs with 0 and rename the columns.

df1[is.na(df1)] <- 0
names(df1) <- names(dat)
df1
#  Date_1 Date_2 Date_3 Date_4 Date_5
#A    0.0    0.1    0.2   0.25    0.3
#B    0.1    0.2    0.2   0.30    0.2
#C    0.0    0.0    0.0   0.00    0.3
#E    0.0    0.0    0.1   0.20    0.1
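
As a cross-check, the rules from the question can also be written out directly in base R, without zoo. This is only a minimal sketch and not the method of the answer: fill_row is a hypothetical helper name, and it assumes each row is a plain numeric vector.

```r
# Hypothetical helper (not part of the original answer): apply the
# question's three rules to a single row.
fill_row <- function(x) {
  obs <- which(!is.na(x))              # positions of registered values
  out <- x
  for (i in which(is.na(x))) {
    left  <- obs[obs < i]              # registered positions before i
    right <- obs[obs > i]              # registered positions after i
    if (length(left) == 0) {
      out[i] <- 0                      # before the first registration
    } else if (length(right) == 0) {
      out[i] <- x[max(left)]           # after the last registration
    } else {
      # interior gap: mean of the nearest value on each side
      out[i] <- mean(c(x[max(left)], x[min(right)]))
    }
  }
  out
}

fill_row(c(NA, 0.1, 0.2, NA, 0.3))     # row A of dat: 0, 0.1, 0.2, 0.25, 0.3
```

Applied row by row, for example with t(apply(dat, 1, fill_row)), this should agree with the zoo result above. Unlike na.approx with na.rm = FALSE, the sketch also fills NAs after the last registered value.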


Explanation

Given a vector

x <- c(0.1, NA, NA, 0.3, 0.2)
na.approx(x)

this returns x with linearly interpolated values

#[1] 0.1000000 0.1666667 0.2333333 0.3000000 0.2000000

But the OP asked for constant values, so we need the argument method = "constant" from the approx function.

na.approx(x, method = "constant") 
# [1] 0.1 0.1 0.1 0.3 0.2

But this is still not what the OP asked for, because it carries the last observation forward, whereas the OP wants the mean of the closest non-NA values. Therefore we need the argument f (also from approx).

na.approx(x, method = "constant", f = .5)
# [1] 0.1 0.2 0.2 0.3 0.2 # looks good

From ?approx:

f : for method = "constant" a number between 0 and 1 inclusive, indicating a compromise between left- and right-continuous step functions. If y0 and y1 are the values to the left and right of the point then the value is y0 if f == 0, y1 if f == 1, and y0*(1-f)+y1*f for intermediate values. In this way the result is right-continuous for f == 0 and left-continuous for f == 1, even for non-finite y values.
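
To make the role of f concrete, here is a small base R sketch that calls approx directly on the two known neighbours of the gap in x (0.1 at position 1 and 0.3 at position 4). The names pos, val and gap_fill are assumptions introduced for illustration only.

```r
# Known neighbours of the gap in x <- c(0.1, NA, NA, 0.3, 0.2)
pos <- c(1, 4)
val <- c(0.1, 0.3)

# Illustrative helper: evaluate the step function inside the gap
# (positions 2 and 3) for a given f.
gap_fill <- function(f) {
  approx(pos, val, xout = c(2, 3), method = "constant", f = f)$y
}

gap_fill(0)    # left value y0:  0.1 0.1 (last observation carried forward)
gap_fill(1)    # right value y1: 0.3 0.3
gap_fill(0.5)  # y0 * (1 - f) + y1 * f = 0.1 * 0.5 + 0.3 * 0.5 = 0.2
```

With f = 0.5 the formula reduces to the plain mean of the two neighbours, which is why that value matches what the OP asked for.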

Lastly, if we want to keep the NAs at the beginning and end of each row in place (rather than have them removed), we need na.rm = FALSE.

From ?na.approx:

na.rm : logical. If the result of the (spline) interpolation still results in NAs, should these be removed?
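
A short sketch of what na.rm changes, assuming the zoo package is installed. The vector y and the names trimmed and kept are made up for illustration.

```r
library(zoo)

# Made-up vector with NAs at both ends and one interior gap
y <- c(NA, 0.1, NA, 0.3, NA)

trimmed <- na.approx(y)                 # default na.rm = TRUE: outer NAs are dropped
kept    <- na.approx(y, na.rm = FALSE)  # outer NAs stay, length is unchanged

length(trimmed)  # 3
length(kept)     # 5
```

This is why na.rm = FALSE matters inside apply: with the default, rows with different numbers of leading or trailing NAs would come back with different lengths and could no longer be combined into one matrix.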

Data

dat <- structure(list(Date_1 = c(NA, 0.1, NA, NA), Date_2 = c(0.1, NA, 
NA, NA), Date_3 = c(0.2, NA, NA, 0.1), Date_4 = c(NA, 0.3, NA, 
0.2), Date_5 = c(0.3, 0.2, 0.3, 0.1)), .Names = c("Date_1", "Date_2", 
"Date_3", "Date_4", "Date_5"), class = "data.frame", row.names = c("A", 
"B", "C", "E"))

Edit

If there are NAs in the last column (Date_6 in the six-column data from the question, which is not part of dat above), we can replace them with the last non-NA value of each row before we apply na.approx as shown above.

dat$Date_6[is.na(dat$Date_6)] <- dat[cbind(1:nrow(dat),
                                           max.col(!is.na(dat), ties.method = "last"))][is.na(dat$Date_6)]
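
Putting the pieces together for the six-column data from the question, the whole procedure might look like the sketch below. The name dat6 and the intermediate variable names are assumptions; the values are transcribed from the question's table, with underscores in the column names. Rows without any registered value are not covered by this sketch.

```r
library(zoo)

# Six-column data transcribed from the question (dat6 is a hypothetical name)
dat6 <- data.frame(
  Date_1 = c(NA, 0.1, NA, 0.1, NA),
  Date_2 = c(0.1, NA, NA, 0.2, NA),
  Date_3 = c(0.2, NA, NA, 0.3, 0.1),
  Date_4 = c(NA, 0.3, NA, NA, 0.2),
  Date_5 = c(0.3, 0.2, 0.3, 0.1, 0.1),
  Date_6 = c(0.2, 0.1, NA, NA, 0.3),
  row.names = c("A", "B", "C", "D", "E")
)
m <- as.matrix(dat6)

# Step 1: fill NAs in the last column with each row's last non-NA value
last_col <- max.col(!is.na(m), ties.method = "last")
last_val <- m[cbind(seq_len(nrow(m)), last_col)]
tail_na  <- is.na(m[, ncol(m)])
m[tail_na, ncol(m)] <- last_val[tail_na]

# Step 2: fill interior gaps with the mean of the two nearest values
filled <- t(apply(m, 1, na.approx, method = "constant", f = 0.5, na.rm = FALSE))

# Step 3: the NAs that remain precede the first registration, so set them to 0
filled[is.na(filled)] <- 0
colnames(filled) <- colnames(m)

res <- as.data.frame(filled)
res  # should match the desired table in the question
```

Step 1 has to come first: once the last column is filled, a trailing NA becomes an interior gap bounded on the right by the last registered value, so step 2 can handle it.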
