R: Interpolation of NAs by group

Problem description

I would like to perform a linear interpolation on a variable in a data frame that takes into account: 1) the time difference between the two points, 2) the moment when the data was taken, and 3) the individual on which the variable was measured.

For example, in the following data frame:

df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
                 Individuals=c(1,1,1,1,1,1,1,2,2,2),
                 Value=c(1, 2, 3, NA, 5, NA, 7, 5, NA, 7))
df

I would like to obtain:

result <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
                     Individuals=c(1,1,1,1,1,1,1,2,2,2),
                     Value=c(1, 2, 3, 4, 5, 6, 7, 5, 6, 7))
result

I cannot simply apply the function na.approx from the zoo package to the whole column, because the observations are not all consecutive: some belong to one individual and others to another. If the second individual's first observation were NA, na.approx alone would use information from individual==1 to interpolate the NA of individual==2 (e.g. the next data frame would produce such an error):

df_2 <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
                   Individuals=c(1,1,1,1,1,1,1,2,2,2),
                   Value=c(1, 2, 3, NA, 5, NA, 7, NA, 5, 7))
df_2
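
To make the concern concrete, here is a minimal sketch (assuming the zoo package is installed) that applies na.approx to the whole Value column of df_2 without any grouping. The name ungrouped is illustrative only. The missing first observation of individual 2 ends up being filled from the last value of individual 1 and the next value of individual 2:

```r
library(zoo)

df_2 <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
                   Individuals=c(1,1,1,1,1,1,1,2,2,2),
                   Value=c(1, 2, 3, NA, 5, NA, 7, NA, 5, 7))

# Ungrouped interpolation treats the column as one continuous series
ungrouped <- na.approx(df_2$Value)
ungrouped
# Row 8 (individual 2, time 1) becomes 6: the midpoint of 7 (individual 1)
# and 5 (individual 2), which mixes the two individuals
```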

I have tried using the packages zoo and dplyr:

library(dplyr)
library(zoo)
proof <- df %>%
  group_by(Individuals) %>%
  na.approx(df$Value)

But I cannot perform group_by in a zoo object.

Do you know how to interpolate NA values in one variable by groups?

Thanks in advance,

Solution

Use data.frame, rather than cbind, to create your data: cbind returns a matrix, but dplyr needs a data frame. Then group by Individuals and use na.approx inside mutate, so that each individual is interpolated separately. Setting na.rm=FALSE keeps leading and trailing NAs in place (they cannot be interpolated), so the result has the same length as the group.

df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
            Individuals=c(1,1,1,1,1,1,1,2,2,2),
            Value=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10))

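library(dplyr)
library(zoo)

# Interpolate within each individual; na.rm=FALSE keeps leading/trailing NAs
# so the result has the same length as the group
df %>%
  group_by(Individuals) %>%
  mutate(ValueInterp = na.approx(Value, na.rm=FALSE))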

   time Individuals Value ValueInterp
1     1           1    NA          NA
2     2           1     2           2
3     3           1     3           3
4     4           1    NA           4
5     5           1     5           5
6     6           1    NA           6
7     7           1     7           7
8     1           2     8           8
9     2           2    NA           9
10    3           2    10          10
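
The example above relies on the observations within each individual being equally spaced, because na.approx interpolates against the row position by default. If the time differences matter (point 1 in the question), one option is to pass the time column through the x argument of na.approx. The sketch below uses a made-up data frame, df_irregular, with unevenly spaced times; the column names by_index and by_time are illustrative only:

```r
library(dplyr)
library(zoo)

# Hypothetical data with unevenly spaced times within each individual
df_irregular <- data.frame(time=c(1, 2, 5, 1, 4, 5),
                           Individuals=c(1, 1, 1, 2, 2, 2),
                           Value=c(1, NA, 5, 10, NA, 20))

res <- df_irregular %>%
  group_by(Individuals) %>%
  mutate(by_index = na.approx(Value, na.rm=FALSE),          # spacing ignored
         by_time  = na.approx(Value, x=time, na.rm=FALSE)) %>%  # spacing used
  ungroup()

res
# by_index fills the gaps with 3 and 15 (midpoints of the neighbours);
# by_time fills them with 2 and 17.5 (weighted by the time differences)
```

With equally spaced times, as in the question's data, both variants give the same result.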

Update: To interpolate multiple columns, we can use mutate_at. Here's an example with two value columns. We use mutate_at to run na.approx on all columns whose name includes "Value". list(interp=na.approx) tells mutate_at to run na.approx on each matched column and to name each new column by appending interp as a suffix to the original name:

df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
                 Individuals=c(1,1,1,1,1,1,1,2,2,2),
                 Value1=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10),
                 Value2=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10)*2)

df %>%
  group_by(Individuals) %>%
  mutate_at(vars(matches("Value")), list(interp=na.approx), na.rm=FALSE)

    time Individuals Value1 Value2 Value1_interp Value2_interp
   <dbl>       <dbl>  <dbl>  <dbl>         <dbl>         <dbl>
 1     1           1     NA     NA            NA            NA
 2     2           1      2      4             2             4
 3     3           1      3      6             3             6
 4     4           1     NA     NA             4             8
 5     5           1      5     10             5            10
 6     6           1     NA     NA             6            12
 7     7           1      7     14             7            14
 8     1           2      8     16             8            16
 9     2           2     NA     NA             9            18
10     3           2     10     20            10            20

If you don't want to preserve the original, uninterpolated columns, you can do:

df %>%
  group_by(Individuals) %>%
  mutate_at(vars(matches("Value")), na.approx, na.rm=FALSE)
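
In dplyr 1.0.0 and later, mutate_at is superseded by across() inside mutate. A possible equivalent of the two calls above, assuming a recent dplyr and the same df with Value1 and Value2, might look like this (the object names with_suffix and in_place are illustrative only):

```r
library(dplyr)
library(zoo)

df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
                 Individuals=c(1,1,1,1,1,1,1,2,2,2),
                 Value1=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10),
                 Value2=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10)*2)

# Keep the originals and add *_interp columns
with_suffix <- df %>%
  group_by(Individuals) %>%
  mutate(across(matches("Value"),
                ~ na.approx(.x, na.rm=FALSE),
                .names="{.col}_interp")) %>%
  ungroup()

# Overwrite the original columns instead
in_place <- df %>%
  group_by(Individuals) %>%
  mutate(across(matches("Value"), ~ na.approx(.x, na.rm=FALSE))) %>%
  ungroup()
```

The ungroup() at the end is optional; it just returns an ungrouped result so that later steps do not inherit the grouping.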
