R: Interpolation of NAs by group

Problem description

I would like to perform a linear interpolation on a variable of a data frame that takes into account: 1) the time difference between the two points, 2) the moment when the data was taken, and 3) the individual on which the variable was measured.

For example in the next dataframe:

 df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
            Individuals=c(1,1,1,1,1,1,1,2,2,2),
            Value=c(1, 2, 3, NA, 5, NA, 7, 5, NA, 7))
  df

I would like to obtain:

 result <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
                Individuals=c(1,1,1,1,1,1,1,2,2,2),
                Value=c(1, 2, 3, 4, 5, 6, 7, 5, 6, 7))
 result

I cannot simply apply the function na.approx from the package zoo to the whole column, because the observations are not all consecutive: some belong to one individual and others belong to other individuals. If the first observation of the second individual were NA and I applied na.approx on its own, I would be using information from individual == 1 to interpolate the NA of individual == 2 (e.g. the next data frame would have such an error).

  df_2 <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
                Individuals=c(1,1,1,1,1,1,1,2,2,2),
                Value=c(1, 2, 3, NA, 5, NA, 7, NA, 5, 7))
  df_2
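
To make the problem concrete, here is a minimal sketch (assuming the zoo package is installed) of what happens when na.approx is applied to the whole column of df_2 without grouping. The name `leaky` is only illustrative.

```r
library(zoo)

df_2 <- data.frame(time = c(1, 2, 3, 4, 5, 6, 7, 1, 2, 3),
                   Individuals = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
                   Value = c(1, 2, 3, NA, 5, NA, 7, NA, 5, 7))

# Ungrouped interpolation treats the column as one continuous series,
# so row 8 (first row of individual 2) is filled using row 7,
# which belongs to individual 1.
leaky <- na.approx(df_2$Value)

# Row 8 becomes the midpoint of 7 (individual 1) and 5 (individual 2)
leaky[8]
```

The value filled in at row 8 mixes the two individuals, which is exactly the error I want to avoid.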

I have tried using the packages zoo and dplyr:

library(dplyr)
library(zoo)
proof <- df %>%
  group_by(Individuals) %>%
  na.approx(df$Value)

But I cannot perform group_by in a zoo object.

Do you know how to interpolate NA values in one variable by groups?

Thanks in advance,

Solution

Use data.frame, rather than cbind, to create your data: cbind returns a matrix, but dplyr needs a data frame. Then group by the grouping variable (Individuals) and call na.approx inside mutate, so that each individual is interpolated separately. Setting na.rm=FALSE keeps leading and trailing NAs that cannot be interpolated, so the result has the same length as the group, which mutate requires.

df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
            Individuals=c(1,1,1,1,1,1,1,2,2,2),
            Value=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10))

library(dplyr)
library(zoo)

df %>%
  group_by(Individuals) %>%
  mutate(ValueInterp = na.approx(Value, na.rm=FALSE))

   time Individuals Value ValueInterp
1     1           1    NA          NA
2     2           1     2           2
3     3           1     3           3
4     4           1    NA           4
5     5           1     5           5
6     6           1    NA           6
7     7           1     7           7
8     1           2     8           8
9     2           2    NA           9
10    3           2    10          10
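
The example above uses evenly spaced times, so the row position is enough. If the time steps are uneven, one option is to pass the time column to na.approx through its `x` argument, so that the interpolation is weighted by the actual time difference. The following is a sketch under that assumption; the data frame `df_t` and its values are hypothetical and chosen only to make the effect visible.

```r
library(dplyr)
library(zoo)

# Hypothetical data with unevenly spaced times, for illustration only
df_t <- data.frame(time = c(1, 2, 5, 1, 4, 5),
                   Individuals = c(1, 1, 1, 2, 2, 2),
                   Value = c(10, NA, 50, NA, 8, 10))

out <- df_t %>%
  group_by(Individuals) %>%
  # x = time makes the interpolation proportional to elapsed time;
  # na.rm = FALSE keeps NAs that have no neighbour on one side
  mutate(ValueInterp = na.approx(Value, x = time, na.rm = FALSE)) %>%
  ungroup()

out
```

For individual 1, time 2 lies a quarter of the way from time 1 to time 5, so the NA is filled with 20 rather than the midpoint 30. The leading NA of individual 2 stays NA, because there is no earlier observation within that group to interpolate from.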
