Linear Interpolation using dplyr

Problem description

I'm trying to use the na.approx() function from the zoo library (in conjunction with xts) to interpolate missing values from repeated measures data for multiple individuals with multiple measurements.

Sample data...

event.date <- c("2010-05-25", "2010-09-10", "2011-05-13", "2012-03-28", "2013-03-07",    
                "2014-02-13", "2010-06-11", "2010-09-10", "2011-05-13", "2012-03-28",
                "2013-03-07", "2014-02-13")
variable   <- c("neck.bmd", "neck.bmd", "neck.bmd", "neck.bmd", "neck.bmd", "neck.bmd",
                "wbody.bmd", "wbody.bmd", "wbody.bmd", "wbody.bmd", "wbody.bmd", "wbody.bmd")
value      <- c(0.7490, 0.7615, 0.7900, 0.7730, NA, 0.7420, 1.0520, 1.0665, 1.0760,
                1.0870, NA, 1.0550)
## Bind into a data frame
df <- data.frame(event.date, variable, value)
rm(event.date, variable, value)
## Convert date
df$event.date <- as.Date(df$event.date)
## Load libraries
library(magrittr)
library(xts)
library(zoo)

I can interpolate one missing data point for a single outcome for a given person using xts() and na.approx()...

## Subset one variable
wbody <- subset(df, variable == "wbody.bmd")
## order/index and then interpolate
xts(wbody$value, wbody$event.date) %>%
  na.approx()
## Output:
## 2010-06-11 1.052000
## 2010-09-10 1.066500
## 2011-05-13 1.076000
## 2012-03-28 1.087000
## 2013-03-07 1.070977
## 2014-02-13 1.055000

Having a matrix returned is not ideal, but I can work around that. The main problem is that I have multiple outcomes for multiple people. I thought, perhaps naively, that since this is a split-apply-combine problem I could use dplyr to achieve this in the following manner...

## Load library
library(dplyr)
## group and then arrange the data (to ensure dates are correct)
df %>%
  group_by(variable) %>%
  arrange(variable, event.date) %>%
  xts(.$value, .$event.date) %>%
  na.approx()

Error in xts(., .$value, .$event.date) : order.by requires an appropriate time-based object
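
The call echoed in the error, xts(., .$value, .$event.date), suggests what happened: %>% still inserts the left-hand side as the first argument when the dot appears only inside nested expressions such as .$value, so the whole data frame becomes the data argument and .$value ends up as order.by. Below is a minimal sketch of one way around this for a single variable, assuming the sample data above (rebuilt here so the block runs on its own). Wrapping the call in braces stops the automatic insertion. The name wbody.ip is only illustrative, and this does not handle the grouping by itself.

```r
library(magrittr)
library(xts)
library(zoo)   # provides na.approx()

## Rebuild the sample data from the question
df <- data.frame(
  event.date = as.Date(c("2010-05-25", "2010-09-10", "2011-05-13", "2012-03-28",
                         "2013-03-07", "2014-02-13", "2010-06-11", "2010-09-10",
                         "2011-05-13", "2012-03-28", "2013-03-07", "2014-02-13")),
  variable   = rep(c("neck.bmd", "wbody.bmd"), each = 6),
  value      = c(0.7490, 0.7615, 0.7900, 0.7730, NA, 0.7420,
                 1.0520, 1.0665, 1.0760, 1.0870, NA, 1.0550)
)

## Braces tell %>% not to insert the data frame as the first argument,
## so xts() receives only the values and the dates
wbody.ip <- df %>%
  subset(variable == "wbody.bmd") %>%
  { xts(.$value, order.by = .$event.date) } %>%
  na.approx()

wbody.ip
```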

It seems that dplyr doesn't play well with xts/zoo. I've spent a couple of hours searching for tutorials/examples on how to interpolate missing data points in R, but all I've found are single-case examples. So far I've been unable to find anything on how to do this for multiple sites for multiple people (I realise I could reduce it to a multiple-people problem by reshaping my data to wide, but that still wouldn't solve the problem I'm encountering).

Any thoughts/advice/insights on how to proceed would be greatly appreciated.

Thanks

EDIT: Clarified that some functions come from the zoo package.

Solution

The solution I've gone with is based on the first comment from @docendodiscimus.

Rather than attempting to create a new data frame, as I'd been doing, this approach simply adds a column to the existing data frame by taking advantage of dplyr's mutate() function.

My code is now...

df %>%
  group_by(variable) %>%
  arrange(variable, event.date) %>%
  mutate(ip.value = na.approx(value, maxgap = 4, rule = 2))

The maxgap argument allows runs of up to four consecutive NAs to be filled, whilst rule = 2 allows extrapolation to the flanking time points (leading or trailing NAs take the nearest observed value).
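
One thing worth checking: called on a plain vector, na.approx() interpolates by position within the group, whereas the xts version in the question weights by the dates. Below is a minimal sketch comparing the two, assuming the sample data from the question (rebuilt here so the block runs on its own). The column names ip.index and ip.date and the object df.ip are illustrative only. Passing the dates through the x argument and setting na.rm = FALSE (so the result always has the same length as the input inside mutate()) are my additions, not part of the accepted code.

```r
library(dplyr)
library(zoo)

## Rebuild the sample data from the question
df <- data.frame(
  event.date = as.Date(c("2010-05-25", "2010-09-10", "2011-05-13", "2012-03-28",
                         "2013-03-07", "2014-02-13", "2010-06-11", "2010-09-10",
                         "2011-05-13", "2012-03-28", "2013-03-07", "2014-02-13")),
  variable   = rep(c("neck.bmd", "wbody.bmd"), each = 6),
  value      = c(0.7490, 0.7615, 0.7900, 0.7730, NA, 0.7420,
                 1.0520, 1.0665, 1.0760, 1.0870, NA, 1.0550)
)

df.ip <- df %>%
  group_by(variable) %>%
  arrange(variable, event.date) %>%
  mutate(
    ## Position-based: neighbouring rows are treated as equally spaced
    ip.index = na.approx(value, maxgap = 4, rule = 2, na.rm = FALSE),
    ## Date-based (assumption: weighting by elapsed time is what is wanted)
    ip.date  = na.approx(value, x = event.date, maxgap = 4, rule = 2,
                         na.rm = FALSE)
  ) %>%
  ungroup()

df.ip
```

On this sample the two columns differ only slightly, because the dates either side of each gap are almost evenly spaced: for wbody.bmd the position-based value is 1.0710, while the date-based value matches the 1.070977 returned by the xts approach.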
