Interpolate variables on subsets of dataframe

Views: 69 | Posted: 2020/5/28 20:27:42 | Tags: r, plyr


Question

I have a large dataframe containing survey observations from multiple states over several years. Here's the data structure:

state | survey.year | time1 | obs1 | time2 | obs2
CA    | 2000        | 1     | 23   | 1.2   | 43
CA    | 2001        | 2     | 43   | 1.4   | 52
CA    | 2002        | 5     | 53   | 3.2   | 61
...
CA    | 1998        | 3     | 12   | 2.3   | 20
CA    | 1999        | 4     | 14   | 2.8   | 25
CA    | 2003        | 5     | 19   | 4.3   | 29
...
ND    | 2000        | 2     | 223  | 3.2   | 239
ND    | 2001        | 4     | 233  | 4.2   | 321
ND    | 2003        | 7     | 256  | 7.9   | 387

For each state/survey.year combination, I would like to interpolate obs2 so that its time-location is lined up with (time1, obs1).

That is, I would like to break up the dataframe into state/survey.year chunks, perform my linear interpolation on each chunk, and then stitch the individual state/survey.year dataframes back together into a master dataframe.

I have been trying to figure out how to use the plyr and Hmisc packages for this, but I keep getting myself in a tangle.

Here's the code that I wrote to do the interpolation:

require(Hmisc)
df <- new.obs2 <- NULL
for (i in 1:(0.5*(ncol(indirect)-1))){
 df[,"new.obs2"] <-   approxExtrap(df[,"time1"],
                                     df[,"obs1"],
                                     xout = df[,"obs2"],
                                     method="linear",
                                     rule=2)
}

But I am not sure how to unleash plyr on this problem. Your generous advice and suggestions would be much appreciated. Essentially, I am just trying to interpolate "obs2" within each state/survey.year combination, so that its time references line up with those of "obs1".

Of course, if there's a slick way to do this without invoking plyr functions, I'd be open to that...

Thanks!

Recommended answer

This should be as simple as:

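library(plyr)   # provides ddply() and .()
library(Hmisc)  # provides approxExtrap()
# approxExtrap() returns a list with components x and y; keep only y
ddply(df, .(state, survey.year), transform,
      new.obs2 = approxExtrap(time1, obs1, xout = obs2,
                              method = "linear",
                              rule = 2)$y)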

But I can't promise you anything, since I haven't the foggiest idea what the point of your for loop is. (It's overwriting df[,"new.obs2"] each time through the loop? You initialize the entire data frame df to NULL? What's indirect?)
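For anyone who would rather avoid plyr, the split/interpolate/stitch steps described in the question can be sketched in base R. This is only a sketch under stated assumptions: the data frame `surveys`, its values, and the helper `interpolate_chunk` are made up for illustration. It also reads the stated goal as evaluating obs2 (measured at time2) at the time1 points, and it uses `approx()` from base R's stats package instead of Hmisc's `approxExtrap()`. With `rule = 2`, `approx()` holds the end values constant outside the observed time2 range rather than extrapolating linearly.

```r
# Toy data in the shape described in the question (values are invented).
surveys <- data.frame(
  state       = c("CA", "CA", "CA", "ND", "ND", "ND"),
  survey.year = c(2000, 2000, 2000, 2000, 2000, 2000),
  time1       = c(1, 2, 5, 2, 4, 7),
  obs1        = c(23, 43, 53, 223, 233, 256),
  time2       = c(1, 3, 5, 2, 5, 8),
  obs2        = c(40, 60, 80, 200, 230, 260)
)

# Hypothetical helper: interpolate obs2 onto the time1 grid for one chunk.
# Assumes each chunk has at least two distinct time2 values.
interpolate_chunk <- function(chunk) {
  chunk$new.obs2 <- approx(x = chunk$time2, y = chunk$obs2,
                           xout = chunk$time1,
                           method = "linear", rule = 2)$y
  chunk
}

# Split by state/survey.year, interpolate each chunk, stitch back together.
chunks <- split(surveys, list(surveys$state, surveys$survey.year), drop = TRUE)
result <- do.call(rbind, lapply(chunks, interpolate_chunk))
rownames(result) <- NULL
```

If linear extrapolation beyond the observed range is needed, swapping `approx()` for `Hmisc::approxExtrap()` inside the helper should work the same way, since both return a list whose `y` component holds the interpolated values.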
