Linear interpolation with dplyr but skipping groups with all missing values

Question

I'm trying to linearly interpolate values within each group using dplyr and approx(). Unfortunately, some of the groups have all values missing, so I'd like the interpolation to skip those groups and proceed with the rest. I don't want to extrapolate or use the nearest neighbouring observation's data.

Here's an example of the data. The first group (by id) has all values missing; the other should be interpolated.

data <- read.csv(text="
id,year,value
c1,1998,NA
c1,1999,NA
c1,2000,NA
c1,2001,NA
c2,1998,14
c2,1999,NA
c2,2000,NA
c2,2001,18")

library(dplyr)

# Interpolate value over year within each id (fails on the all-NA group c1)
dataIpol <- data %>%
  group_by(id) %>%
  arrange(id, year) %>%
  mutate(valueIpol = approx(year, value, year,
                            method = "linear", rule = 1, f = 0, ties = mean)$y)

But I get the error:

Error: need at least two non-NA values to interpolate

I don't get this error if I remove the groups in which every value is missing, but removing them is not feasible.
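The error is raised by approx() itself rather than by dplyr: within group c1 it receives no usable (year, value) pairs. A minimal sketch that reproduces this outside the pipeline, using a stand-in vector for one all-NA group (the names years and res are illustrative only):

```r
# One all-NA group, as approx() would see it inside mutate()
years <- 1998:2001
res <- tryCatch(
  approx(years, rep(NA_real_, 4), xout = years)$y,
  error = function(e) conditionMessage(e)  # capture the message instead of stopping
)
print(res)
```

Here res ends up holding the error message as a string, because approx() has no non-NA points to interpolate between.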

Recommended answer

We can fix this by adding a filter step that keeps only the groups with the required number of data points:

library(dplyr)
dataIpol <- data %>%
  group_by(id) %>% 
  arrange(id, year) %>%
  filter(sum(!is.na(value))>=2) %>% #filter!
  mutate(valueIpol = approx(year, value, year, 
                            method = "linear", rule = 1, f = 0, ties = mean)$y)

Here we count the non-NA entries in the value column for each group and remove any group that has fewer than two, which is the minimum approx() needs to interpolate.
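Note that filter() drops the rows of the all-NA groups from the result. If those rows need to stay in the output, one possible variant (a sketch, not part of the original answer) is to move the same check inside mutate() and return NA for groups that cannot be interpolated:

```r
library(dplyr)

data <- read.csv(text = "
id,year,value
c1,1998,NA
c1,1999,NA
c1,2000,NA
c1,2001,NA
c2,1998,14
c2,1999,NA
c2,2000,NA
c2,2001,18")

dataIpol <- data %>%
  arrange(id, year) %>%
  group_by(id) %>%
  mutate(valueIpol = if (sum(!is.na(value)) >= 2) {
    # enough points: interpolate linearly, no extrapolation (rule = 1)
    approx(year, value, xout = year, method = "linear", rule = 1)$y
  } else {
    # fewer than two non-NA values: leave the whole group as NA
    NA_real_
  }) %>%
  ungroup()
```

With this version group c1 is kept with valueIpol set to NA, while c2 is interpolated between its 1998 and 2001 values.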
