Interpolate based on multiple conditions in R


Problem description

Beginner R user here. I have a dataset of yearly employment numbers for different industry classifications and different subregions. For some observations, the number of employees is missing (NA). I would like to fill these values through linear interpolation (using na.approx or some other method). However, I only want to interpolate within the same industry classification and subregion.

For example, I have this:

subregion <- c("East Bay", "East Bay", "East Bay", "East Bay", "East Bay", "South Bay")
industry <- c("A", "A", "A", "A", "A", "B")
year <- c(2013, 2014, 2015, 2016, 2017, 2002)
emp <- c(50, NA, NA, 80, NA, 300)

data <- data.frame(cbind(subregion,industry,year, emp))

  subregion industry year  emp
1  East Bay        A 2013   50
2  East Bay        A 2014 <NA>
3  East Bay        A 2015 <NA>
4  East Bay        A 2016   80
5  East Bay        A 2017 <NA>
6 South Bay        B 2002  300

I need to generate this table. The fifth observation should stay NA, because the next observation belongs to a different subregion and industry, so there is no later value in the same group to interpolate toward.

  subregion industry year  emp
1  East Bay        A 2013   50
2  East Bay        A 2014   60
3  East Bay        A 2015   70
4  East Bay        A 2016   80
5  East Bay        A 2017 <NA>
6 South Bay        B 2002  300

Articles like this have been helpful, but I cannot figure out how to adapt the solution so that interpolation only occurs when two columns match, instead of one. Any help would be appreciated.

Recommended answer

We can group by both columns and apply na.approx (from zoo) within each group:

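library(tidyverse)  # loads dplyr, which provides %>%, group_by() and mutate()
# zoo must be installed; na.approx() is called with the zoo:: prefix
data %>% 
     group_by(subregion, industry) %>%   # interpolate only within each subregion/industry pair
     mutate(emp = zoo::na.approx(emp, na.rm = FALSE))   # na.rm = FALSE keeps leading/trailing NAs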
# A tibble: 6 x 4
# Groups:   subregion, industry [2]
#  subregion industry  year   emp
#  <fct>     <fct>    <dbl> <dbl>
#1 East Bay  A         2013    50
#2 East Bay  A         2014    60
#3 East Bay  A         2015    70
#4 East Bay  A         2016    80
#5 East Bay  A         2017    NA
#6 South Bay B         2002   300
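
One caveat about the call above: by default na.approx interpolates by row position, so it implicitly assumes the rows in each group are sorted by year and that the years are evenly spaced. If that does not hold, a possible variant is to pass the year column as the `x` argument of na.approx. The sketch below uses a small made-up data frame (`df`, with a gap between 2014 and 2016) and a hypothetical output column `emp_filled`; neither is from the original question.

```r
library(dplyr)

# Hypothetical data: the East Bay / A group has no row for 2015
df <- data.frame(
  subregion = c("East Bay", "East Bay", "East Bay", "East Bay",
                "South Bay", "South Bay", "South Bay"),
  industry  = c("A", "A", "A", "A", "B", "B", "B"),
  year      = c(2013, 2014, 2016, 2017, 2002, 2003, 2004),
  emp       = c(50, NA, 80, NA, 300, NA, 320)
)

filled <- df %>%
  arrange(subregion, industry, year) %>%   # make sure each group is in year order
  group_by(subregion, industry) %>%
  # x = year interpolates along the year axis instead of along row position
  mutate(emp_filled = zoo::na.approx(emp, x = year, na.rm = FALSE)) %>%
  ungroup()

print(filled)
```

Here the 2014 value is filled with 60 (one third of the way from 50 to 80), whereas position-based interpolation would give 65. The trailing 2017 value stays NA, as in the original answer.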

Data

data <- data.frame(subregion,industry,year, emp)
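
Note that this line builds the data frame without the cbind() call used in the question. That likely matters: cbind() on a mix of character and numeric vectors produces a character matrix, so emp would no longer be numeric and could not be interpolated directly. A minimal sketch of the difference, using shortened vectors and the hypothetical names `via_cbind` and `direct`:

```r
subregion <- c("East Bay", "East Bay", "South Bay")
emp <- c(50, NA, 300)

# cbind() first builds a character matrix, so every column loses its numeric type
via_cbind <- data.frame(cbind(subregion, emp))

# Passing the vectors directly lets each column keep its own type
direct <- data.frame(subregion, emp)

is.numeric(via_cbind$emp)  # FALSE
is.numeric(direct$emp)     # TRUE
```

If a data frame was already built the cbind() way, converting the column with as.numeric(as.character(emp)) before interpolating should recover the numbers.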
