How to replace NaN value with previous non-NaN within group


Question

I need to replace the NaN values with the previous non-NaN value within the group.

Here is an example:

+-------+------------+-------+
| ts_id |    date    | value |
+-------+------------+-------+
|     2 | 01/10/2014 | 18    |
|     2 | 01/11/2014 | 15    |
|     2 | 01/12/2014 | NaN   |
|     2 | 01/01/2015 | NaN   |
|     2 | 01/02/2015 | NaN   |
|     3 | 01/03/2015 | 19    |
|     3 | 01/04/2015 | 20    |
|     3 | 01/10/2015 | 12    |
|     3 | 01/11/2015 | 17    |
|     3 | 01/12/2015 | NaN   |
|     3 | 01/01/2016 | NaN   |
|     3 | 01/08/2016 | 7     |
|     3 | 01/09/2016 | NaN   |
|     3 | 01/10/2016 | NaN   |
|     3 | 01/11/2016 | NaN   |
|     3 | 01/12/2016 | NaN   |
|     3 | 01/01/2017 | NaN   |
+-------+------------+-------+

Data:

data <- structure(list(
  ts_id = c(2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3),
  date = structure(c(16344, 16375, 16405, 16436, 16467, 16495, 16526, 16709, 16740,
                     16770, 16801, 17014, 17045, 17075, 17106, 17136, 17167),
                   class = "Date"),
  value = c(18, 15, NaN, NaN, NaN, 19, 20, 12, 17, NaN, NaN, 7, NaN, NaN, NaN, NaN, NaN)
), row.names = c(NA, -17L), vars = "ts_id", drop = TRUE, indices = list(0:16),
group_sizes = 17L, biggest_group_size = 17L,
labels = structure(list(ts_id = 3L), row.names = c(NA, -1L), class = "data.frame",
                   vars = "ts_id", drop = TRUE),
class = "data.frame")

Within each group (identified by ts_id), NaN values can occur at any date. I need to replace each NaN with the most recent non-NaN value in the same group.

The result should look like this:

+-------+------------+-------+
| ts_id |    date    | value |
+-------+------------+-------+
|     2 | 01/10/2014 |    18 |
|     2 | 01/11/2014 |    15 |
|     2 | 01/12/2014 |    15 |
|     2 | 01/01/2015 |    15 |
|     2 | 01/02/2015 |    15 |
|     3 | 01/03/2015 |    19 |
|     3 | 01/04/2015 |    20 |
|     3 | 01/10/2015 |    12 |
|     3 | 01/11/2015 |    17 |
|     3 | 01/12/2015 |    17 |
|     3 | 01/01/2016 |    17 |
|     3 | 01/08/2016 |     7 |
|     3 | 01/09/2016 |     7 |
|     3 | 01/10/2016 |     7 |
|     3 | 01/11/2016 |     7 |
|     3 | 01/12/2016 |     7 |
|     3 | 01/01/2017 |     7 |
+-------+------------+-------+

Thanks in advance.

Recommended answer

You can use:

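library(dplyr)
library(zoo) # provides na.locf ("last observation carried forward")

data %>%
  group_by(ts_id) %>% # fill within each ts_id group separately
  # na.locf fills each missing value (NaN included) with the last non-missing one;
  # na.rm = FALSE keeps leading missing values so each group's length is unchanged
  mutate(value = na.locf(value, na.rm = FALSE))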

# Output (first rows, as shown by head()):
# # A tibble: 6 x 3
# # Groups:   ts_id [2]
# ts_id date       value
# <dbl> <date>     <dbl>
# 1     2 2014-10-01    18
# 2     2 2014-11-01    15
# 3     2 2014-12-01    15
# 4     2 2015-01-01    15
# 5     2 2015-02-01    15
# 6     3 2015-03-01    19
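
If you would rather not depend on zoo, the same per-group fill can be sketched in base R. This is only an illustrative alternative, not part of the original answer: the helper name `locf`, the small `df` data frame, and the `filled` result are hypothetical, chosen to keep the example self-contained. The sketch relies on `is.na()` being TRUE for both NA and NaN, and on `ave()` applying a function within each group.

```r
# Hypothetical helper: carry the last non-missing value forward.
locf <- function(x) {
  ok  <- !is.na(x)        # is.na() is TRUE for both NA and NaN
  idx <- cumsum(ok)       # position of the most recent non-missing value (0 if none yet)
  c(NA, x[ok])[idx + 1]   # leading missing values stay NA
}

# Small illustrative data set (not the data from the question).
df <- data.frame(
  ts_id = c(2, 2, 2, 3, 3, 3, 3),
  value = c(18, 15, NaN, NaN, 19, NaN, 7)
)

# ave() applies locf() separately within each ts_id group,
# so the leading NaN in group 3 does not borrow the 15 from group 2.
filled <- df
filled$value <- ave(df$value, df$ts_id, FUN = locf)
print(filled)
```

Because the fill runs per group, a group that starts with NaN keeps a missing value (here NA) in those rows; this mirrors what `na.locf(value, na.rm = FALSE)` does inside `group_by()`.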
