用相邻列中的值替换列中的NA [英] Replace NA in column with value in adjacent column

查看:68
本文介绍了用相邻列中的值替换列中的NA的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

此问题与标题相似的帖子有关(在N向量中用相邻的值替换NA ).我想扫描数据框中的一列,并用相邻单元格中的值替换NA.在上述文章中,解决方案是不使用相邻向量(例如,数据矩阵中的相邻元素)的值替换NA,而是有条件地替换固定值.以下是我的问题的可重现示例:

This question is related to a post with a similar title (replace NA in an R vector with adjacent values). I would like to scan a column in a data frame and replace NA's with the value in the adjacent cell. In the aforementioned post, the solution was to replace the NA not with the value from the adjacent vector (e.g. the adjacent element in the data matrix) but was a conditional replace for a fixed value. Below is a reproducible example of my problem:

UNIT <- c(NA,NA, 200, 200, 200, 200, 200, 300, 300, 300,300)
STATUS <-c('ACTIVE','INACTIVE','ACTIVE','ACTIVE','INACTIVE','ACTIVE','INACTIVE','ACTIVE','ACTIVE',
                    'ACTIVE','INACTIVE') 
TERMINATED <- c('1999-07-06' , '2008-12-05' , '2000-08-18' , '2000-08-18' ,'2000-08-18' ,'2008-08-18',
                        '2008-08-18','2006-09-19','2006-09-19' ,'2006-09-19' ,'1999-03-15') 
START <- c('2007-04-23','2008-12-06','2004-06-01','2007-02-01','2008-04-19','2010-11-29','2010-12-30',
                   '2007-10-29','2008-02-05','2008-06-30','2009-02-07')
STOP <- c('2008-12-05','4712-12-31','2007-01-31','2008-04-18','2010-11-28','2010-12-29','4712-12-31',
                  '2008-02-04','2008-06-29','2009-02-06','4712-12-31')
#creating dataframe
TEST <- data.frame(UNIT,STATUS,TERMINATED,START,STOP); TEST                   

  UNIT   STATUS TERMINATED      START       STOP
1    NA   ACTIVE 1999-07-06 2007-04-23 2008-12-05
2    NA INACTIVE 2008-12-05 2008-12-06 4712-12-31
3   200   ACTIVE 2000-08-18 2004-06-01 2007-01-31
4   200   ACTIVE 2000-08-18 2007-02-01 2008-04-18
5   200 INACTIVE 2000-08-18 2008-04-19 2010-11-28
6   200   ACTIVE 2008-08-18 2010-11-29 2010-12-29
7   200 INACTIVE 2008-08-18 2010-12-30 4712-12-31
8   300   ACTIVE 2006-09-19 2007-10-29 2008-02-04
9   300   ACTIVE 2006-09-19 2008-02-05 2008-06-29
10  300   ACTIVE 2006-09-19 2008-06-30 2009-02-06
11  300 INACTIVE 1999-03-15 2009-02-07 4712-12-31

#using the syntax for a conditional replace and hoping it works :/          
TEST$UNIT[is.na(TEST$UNIT)] <- TEST$STATUS; TEST 

   UNIT   STATUS TERMINATED      START       STOP
1     1   ACTIVE 1999-07-06 2007-04-23 2008-12-05
2     2 INACTIVE 2008-12-05 2008-12-06 4712-12-31
3   200   ACTIVE 2000-08-18 2004-06-01 2007-01-31
4   200   ACTIVE 2000-08-18 2007-02-01 2008-04-18
5   200 INACTIVE 2000-08-18 2008-04-19 2010-11-28
6   200   ACTIVE 2008-08-18 2010-11-29 2010-12-29
7   200 INACTIVE 2008-08-18 2010-12-30 4712-12-31
8   300   ACTIVE 2006-09-19 2007-10-29 2008-02-04
9   300   ACTIVE 2006-09-19 2008-02-05 2008-06-29
10  300   ACTIVE 2006-09-19 2008-06-30 2009-02-06
11  300 INACTIVE 1999-03-15 2009-02-07 4712-12-31

结果应为:

      UNIT   STATUS TERMINATED      START       STOP
1   ACTIVE   ACTIVE 1999-07-06 2007-04-23 2008-12-05
2 INACTIVE INACTIVE 2008-12-05 2008-12-06 4712-12-31
3      200   ACTIVE 2000-08-18 2004-06-01 2007-01-31
4      200   ACTIVE 2000-08-18 2007-02-01 2008-04-18
5      200 INACTIVE 2000-08-18 2008-04-19 2010-11-28
6      200   ACTIVE 2008-08-18 2010-11-29 2010-12-29
7      200 INACTIVE 2008-08-18 2010-12-30 4712-12-31
8      300   ACTIVE 2006-09-19 2007-10-29 2008-02-04
9      300   ACTIVE 2006-09-19 2008-02-05 2008-06-29
10     300   ACTIVE 2006-09-19 2008-06-30 2009-02-06
11     300 INACTIVE 1999-03-15 2009-02-07 4712-12-31

推荐答案

它不起作用,因为状态是一个因素.当将因子与数字混合时,数字的限制最少.通过强制将状态设置为字符,可以得到想要的结果,并且该列现在是字符向量:

It didn't work because status was a factor. When you mix factor with numeric then numeric is the least restrictive. By forcing status to be character you get the results you're after and the column is now a character vector:

TEST$UNIT[is.na(TEST$UNIT)] <- as.character(TEST$STATUS[is.na(TEST$UNIT)])

##        UNIT   STATUS TERMINATED      START       STOP
## 1    ACTIVE   ACTIVE 1999-07-06 2007-04-23 2008-12-05
## 2  INACTIVE INACTIVE 2008-12-05 2008-12-06 4712-12-31
## 3       200   ACTIVE 2000-08-18 2004-06-01 2007-01-31
## 4       200   ACTIVE 2000-08-18 2007-02-01 2008-04-18
## 5       200 INACTIVE 2000-08-18 2008-04-19 2010-11-28
## 6       200   ACTIVE 2008-08-18 2010-11-29 2010-12-29
## 7       200 INACTIVE 2008-08-18 2010-12-30 4712-12-31
## 8       300   ACTIVE 2006-09-19 2007-10-29 2008-02-04
## 9       300   ACTIVE 2006-09-19 2008-02-05 2008-06-29
## 10      300   ACTIVE 2006-09-19 2008-06-30 2009-02-06
## 11      300 INACTIVE 1999-03-15 2009-02-07 4712-12-31

这篇关于用相邻列中的值替换列中的NA的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆