Replace NA values in a data frame starting in varying columns

Problem description

This is a variation on the NA theme that I have not been able to find an answer to. I have monthly observations by column, with a large number of series by row. Some missing values are genuine, but some should be zero. I want to replace the missing values in a given series with zeros, but only after a value for that series has been observed.

For example, given:

  Mth1 Mth2 Mth3 Mth4
1    1    2    1    3
2   NA    3    2    1
3   NA    2    1   NA
4   NA   NA    2   NA
5    2    2   NA    2
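To try the answer, the example table can be rebuilt as an R data frame. This is a minimal reproducible sketch: the values are copied from the table, and the name `df` is an assumption chosen to match the answer's code.

```r
# Rebuild the example input; the name `df` is assumed to match the answer's code.
# Rows are series, columns are months; NA marks a missing observation.
df <- data.frame(
  Mth1 = c(1, NA, NA, NA, 2),
  Mth2 = c(2, 3, 2, NA, 2),
  Mth3 = c(1, 2, 1, 2, NA),
  Mth4 = c(3, 1, NA, NA, 2)
)
print(df)
```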

I want to change it to:

  Mth1 Mth2 Mth3 Mth4
1    1    2    1    3
2   NA    3    2    1
3   NA    2    1    0
4   NA   NA    2    0
5    2    2    0    2

I want something like the locf (last observation carried forward) function, which leaves missing values before the first positive observation untouched, but I want to fill with zeros rather than carry the last observation forward.
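The per-series rule can be sketched for a single vector. The helper name `fill_zero_after_first` is hypothetical (it is not part of base R or of any package mentioned here); it uses `cumsum()` on the non-missing indicator to mark every position from the first observed value onwards, then zero-fills only the NAs inside that range.

```r
# Hypothetical helper (not from the original question or answer):
# replace NAs with 0, but only once a non-NA value has been seen.
fill_zero_after_first <- function(x) {
  seen <- cumsum(!is.na(x)) > 0  # TRUE from the first observed value onwards
  x[is.na(x) & seen] <- 0        # leading NAs stay NA
  x
}

fill_zero_after_first(c(NA, NA, 2, NA, 5, NA))
# returns c(NA, NA, 2, 0, 5, 0)
```

Leading NAs are left alone, as with locf, but later gaps become 0 instead of the last observed value.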

Recommended answer

Here is another base R method using matrix indexing:

# Set to 0 every NA that has a non-NA value somewhere to its left in the same row
df[is.na(df) & t(apply(!is.na(df), 1, cummax))] <- 0
df
  Mth1 Mth2 Mth3 Mth4
1    1    2    1    3
2   NA    3    2    1
3   NA    2    1    0
4   NA   NA    2    0
5    2    2    0    2

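is.na(df) returns a logical matrix indicating the location of NA values. This is combined with a logical AND (&) with t(apply(!is.na(df), 1, cummax)), the running maximum of the non-missing indicator along each row, which is 1 from the first non-NA value in that row onwards. Because apply() with MARGIN = 1 returns each row's result as a column, t() transposes the result back to the shape of df. Elements of the data.frame for which both of these are TRUE (NA cells that come after an observed value in their row) are replaced with 0.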
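The one-liner can be split into named steps so that each mask can be inspected on its own. This sketch rebuilds the example data; the intermediate names `is_missing`, `seen` and `to_zero` are illustrative only and not part of the original answer.

```r
df <- data.frame(
  Mth1 = c(1, NA, NA, NA, 2),
  Mth2 = c(2, 3, 2, NA, 2),
  Mth3 = c(1, 2, 1, 2, NA),
  Mth4 = c(3, 1, NA, NA, 2)
)

is_missing <- is.na(df)                  # TRUE where a value is NA
# apply(..., 1, cummax) works row by row but returns each row's result as a
# column, so t() turns the 4 x 5 result back into the 5 x 4 shape of df.
seen <- t(apply(!is.na(df), 1, cummax))  # 1 once a non-NA value has appeared in the row
to_zero <- is_missing & seen             # NA cells that come after an observed value

df[to_zero] <- 0                         # logical-matrix indexing on a data.frame
print(df)
```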
