Replace NA values in dataframe starting in varying columns
Question
This is a variation on the NA theme that I have not been able to find an answer to. I have monthly observations by column with a large number of series by row. Some missing values are genuine but some should be zero. I want to replace missing values for a given series with zeros but only after a value for that series has been observed.
For example, given:
  Mth1 Mth2 Mth3 Mth4
1    1    2    1    3
2   NA    3    2    1
3   NA    2    1   NA
4   NA   NA    2   NA
5    2    2   NA    2
I want to change it to:
  Mth1 Mth2 Mth3 Mth4
1    1    2    1    3
2   NA    3    2    1
3   NA    2    1    0
4   NA   NA    2    0
5    2    2    0    2
I want something like the locf function, which is able to leave missing values prior to the first positive observation, but I want to fill with zeros rather than use the last observation.
Recommended answer
Here is another base R method using matrix indexing:
df[is.na(df) & t(apply(!is.na(df), 1, cummax))] <- 0
df
  Mth1 Mth2 Mth3 Mth4
1    1    2    1    3
2   NA    3    2    1
3   NA    2    1    0
4   NA   NA    2    0
5    2    2    0    2
is.na(df) returns a logical matrix indicating the location of NA values. This is combined, with a logical AND, with t(apply(!is.na(df), 1, cummax)), which indicates whether a non-NA value has occurred at or before that position in the same row. Elements of the data.frame for which both of these are TRUE are replaced with 0.
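To make the two masks easier to inspect, the one-liner can be unpacked into named steps. This is only a sketch of the same idea: the data frame is rebuilt from the example in the question, and the names is_missing, seen_value, fill_mask and result are illustrative, not part of the original answer.

```r
# Rebuild the example data frame from the question.
df <- data.frame(
  Mth1 = c(1, NA, NA, NA, 2),
  Mth2 = c(2, 3, 2, NA, 2),
  Mth3 = c(1, 2, 1, 2, NA),
  Mth4 = c(3, 1, NA, NA, 2)
)

# TRUE where a value is missing.
is_missing <- is.na(df)

# TRUE once a non-NA value has appeared at or before this column in the row.
# apply() over rows returns one column per row, so t() restores the
# original orientation; cummax() yields 0/1, hence the comparison with 1.
seen_value <- t(apply(!is_missing, 1, cummax)) == 1

# Only NAs that come after an observed value should be filled.
fill_mask <- is_missing & seen_value

# Work on a copy (hypothetical name) so the original df stays untouched.
result <- df
result[fill_mask] <- 0
print(result)
```

Printing is_missing, seen_value and fill_mask one at a time shows why leading NAs survive: for those cells seen_value is still FALSE, so they never enter fill_mask.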