How to fill NAs with LOCF by factors in data frame, split by country
Question
I have the following data frame (simplified), where the country variable is a factor and the value variable has missing values:
country value
AUT NA
AUT 5
AUT NA
AUT NA
GER NA
GER NA
GER 7
GER NA
GER NA
The following code generates the data frame above:
data <- data.frame(country=c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"), value=c(NA, 5, NA, NA, NA, NA, 7, NA, NA))
Now, I would like to replace the NA values within each country subset using the last observation carried forward (LOCF) method. I know the na.locf command in the zoo package. Running
data <- na.locf(data)
would give me the following data frame:
country value
AUT NA
AUT 5
AUT 5
AUT 5
GER 5
GER 5
GER 7
GER 7
GER 7
However, the function should only be applied within the individual subsets split by country, so that a value from one country is never carried into the next. The following is the output I need:
country value
AUT NA
AUT 5
AUT 5
AUT 5
GER NA
GER NA
GER 7
GER 7
GER 7
I can't think of an easy way to implement this. Before resorting to for-loops, I was wondering whether anyone has an idea of how to solve it.
Many thanks!
Recommended answer
Here's a ddply solution. Try this:
library(plyr)
library(zoo)  # provides na.locf
ddply(DF, .(country), na.locf)
country value
1 AUT <NA>
2 AUT 5
3 AUT 5
4 AUT 5
5 GER <NA>
6 GER <NA>
7 GER 7
8 GER 7
9 GER 7
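One thing to watch in the output above: the missing entries print as <NA> rather than NA, which suggests the value column is no longer numeric once na.locf has been applied to the whole data frame. The following is a minimal sketch, not part of the original answer, that fills only the value column within each country using base R's ave together with na.locf. It assumes DF is built exactly as in the question and that the zoo package is installed.

```r
library(zoo)  # provides na.locf

# DF is assumed to match the data frame from the question.
DF <- data.frame(
  country = c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"),
  value   = c(NA, 5, NA, NA, NA, NA, 7, NA, NA)
)

# Fill only the value column, one country at a time.
# na.rm = FALSE keeps leading NAs, so every group keeps its original length.
DF$value <- ave(
  DF$value, DF$country,
  FUN = function(x) na.locf(x, na.rm = FALSE)
)

str(DF$value)  # check that the column is still numeric
```

Setting na.rm = FALSE matters here: ave expects the function to return as many values as it receives, and the default na.rm = TRUE would drop the leading NAs of each group.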
Edit
From the ddply help page you'll find:
.variables: variables to split data frame by,
as quoted variables, a formula or character vector.
So other alternatives to get what you want are:
ddply(DF, "country", na.locf)
ddply(DF, ~country, na.locf)
Note that replacing .variables with DF$variable is not allowed; that's why you got an error when doing this.
DF is your data.frame (called data in the question).
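If adding packages is not an option, the same per-country fill can be sketched in base R alone. The helper locf below is a hypothetical name introduced for illustration, not a function from zoo or plyr, and DF is again assumed to match the data frame from the question.

```r
# Hypothetical helper: carry the last non-NA value forward within a vector.
locf <- function(x) {
  seen <- cumsum(!is.na(x))        # count of non-NA values so far (0 before the first)
  c(NA, x[!is.na(x)])[seen + 1]    # position 1 is NA, so leading NAs stay NA
}

# DF is assumed to match the data frame from the question.
DF <- data.frame(
  country = c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"),
  value   = c(NA, 5, NA, NA, NA, NA, 7, NA, NA)
)

# ave applies locf separately to the values of each country.
DF$value <- ave(DF$value, DF$country, FUN = locf)
print(DF)
```

Because ave splits the values by country before calling locf, the running count restarts in each group, so a value from AUT cannot leak into GER.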