How to fill NAs with LOCF by factor in a data frame, split by country


Question

I have the following (simplified) data frame, in which the country variable is a factor and the value variable has missing values:

country value
AUT     NA
AUT     5
AUT     NA
AUT     NA
GER     NA
GER     NA
GER     7
GER     NA
GER     NA

The following code generates the above data frame:

data <- data.frame(country=c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"), value=c(NA, 5, NA, NA, NA, NA, 7, NA, NA), stringsAsFactors=TRUE)  # stringsAsFactors=TRUE makes country a factor in R >= 4.0 too

Now, I would like to replace the NA values within each country subset using last observation carried forward (LOCF). I know the function na.locf from the zoo package, but data <- na.locf(data) gives me the following data frame:

country value
AUT     NA
AUT     5
AUT     5
AUT     5
GER     5
GER     5
GER     7
GER     7
GER     7
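To see why the ungrouped call carries AUT's 5 into the GER rows, here is a minimal base-R sketch of LOCF. The helper locf() is hypothetical (it is not zoo's na.locf); it is meant to mimic na.locf(x, na.rm = FALSE) on a plain vector by using cummax() to find, for each position, the index of the most recent non-missing value, leaving leading NAs as NA. Run over the whole column it reproduces the output above, because nothing tells it where one country ends and the next begins.

```r
# Sample data from the question
data <- data.frame(
  country = factor(c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER")),
  value   = c(NA, 5, NA, NA, NA, NA, 7, NA, NA)
)

# Hypothetical helper: last observation carried forward for one vector.
locf <- function(x) {
  # Position of each non-NA value, 0 for each NA; cummax() then holds
  # the most recent non-NA position at every index.
  idx <- cummax(seq_along(x) * !is.na(x))
  idx[idx == 0] <- NA  # no earlier observation exists: keep NA
  x[idx]
}

# Applied to the whole column, the 5 from AUT leaks into the GER rows.
ungrouped <- locf(data$value)
print(ungrouped)  # NA 5 5 5 5 5 7 7 7
```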

However, the function should be applied only within each country's subset, so that no value is carried forward from one country into the next. The following is the output I need:

country value
AUT     NA
AUT     5
AUT     5
AUT     5
GER     NA
GER     NA
GER     7
GER     7
GER     7

I can't think of an easy way to implement this. Before resorting to for-loops, I was wondering whether anyone has an idea how to solve it.

Thanks a lot!

Recommended answer

Here's a ddply solution. Try this:

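library(plyr)  # provides ddply()
library(zoo)   # provides na.locf()
ddply(DF, .(country), na.locf)  # apply na.locf() separately to each country's subset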
  country value
1     AUT  <NA>
2     AUT     5
3     AUT     5
4     AUT     5
5     GER  <NA>
6     GER  <NA>
7     GER     7
8     GER     7
9     GER     7
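If you would rather not use plyr, base R's ave() can do the grouping: it splits a vector by one or more factors, applies FUN to each piece, and returns the results in the original row order. The sketch below assumes the same hypothetical locf() helper (a stand-in for na.locf, not part of any package), so it runs without extra dependencies.

```r
# Sample data from the question
data <- data.frame(
  country = factor(c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER")),
  value   = c(NA, 5, NA, NA, NA, NA, 7, NA, NA)
)

# Hypothetical helper: last observation carried forward for one vector,
# keeping leading NAs (there is nothing to carry forward yet).
locf <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))
  idx[idx == 0] <- NA
  x[idx]
}

# ave() runs locf() on value separately for each country and
# writes the results back in the original row order.
data$value <- ave(data$value, data$country, FUN = locf)
print(data)
```

With zoo installed, FUN = function(x) zoo::na.locf(x, na.rm = FALSE) should work in place of locf; na.rm = FALSE keeps the leading NAs instead of dropping them, which would otherwise change the group lengths.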

Edit
In the ddply help page you'll find:

.variables:  variables to split data frame by, 
as quoted variables, a formula or character vector.

So other ways to get what you want are:

ddply(DF, "country", na.locf)
ddply(DF, ~country, na.locf)
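All three calls follow the same split-apply-combine pattern. As a rough base-R equivalent (a sketch, not how plyr is implemented internally), you can split the data frame by country, fill every column of each piece, and put the rows back with unsplit(). The locf() helper is again hypothetical and only stands in for na.locf.

```r
# Sample data from the question
data <- data.frame(
  country = factor(c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER")),
  value   = c(NA, 5, NA, NA, NA, NA, 7, NA, NA)
)

# Hypothetical helper: last observation carried forward for one vector.
locf <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))
  idx[idx == 0] <- NA
  x[idx]
}

# Split: one data frame per country.
pieces <- split(data, data$country)

# Apply: fill every column of each piece independently.
filled <- lapply(pieces, function(d) {
  d[] <- lapply(d, locf)
  d
})

# Combine: unsplit() restores the original row order.
result <- unsplit(filled, data$country)
print(result)
```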

Note that passing DF$variable as .variables is not allowed; that's why you got an error when you tried it.

DF is your data.frame (data in the question).
