Using dplyr window-functions to make trailing values (fill in NA values)
Problem description
I would like to solve the following problem with dplyr, preferably with one of the window functions. I have a data frame with houses and buying prices. The following is an example:

    houseID year price
          1 1995    NA
          1 1996   100
          1 1997    NA
          1 1998   120
          1 1999    NA
          2 1995    NA
          2 1996    NA
          2 1997    NA
          2 1998    30
          2 1999    NA
          3 1995    NA
          3 1996    44
          3 1997    NA
          3 1998    NA
          3 1999    NA
I would like to make a data frame like this:
    houseID year price
          1 1995    NA
          1 1996   100
          1 1997   100
          1 1998   120
          1 1999   120
          2 1995    NA
          2 1996    NA
          2 1997    NA
          2 1998    30
          2 1999    30
          3 1995    NA
          3 1996    44
          3 1997    44
          3 1998    44
          3 1999    44
Here are some data in the right format:
    # Number of houses
    N = 15

    # Data frame
    df = data.frame(houseID = rep(1:N, each = 10),
                    year = 1995:2004,
                    price = ifelse(runif(10 * N) > 0.15, NA, exp(rnorm(10 * N))))
Is there a dplyr-way to do that?
Solution

These all use na.locf from the zoo package:

dplyr

    library(dplyr)
    library(zoo)

    df %>% group_by(houseID) %>% na.locf %>% ungroup
giving:
    Source: local data frame [15 x 3]
    Groups: houseID

       houseID year price
    1        1 1995    NA
    2        1 1996   100
    3        1 1997   100
    4        1 1998   120
    5        1 1999   120
    6        2 1995    NA
    7        2 1996    NA
    8        2 1997    NA
    9        2 1998    30
    10       2 1999    30
    11       3 1995    NA
    12       3 1996    44
    13       3 1997    44
    14       3 1998    44
    15       3 1999    44
Other solutions below give output which is quite similar so we won't repeat it except where the format differs substantially.
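Depending on the dplyr version, piping a grouped data frame straight into na.locf may not be applied group by group. A minimal sketch of a more explicit variant, assuming it is acceptable to fill only the price column, calls na.locf inside mutate so the fill clearly happens within each houseID. The object name filled is just an illustration.

```r
library(dplyr)
library(zoo)

# Example input from the question (three houses, five years each).
df <- data.frame(
  houseID = rep(1:3, each = 5),
  year    = rep(1995:1999, times = 3),
  price   = c(NA, 100, NA, 120, NA,
              NA, NA, NA, 30, NA,
              NA, 44, NA, NA, NA)
)

# Carry the last observed price forward within each house.
# na.rm = FALSE keeps leading NAs, so each group keeps its length.
filled <- df %>%
  group_by(houseID) %>%
  mutate(price = na.locf(price, na.rm = FALSE)) %>%
  ungroup()
```

Because mutate requires the result to have the same length as the group, na.rm = FALSE is needed here.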
Another possibility is to combine the by solution (shown further below) with dplyr:

    df %>% by(df$houseID, na.locf) %>% rbind_all
by

    library(zoo)

    do.call(rbind, by(df, df$houseID, na.locf))
ave

    library(zoo)

    na.locf2 <- function(x) na.locf(x, na.rm = FALSE)
    transform(df, price = ave(price, houseID, FUN = na.locf2))
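The na.locf2 wrapper is there because ave expects FUN to return a vector of the same length as the group it receives. By default na.locf drops leading NA values. A small sketch of the difference on a single vector (the names x, dropped and kept are only illustrative):

```r
library(zoo)

# Prices for one house, starting with a missing value.
x <- c(NA, 100, NA, 120, NA)

# Default behaviour: the leading NA is removed, so the result is shorter.
dropped <- na.locf(x)

# With na.rm = FALSE the leading NA is kept and the length is preserved.
kept <- na.locf(x, na.rm = FALSE)

length(dropped)
length(kept)
```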
data.table

    library(data.table)
    library(zoo)

    data.table(df)[, na.locf(.SD), by = houseID]
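As an alternative that does not need zoo, reasonably recent versions of data.table provide nafill, which supports last-observation-carried-forward on numeric columns. The following is a sketch under that assumption and is not part of the original answer. The name dt is illustrative.

```r
library(data.table)

# Example input from the question as a data.table.
dt <- data.table(
  houseID = rep(1:3, each = 5),
  year    = rep(1995:1999, times = 3),
  price   = c(NA, 100, NA, 120, NA,
              NA, NA, NA, 30, NA,
              NA, 44, NA, NA, NA)
)

# Fill price forward within each house, updating the column by reference.
dt[, price := nafill(price, type = "locf"), by = houseID]
```

Leading NA values within a house stay NA, since there is no earlier observation to carry forward.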
zoo

This solution uses zoo alone. It returns a wide rather than long result:

    library(zoo)

    z <- read.zoo(df, index = 2, split = 1, FUN = identity)
    na.locf(z, na.rm = FALSE)
giving:
           1  2  3
    1995  NA NA NA
    1996 100 NA 44
    1997 100 NA 44
    1998 120 30 44
    1999 120 30 44
This solution could be combined with dplyr like this:
    library(dplyr)
    library(zoo)

    df %>% read.zoo(index = 2, split = 1, FUN = identity) %>% na.locf(na.rm = FALSE)
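If the long layout of the original data frame is wanted after the wide zoo fill, one possible way back is to unroll the zoo matrix by hand. This is only a sketch: the name long is illustrative, and houseID comes back as character because it is taken from the column names.

```r
library(zoo)

# Example input from the question.
df <- data.frame(
  houseID = rep(1:3, each = 5),
  year    = rep(1995:1999, times = 3),
  price   = c(NA, 100, NA, 120, NA,
              NA, NA, NA, 30, NA,
              NA, 44, NA, NA, NA)
)

# Wide zoo object: one row per year, one column per house, filled forward.
z <- na.locf(read.zoo(df, index.column = 2, split = 1, FUN = identity),
             na.rm = FALSE)

# Unroll the matrix column by column (house by house) into long form.
long <- data.frame(
  houseID = rep(colnames(z), each = nrow(z)),
  year    = rep(index(z), times = ncol(z)),
  price   = as.vector(coredata(z))
)
```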
Input
Here is the input used for the examples above:
    df <- structure(list(houseID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
        2L, 3L, 3L, 3L, 3L, 3L), year = c(1995L, 1996L, 1997L, 1998L,
        1999L, 1995L, 1996L, 1997L, 1998L, 1999L, 1995L, 1996L, 1997L,
        1998L, 1999L), price = c(NA, 100L, NA, 120L, NA, NA, NA, NA,
        30L, NA, NA, 44L, NA, NA, NA)), .Names = c("houseID", "year",
        "price"), class = "data.frame", row.names = c(NA, -15L))
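Since the question asks for a window-function approach, here is a sketch that avoids zoo and relies on the cumulative function cummax inside a grouped mutate. The helper locf is hypothetical (not from dplyr or the original answer) and assumes the rows are already ordered by year within each house.

```r
library(dplyr)

# Same data as the input above, rebuilt here so the example is self-contained.
df <- data.frame(
  houseID = rep(1:3, each = 5),
  year    = rep(1995:1999, times = 3),
  price   = c(NA, 100L, NA, 120L, NA,
              NA, NA, NA, 30L, NA,
              NA, 44L, NA, NA, NA)
)

# Hypothetical helper: last observation carried forward using cummax.
locf <- function(x) {
  # Position of the most recent non-NA value at each element (0 if none yet).
  idx <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  # Prepend NA so that position 0 maps to NA.
  c(NA, x)[idx + 1L]
}

filled <- df %>%
  group_by(houseID) %>%
  mutate(price = locf(price)) %>%
  ungroup()
```

The cummax call turns the positions of observed values into a running "latest seen" index, which is then used to look up the value to carry forward.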
REVISED: Rearranged and added more solutions. Revised the dplyr/zoo solution to conform to the latest changes in dplyr.