Using dplyr window-functions to make trailing values (fill in NA values)


Problem description


I would like to solve the following problem with dplyr, preferably with one of its window functions. I have a data frame of houses and purchase prices. The following is an example:

houseID      year    price 
1            1995    NA
1            1996    100
1            1997    NA
1            1998    120
1            1999    NA
2            1995    NA
2            1996    NA
2            1997    NA
2            1998    30
2            1999    NA
3            1995    NA
3            1996    44
3            1997    NA
3            1998    NA
3            1999    NA

I would like to make a data frame like this:

houseID      year    price 
1            1995    NA
1            1996    100
1            1997    100
1            1998    120
1            1999    120
2            1995    NA
2            1996    NA
2            1997    NA
2            1998    30
2            1999    30
3            1995    NA
3            1996    44
3            1997    44
3            1998    44
3            1999    44

Here are some data in the right format:

# Number of houses
N <- 15

# Data frame: 10 years per house; roughly 85% of the prices are set to NA
df <- data.frame(houseID = rep(1:N, each = 10), year = 1995:2004, price = ifelse(runif(10 * N) > 0.15, NA, exp(rnorm(10 * N))))

Is there a dplyr-way to do that?

Solution

These all use na.locf from the zoo package:

dplyr

library(dplyr)
library(zoo)

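# Fill forward within each house; na.rm = FALSE keeps the leading NAs of a group
df %>% group_by(houseID) %>% mutate(price = na.locf(price, na.rm = FALSE)) %>% ungroup()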

giving:

Source: local data frame [15 x 3]
Groups: houseID

   houseID year price
1        1 1995    NA
2        1 1996   100
3        1 1997   100
4        1 1998   120
5        1 1999   120
6        2 1995    NA
7        2 1996    NA
8        2 1997    NA
9        2 1998    30
10       2 1999    30
11       3 1995    NA
12       3 1996    44
13       3 1997    44
14       3 1998    44
15       3 1999    44
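The same per-house forward fill can also be sketched without zoo. The example below swaps in fill() from the tidyr package, which is not part of the original answer; it assumes tidyr is installed and rebuilds the three-house example data inline so that it runs on its own.

```r
library(dplyr)
library(tidyr)

# Example data matching the three houses in the question
df <- data.frame(
  houseID = rep(1:3, each = 5),
  year    = rep(1995:1999, times = 3),
  price   = c(NA, 100, NA, 120, NA, NA, NA, NA, 30, NA, NA, 44, NA, NA, NA)
)

# fill() works within each group of a grouped data frame, so a price
# is never carried over from one house to the next
filled <- df %>%
  group_by(houseID) %>%
  fill(price, .direction = "down") %>%
  ungroup()

print(filled)
```

Leading NA values in each house stay NA because there is no earlier observation to carry forward.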

Other solutions below give output which is quite similar so we won't repeat it except where the format differs substantially.

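Another possibility is to combine the by solution (shown further below) with dplyr. Note that rbind_all belongs to older dplyr releases; later versions deprecated it in favor of bind_rows: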

df %>% by(df$houseID, na.locf) %>% rbind_all

by

library(zoo)

do.call(rbind, by(df, df$houseID, na.locf))

ave

library(zoo)

na.locf2 <- function(x) na.locf(x, na.rm = FALSE)
transform(df, price = ave(price, houseID, FUN = na.locf2))
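To make clear what the ave approach does per group, here is a minimal base R sketch that needs no packages. The helper locf is a hypothetical stand-in written for illustration, not zoo's implementation of na.locf; it only mimics the na.rm = FALSE behavior on a single vector.

```r
# Hypothetical helper: last observation carried forward, keeping leading NAs
locf <- function(x) {
  seen <- cumsum(!is.na(x))        # number of non-NA values seen so far (0 = none yet)
  c(NA, x[!is.na(x)])[seen + 1]    # position 1 is the NA used before any value appears
}

# Example data matching the three houses in the question
df <- data.frame(
  houseID = rep(1:3, each = 5),
  year    = rep(1995:1999, times = 3),
  price   = c(NA, 100, NA, 120, NA, NA, NA, NA, 30, NA, NA, 44, NA, NA, NA)
)

# ave() applies locf separately to the prices of each house
result <- transform(df, price = ave(price, houseID, FUN = locf))
print(result)
```

Because ave splits price by houseID before calling the function, a value from one house cannot leak into the next one.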

data.table

library(data.table)
library(zoo)

data.table(df)[, na.locf(.SD), by = houseID]
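A data.table-only variant is also possible. The sketch below uses nafill(), which newer data.table releases provide for numeric columns; it is an alternative not mentioned in the original answer and assumes a data.table version that includes nafill.

```r
library(data.table)

# Example data matching the three houses in the question
dt <- data.table(
  houseID = rep(1:3, each = 5),
  year    = rep(1995:1999, times = 3),
  price   = c(NA, 100, NA, 120, NA, NA, NA, NA, 30, NA, NA, 44, NA, NA, NA)
)

# Carry the last observed price forward within each house,
# updating the price column by reference
dt[, price := nafill(price, type = "locf"), by = houseID]
print(dt)
```

Unlike the na.locf(.SD) call above, this touches only the price column and leaves the other columns as they are.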

zoo

This solution uses zoo alone. It returns a wide rather than long result:

library(zoo)

z <- read.zoo(df, index = 2, split = 1, FUN = identity)
na.locf(z, na.rm = FALSE)

giving:

       1  2  3
1995  NA NA NA
1996 100 NA 44
1997 100 NA 44
1998 120 30 44
1999 120 30 44

This solution could be combined with dplyr like this:

library(dplyr)
library(zoo)

df %>% read.zoo(index = 2, split = 1, FUN = identity) %>% na.locf(na.rm = FALSE)

input

Here is the input used for the examples above:

df <- structure(list(houseID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
  2L, 3L, 3L, 3L, 3L, 3L), year = c(1995L, 1996L, 1997L, 1998L, 
  1999L, 1995L, 1996L, 1997L, 1998L, 1999L, 1995L, 1996L, 1997L, 
  1998L, 1999L), price = c(NA, 100L, NA, 120L, NA, NA, NA, NA, 
  30L, NA, NA, 44L, NA, NA, NA)), .Names = c("houseID", "year", 
  "price"), class = "data.frame", row.names = c(NA, -15L))

REVISED: Re-arranged and added more solutions. Revised the dplyr/zoo solution to conform to the latest changes in dplyr.
