Get the (t-1) data within groups


Problem description

Apologies if this has been asked before, but I couldn't find any question that answers this exactly. I have data like this:

Project        Date   price
      A   30/3/2013    2082
      B   19/3/2013    1567
      B   22/2/2013    1642
      C   12/4/2013    1575
      C    5/6/2013    1582

I want to add a column holding the previous price within each group, where "previous" means the most recent earlier date. For example, for row 2 the previous price in the same group is 1642. The final data should look something like this:

Project        Date   price   lastPrice
      A   30/3/2013    2082           0
      B   19/3/2013    1567        1642
      B   22/2/2013    1642           0 
      C   12/4/2013    1575           0
      C    5/6/2013    1582        1575

How can I do this? The main issue I'm facing is that the data may not be ordered by date, so I can't simply take the value from the row above.

Solution

Here's an option. I'd also recommend using NA instead of 0, because 0 could be an actual price.

library(dplyr)
df %>% 
  arrange(as.Date(Date, format = "%d/%m/%Y")) %>%
  group_by(Project) %>%
  mutate(lastPrice = lag(price))

# Source: local data frame [5 x 4]
# Groups: Project
# 
#   Project      Date price lastPrice
# 1       B 22/2/2013  1642        NA
# 2       B 19/3/2013  1567      1642
# 3       A 30/3/2013  2082        NA
# 4       C 12/4/2013  1575        NA
# 5       C  5/6/2013  1582      1575
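To make the NA recommendation concrete, here is a small sketch in base R. The two vectors are typed in by hand for illustration (they mirror the lastPrice column above, once with NA and once with 0), and the names last_na, last_zero and last_filled are made up for this example.

```r
# Hand-typed copies of the lastPrice column above (illustration only)
last_na   <- c(NA, 1642, NA, NA, 1575)   # "no previous price" marked as NA
last_zero <- c(0, 1642, 0, 0, 1575)      # same rows, marked as 0 instead

# With NA, the missing values have to be dropped explicitly
mean(last_na, na.rm = TRUE)   # 1608.5

# With 0, the placeholders are silently averaged in as if they were prices
mean(last_zero)               # 643.4

# If a 0 is still needed downstream, it can be filled in as a final step
last_filled <- ifelse(is.na(last_na), 0, last_na)
```

Keeping NA until the very end means any summary has to state how missing values are handled, which is usually what you want for prices.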


Another option is to use shift from data.table (in the development version, v1.9.5, at the time of writing):

library(data.table) ## v >= 1.9.5
setDT(df)[order(as.Date(Date, format = "%d/%m/%Y")), 
                lastPrice := shift(price), 
                by = Project]

#    Project      Date price lastPrice
# 1:       A 30/3/2013  2082        NA
# 2:       B 19/3/2013  1567      1642
# 3:       B 22/2/2013  1642        NA
# 4:       C 12/4/2013  1575        NA
# 5:       C  5/6/2013  1582      1575


Or with base R:

df <- df[order(df$Project, as.Date(df$Date, format = "%d/%m/%Y")), ]
within(df, lastPrice <- ave(price, Project, FUN = function(x) c(NA, x[-length(x)])))
#   Project      Date price lastPrice
# 1       A 30/3/2013  2082        NA
# 3       B 22/2/2013  1642        NA
# 2       B 19/3/2013  1567      1642
# 4       C 12/4/2013  1575        NA
# 5       C  5/6/2013  1582      1575
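The base R version above reorders the rows. If the original row order has to be kept (as in the desired output in the question), one possible variation is to run ave over row indices instead of prices. This is only a sketch: the sample data is rebuilt from the question's table, and lag_by_date is a hypothetical helper name, not part of any package.

```r
# Sample data rebuilt from the table in the question
df <- data.frame(
  Project = c("A", "B", "B", "C", "C"),
  Date    = c("30/3/2013", "19/3/2013", "22/2/2013", "12/4/2013", "5/6/2013"),
  price   = c(2082, 1567, 1642, 1575, 1582),
  stringsAsFactors = FALSE
)
d <- as.Date(df$Date, format = "%d/%m/%Y")

# Hypothetical helper: receives the row indices of one group and returns
# the previous (by date) price for each of those rows, in the same order
lag_by_date <- function(idx) {
  o    <- idx[order(d[idx])]                # group's rows, oldest first
  prev <- c(NA, df$price[o[-length(o)]])    # price shifted down by one
  prev[match(idx, o)]                       # back to the original row order
}

df$lastPrice <- ave(seq_len(nrow(df)), df$Project, FUN = lag_by_date)
df
```

Rows stay where they were, and only the new column depends on the date ordering inside each Project.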


As a side note, it is better to store your date column as class Date in the first place, so I'd recommend running df$Date <- as.Date(df$Date, format = "%d/%m/%Y") once and for all.
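A short sketch of why the one-time conversion helps, using the sample data from the question (rebuilt here by hand; df_sorted is just an illustrative name). Day-first strings sort as text, not chronologically, so every sort would otherwise need its own as.Date call.

```r
# Sample data rebuilt from the table in the question
df <- data.frame(
  Project = c("A", "B", "B", "C", "C"),
  Date    = c("30/3/2013", "19/3/2013", "22/2/2013", "12/4/2013", "5/6/2013"),
  price   = c(2082, 1567, 1642, 1575, 1582),
  stringsAsFactors = FALSE
)

# As text, "19/3/2013" sorts before "22/2/2013", which is wrong in time
text_order <- order(c("19/3/2013", "22/2/2013"))

# Convert once; from here on Date behaves like a real date
df$Date <- as.Date(df$Date, format = "%d/%m/%Y")
date_order <- order(df$Date[2:3])

# Sorting no longer needs a conversion inside order()
df_sorted <- df[order(df$Project, df$Date), ]

# The day/month/year display can still be produced when printing
format(df_sorted$Date, "%d/%m/%Y")
```

After the conversion, any of the three approaches above can order by Date directly.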
