Get the (t-1) data within groups
Problem description
Apologies if this has been asked before, but I couldn't find any question that answers this exactly. I have data like this:
Project Date price
A 30/3/2013 2082
B 19/3/2013 1567
B 22/2/2013 1642
C 12/4/2013 1575
C 5/6/2013 1582
I want to add a column holding the previous price (by date) within each group. For example, for row 2, the previous price in the same group is 1642. The final data should look something like this:
Project Date price lastPrice
A 30/3/2013 2082 0
B 19/3/2013 1567 1642
B 22/2/2013 1642 0
C 12/4/2013 1575 0
C 5/6/2013 1582 1575
How can I do this? The main issue I'm facing is that the data may not be ordered by date, so it's not as if I can just take the previous row.
Here's an option. I'd also recommend using NAs instead of 0, because 0 could be an actual price.
library(dplyr)
df %>%
  arrange(as.Date(Date, format = "%d/%m/%Y")) %>%
  group_by(Project) %>%
  mutate(lastPrice = lag(price))
# Source: local data frame [5 x 4]
# Groups: Project
#
# Project Date price lastPrice
# 1 B 22/2/2013 1642 NA
# 2 B 19/3/2013 1567 1642
# 3 A 30/3/2013 2082 NA
# 4 C 12/4/2013 1575 NA
# 5 C 5/6/2013 1582 1575
Another option is to use shift from the devel version of data.table:
library(data.table) ## v >= 1.9.5
setDT(df)[order(as.Date(Date, format = "%d/%m/%Y")),
          lastPrice := shift(price),
          by = Project]
# Project Date price lastPrice
# 1: A 30/3/2013 2082 NA
# 2: B 19/3/2013 1567 1642
# 3: B 22/2/2013 1642 NA
# 4: C 12/4/2013 1575 NA
# 5: C 5/6/2013 1582 1575
Or with base R
df <- df[order(df$Project, as.Date(df$Date, format = "%d/%m/%Y")), ]
within(df, lastPrice <- ave(price, Project, FUN = function(x) c(NA, x[-length(x)])))
# Project Date price lastPrice
# 1 A 30/3/2013 2082 NA
# 3 B 22/2/2013 1642 NA
# 2 B 19/3/2013 1567 1642
# 4 C 12/4/2013 1575 NA
# 5 C 5/6/2013 1582 1575
As a side note, it is better to keep your date column in a Date class in the first place, so I'd recommend doing df$Date <- as.Date(df$Date, format = "%d/%m/%Y") once and for all.
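To illustrate the side note, here is a minimal base R sketch that converts the column once and then reuses it. It assumes the data frame is built inline from the question's sample (the data.frame() call is my own reconstruction, not part of the original post), and it repeats the ave() approach from above with the Date column already converted.

```r
# Sample data from the question, rebuilt inline so the sketch is self-contained
# (assumption: Date arrives as character in day/month/year form)
df <- data.frame(
  Project = c("A", "B", "B", "C", "C"),
  Date    = c("30/3/2013", "19/3/2013", "22/2/2013", "12/4/2013", "5/6/2013"),
  price   = c(2082, 1567, 1642, 1575, 1582),
  stringsAsFactors = FALSE
)

# Convert once; every later step can then use df$Date directly
df$Date <- as.Date(df$Date, format = "%d/%m/%Y")

# Sort by group and date, then take the previous price within each group
df <- df[order(df$Project, df$Date), ]
df$lastPrice <- ave(df$price, df$Project,
                    FUN = function(x) c(NA, x[-length(x)]))
df
```

With the conversion done up front, the order() call no longer needs its own as.Date(), and the first row of each group gets NA because it has no earlier price.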