How to determine trend of time-series of values in R

Question

I am looking for help writing a function that can identify a trend ("positive/negative/mixed", see definition below) in a value for a given customer in a dataset.

I have the following transactional data; each customer has between 3 and 13 transactions.

customer_ID transaction_num sales
Josh         1              $35
Josh         2              $50
Josh         3              $65
Ray          1              $65
Ray          2              $52
Ray          3              $49
Ray          4              $15
Eric         1              $10 
Eric         2              $13
Eric         3              $9

I would like to write a function in R that populates a new dataframe as follows:

Customer_ID  Sales_Slope
Josh         Positive
Ray          Negative
Eric         Mixed

where:

Josh's slope is positive because his transaction sales costs increase with each additional shopping point

Ray's slope is negative because his transaction sales costs decrease with each additional shopping point

Eric's slope is mixed because his transaction sales costs fluctuate, with no clear trend

I have tried quite extensively to do this myself but am stuck. Here is some pseudo-code I have been able to put together:

counter = max(transaction_num)
while counter > 1
  if sales at counter are greater than sales at (counter - 1)
    then counter = counter - 1
    else "not positive slope trend" and stop

Solution

I think I would start with something like this. data.table is usually pretty efficient with bigger datasets.

# Make fake data
library(data.table)
data <- data.table(customer_ID = c(rep("Josh", 3), rep("Ray", 4), rep("Eric", 3)),
                   sales = c(35, 50, 65, 65, 52, 49, 15, 10, 13, 9))
# Number each customer's transactions 1..N in the order they appear
data[, transaction_num := seq_len(.N), by = customer_ID]
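The table in the question shows sales with a leading dollar sign, while the fake data above is numeric. If the real column were stored as text such as "$35" (an assumption; the question does not say how it is stored), it would need converting before any differences can be computed. A minimal base R sketch, where parse_sales is a hypothetical helper name:

```r
# Hypothetical helper (not part of the original answer): strip "$",
# thousands separators and whitespace, then convert to numeric.
parse_sales <- function(x) {
  as.numeric(gsub("[$,[:space:]]", "", x))
}

parse_sales(c("$35", "$50", "$10 "))
```

Values that still cannot be parsed after stripping come back as NA with a warning, so it is worth checking for NAs before going further.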

Now for the actual code.

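# Convert to a data.table (only needed if your data is a plain data.frame)
data <- as.data.table(data)

# Calculate the change in sales between consecutive transactions.
# This assumes transaction_num runs 1..N in row order within each customer;
# the last element of each Change vector is NA (there is no next transaction).
rolled.up <- data[, list(N.Minus.1 = .N - 1,
                         Change = list(sales[transaction_num + 1] - sales[transaction_num])),
                  by = customer_ID]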

# Count the positive changes; every other change (including a change of
# zero) is counted as negative
rolled.up[, Pos.Values := sapply(Change, function(x) sum(x > 0, na.rm = TRUE))]
rolled.up[, Neg.Values := N.Minus.1 - Pos.Values]

# Make the Sales_Slope variable
rolled.up[, Sales_Slope := ifelse(Pos.Values > 0 & Neg.Values == 0, "Positive",
                           ifelse(Pos.Values == 0 & Neg.Values > 0, "Negative", "Mixed"))]

# Make the final table
final.table <- rolled.up[, list(customer_ID, Sales_Slope)]
final.table

#      customer_ID Sales_Slope
# 1:        Josh    Positive
# 2:         Ray    Negative
# 3:        Eric       Mixed

# You can always merge this result back onto your main dataset if you want
data <- merge(x = data, y = final.table, by = "customer_ID", all.x = TRUE)
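
Since the question asks for a function, the same classification can also be sketched in base R with diff(), which returns the consecutive differences of a vector. This is an alternative to the data.table code above, not the method used there. classify_trend is a hypothetical helper name. Two choices here are assumptions the question does not settle: a flat change (zero difference) is treated as "Mixed", and a customer with fewer than two transactions gets NA.

```r
# Hypothetical helper: classify one customer's sales, which are assumed to be
# sorted by transaction_num already.
classify_trend <- function(sales) {
  change <- diff(sales)  # consecutive differences, length n - 1
  if (length(change) == 0L || anyNA(change)) return(NA_character_)
  if (all(change > 0)) {
    "Positive"
  } else if (all(change < 0)) {
    "Negative"
  } else {
    "Mixed"  # assumption: any flat or reversed step breaks the trend
  }
}

transactions <- data.frame(
  customer_ID     = c(rep("Josh", 3), rep("Ray", 4), rep("Eric", 3)),
  transaction_num = c(1:3, 1:4, 1:3),
  sales           = c(35, 50, 65, 65, 52, 49, 15, 10, 13, 9),
  stringsAsFactors = FALSE
)

# Sort so that diff() compares consecutive transactions within each customer
transactions <- transactions[order(transactions$customer_ID,
                                   transactions$transaction_num), ]

# Apply the helper once per customer and collect the labels
trend <- tapply(transactions$sales, transactions$customer_ID, classify_trend)
result <- data.frame(customer_ID = names(trend),
                     Sales_Slope = as.vector(trend),
                     stringsAsFactors = FALSE)
result
```

Because tapply groups by sorted customer name, the rows come back in the order Eric, Josh, Ray rather than in order of first appearance.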
