Time series as `ts` column in data.table?

Question

I have multiple sets of time series data and would like help figuring out the best way to get them into R and analyze them with R. I'm pretty familiar with data.table but not so familiar with R's ts class supporting time series analysis.

In particular, I want to know how to use ts in this situation, or whether there are limitations in ts (such as problems aggregating a set of ts objects) that make it inappropriate to use here.

There are a large number of stores. For each store, I have multiple data points for each day, such as sales volume in dollars, sales volume in number of transactions, and store traffic (number of people entering the store). (Actually what I have is a table with columns store ID, date, and the data for that store and date.)

What I've been doing is using a data.table with one row per store, aggregating the data by store into months and storing the values for each month in a separate named column (e.g. jan14_dollars, feb14_dollars...) but this is unwieldy for a lot of reasons, in particular when I want to look at weeks or quarters.

I was thinking the right way to handle this was to have columns of type ts, so each row would just be store, dollars_ts, transactions_ts, traffic_ts, but (a) how do I get the data into that format and (b) can ts objects be combined the way integers can to give me the results I want? If you can only answer (a) or (b) but not both, please do answer what you can.

I cannot provide a realistic data set, but you can generate a random one to play with like this:

library(data.table)

storeData <- CJ(store = toupper(letters), date = seq(as.Date('2012-01-01'), as.Date('2014-01-01'), by="day"))
storeData$dollars <- sample(100:100000, nrow(storeData), replace = TRUE)/100
storeData$transactions <- sample(0:1000, nrow(storeData), replace = TRUE)
storeData$traffic <- storeData$transactions + sample(0:1000, nrow(storeData), replace = TRUE)

head(storeData)
   store       date  dollars transactions traffic
1:     A 2012-01-01   48.60          409     990
2:     A 2012-01-02  996.89           36     428
3:     A 2012-01-03   69.35          647    1103
4:     A 2012-01-04  334.56          953     973
5:     A 2012-01-05  692.99          958    1753
6:     A 2012-01-06  973.32          724    1086



The Analysis

I want to answer questions like "how many stores had positive dollar sales growth?" and "is there a relationship between change in dollars/transaction and change in traffic?" and to bin the data into time periods and compare the answers across time periods (e.g. Q1 this year versus Q1 last year).

Can these kinds of questions be answered using ts? If so, how do I get this data into an appropriate set of columns or is there some structure other than data.table I should be using?

Please show both how to organize the data and then how to use the data to answer the example questions "how many stores had positive dollar sales growth in January 2014 compared to January 2013?" and "what is the overall trend in dollars per transaction for the past 3 months?"

Recommended answer

You're asking a lot of questions here. I recommend you spend time reading about all the things data.table can do involving joins and aggregating data. Here is an example of how you would get the year over year growth of each store in the first quarter.

#get the first day of the first month for your binning
#(base R is used here; the month<- and day<- replacement functions come from lubridate, not data.table)
minDate<-as.Date(format(min(storeData$date), "%Y-01-01"))

#get the first day of the last month for your binning
maxDate<-as.Date(format(max(storeData$date), "%Y-12-01"))

#Build some bins
yearly<-data.table(leftBound=seq.Date(minDate,maxDate,by="year"))
quarterly<-data.table(leftBound=seq.Date(minDate,maxDate,by="3 months"))
monthly<-data.table(leftBound=seq.Date(minDate,maxDate,by="month"))

#Example for quarterly data
quarterly[, rollDate:=leftBound]
storeData[, rollDate:=date]

setkey(quarterly,"rollDate")
setkey(storeData,"rollDate")

temp<-quarterly[storeData, roll=TRUE] #associate each (store, date) pair with a quarter

#create a "join table" containing each quarter for each store
jt<-CJ(leftBound=quarterly$leftBound, store=unique(storeData$store))
setkey(temp,"leftBound","store")

dt<-temp[jt, allow.cartesian=TRUE]
dt[, `:=`(year=year(leftBound), quarter=quarter(leftBound))]

qSummary<-dt[,list(dollars=sum(dollars, na.rm=TRUE), 
         transactions=sum(transactions, na.rm=TRUE), 
         traffic=sum(traffic, na.rm=TRUE)),
   by=list(year,quarter,store)] #Summarize the data by quarter

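#Get year/year growth for Q1 as the ratio of Q1 2014 dollars to Q1 2013 dollars (above 1 means growth)
#The sample data ends on 2014-01-01, so Q1 2014 holds a single day and the ratios below are tiny
qSummary[,list(dollarGrowth = dollars[which(year==2014 & quarter==1)] / dollars[which(year==2013 & quarter==1)]), by=store]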

 #First five rows...
    store dollarGrowth
 1:     A    0.0134860
 2:     B    0.0137215
 3:     C    0.0188249
 4:     D    0.0163887
 5:     E    0.0037576
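
The two example questions from the post can also be approached by grouping directly on calendar fields instead of building bins and rolling joins. The following is only a minimal sketch, not the method used above. It assumes synthetic data like the question's but extended to 2014-01-31 so that January 2014 is a complete month, and the names mSummary, janGrowth, nPositive and last3 are illustrative.

```r
library(data.table)

# Assumption: random data in the shape of the question's example,
# extended to 2014-01-31 so January 2014 is a full month.
set.seed(1)
storeData <- CJ(store = toupper(letters),
                date = seq(as.Date("2012-01-01"), as.Date("2014-01-31"), by = "day"))
storeData[, dollars := sample(100:100000, .N, replace = TRUE) / 100]
storeData[, transactions := sample(0:1000, .N, replace = TRUE)]

# Monthly totals per store, grouped directly on year and month.
mSummary <- storeData[, .(dollars = sum(dollars),
                          transactions = sum(transactions)),
                      by = .(store, year = year(date), month = month(date))]

# Ratio of January 2014 to January 2013 dollars for each store.
janGrowth <- mSummary[month == 1,
                      .(dollarRatio = dollars[year == 2014] / dollars[year == 2013]),
                      by = store]

# Number of stores whose January dollars grew year over year.
nPositive <- janGrowth[dollarRatio > 1, .N]

# Overall dollars per transaction for the three most recent months.
recent <- mSummary[, .(dpt = sum(dollars) / sum(transactions)),
                   by = .(year, month)][order(year, month)]
last3 <- tail(recent, 3)
```

Swapping month(date) for quarter(date) or week(date) in the by clause gives the other time bins without changing how the data is stored, which is the main reason to keep the table in long format rather than one column per period.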
