Transforming a dataframe into a TS in R
Question
I've been trying to transform a dataframe I put together into a time series, but for some reason it doesn't work. I am very new to R.
> x<-Sales_AEMBG%>%
+ select(Ecriture.DatEcr, Crédit, Mapping)
> names(x)<-c("Dates","Revenue","Mapping")
> str(x)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 15167 obs. of 3 variables:
$ Dates : POSIXct, format: "2016-01-02" "2016-01-02" "2016-01-02" "2016-01-02" ...
$ Revenue: num 124065 214631 135810 225293 57804 ...
$ Mapping: chr "E.M 1.5 L" "E.M 1.5 L" "E.M 1.5 L" "E.M 1.5 L" ...
When I look at the data, here's what I have:
> head(x)
# A tibble: 6 x 3
Dates Revenue Mapping
<dttm> <dbl> <chr>
1 2016-01-02 00:00:00 124065. E.M 1.5 L
2 2016-01-02 00:00:00 214631. E.M 1.5 L
3 2016-01-02 00:00:00 135810. E.M 1.5 L
4 2016-01-02 00:00:00 225293. E.M 1.5 L
5 2016-01-02 00:00:00 57804. E.M 1.5 L
6 2016-01-02 00:00:00 124065. E.M 1.5 L
Of course, I tried the as.ts function:
> x_xts <- as.ts(x)
Warning message:
In data.matrix(data) : NAs introduced by coercion
> is.ts(x)
[1] FALSE
But it keeps telling me that my dataframe is still not recognized as a TS.
What do you suggest?
Thanks
Recommended answer
I've added a few more observations to your data.
# A tibble: 12 x 3
Dates Revenue Mapping
<dttm> <dbl> <chr>
1 2016-01-02 00:00:00 124065 E.M 1.5 L
2 2016-01-02 00:00:00 214631 E.M 1.5 L
3 2016-01-03 00:00:00 135810 E.M 1.5 L
4 2016-01-03 00:00:00 225293 E.M 1.5 L
5 2016-01-05 00:00:00 57804 E.M 1.5 L
6 2016-01-05 00:00:00 124065 E.M 1.5 L
7 2016-01-02 00:00:00 24065 E.M 1.5 M
8 2016-01-02 00:00:00 14631 E.M 1.5 M
9 2016-01-03 00:00:00 35810 E.M 1.5 M
10 2016-01-03 00:00:00 25293 E.M 1.5 M
11 2016-01-05 00:00:00 7804 E.M 1.5 M
12 2016-01-05 00:00:00 24065 E.M 1.5 M
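To follow along, the table above can be rebuilt with something like the sketch below. It uses a plain base R data.frame rather than a tibble, and the UTC time zone is an assumption, since the question does not state one.

```r
# Rebuild the 12-row example data.
# tz = "UTC" is an assumption; the original data does not say which zone it uses.
days <- as.POSIXct(c("2016-01-02", "2016-01-03", "2016-01-05"), tz = "UTC")

x <- data.frame(
  Dates   = rep(rep(days, each = 2), times = 2),   # two rows per day, per product
  Revenue = c(124065, 214631, 135810, 225293, 57804, 124065,
              24065, 14631, 35810, 25293, 7804, 24065),
  Mapping = rep(c("E.M 1.5 L", "E.M 1.5 M"), each = 6),
  stringsAsFactors = FALSE
)

str(x)
```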
First you need to sum the sales by day (Dates) and product type (your Mapping variable?), and pivot into a wider data format:
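library(dplyr)
library(tidyr)
x.sum <- x %>%
  group_by(Mapping, Dates) %>%
  # Total revenue per product per day; drop the grouping before reshaping
  # (.groups requires dplyr >= 1.0.0)
  summarise(Revenue=sum(Revenue), .groups="drop") %>%
  # One row per date, one column per product
  pivot_wider(id_cols=Dates, names_from=Mapping, values_from=Revenue)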
# A tibble: 3 x 3
Dates `E.M 1.5 L` `E.M 1.5 M`
<dttm> <dbl> <dbl>
1 2016-01-02 00:00:00 338696 38696
2 2016-01-03 00:00:00 361103 61103
3 2016-01-05 00:00:00 181869 31869
Note that I've deliberately omitted Jan 4.
If your time series data has missing days, such as stock prices where financial markets are closed on the weekends, then the as.ts (or ts) function will not give correct results, because ts assumes evenly spaced observations with no gaps. If there are no missing days, then the correct way to convert the data into a time series object ("ts") is to specify the column(s) to convert (x.sum[,2:3]) along with the start (January 2, 2016) and frequency (daily) of the series.
x.ts <- ts(x.sum[,2:3], start=c(2016, 2), frequency=365)
Be careful with start, because the meaning of its second element depends on the specified frequency. Here, 365 means daily, so the "2" means day 2 of year 2016. If the frequency were monthly (12), the "2" would mean month 2 of year 2016.
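A small sketch of that point, using a made-up three-value series (the values 1:3 are placeholders, not the sales data): the same start=c(2016, 2) is read as day 2 or as month 2 depending on frequency.

```r
# Same start vector, two different frequencies.
daily   <- ts(1:3, start = c(2016, 2), frequency = 365)  # "2" = 2nd day of 2016
monthly <- ts(1:3, start = c(2016, 2), frequency = 12)   # "2" = 2nd month of 2016

# time() shows the position of each observation as a fractional year.
time(daily)    # first value is 2016 + 1/365
time(monthly)  # first value is 2016 + 1/12
```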
But as I mentioned, ts does not account for missing days. So for this made-up data, if you plot the time series you will get the wrong picture: the January 5 values are placed where January 4 should be.
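The shift can be seen directly, and one way to keep using ts is to pad the gap with NA first. The sketch below uses base R only and the summed "E.M 1.5 L" values from the table; padding with NA is an alternative I am suggesting here, not part of the original approach.

```r
dates   <- as.Date(c("2016-01-02", "2016-01-03", "2016-01-05"))
revenue <- c(338696, 361103, 181869)

# ts() only knows start and frequency, so it places the three values
# on consecutive days: day 2, day 3, day 4.
naive <- ts(revenue, start = c(2016, 2), frequency = 365)
as.numeric(cycle(naive))   # day-of-year positions assigned by ts()

# Pad the series: build every calendar day in the range and
# fill days without data with NA.
all_days <- seq(min(dates), max(dates), by = "day")
padded   <- revenue[match(all_days, dates)]   # NA where a day is missing

regular <- ts(padded, start = c(2016, 2), frequency = 365)
regular
```

With the padded version the January 5 value sits in the fourth position, and the plot shows a gap on January 4 instead of silently shifting the data.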
In this case, other packages such as xts and zoo can be used to simplify the work, because they index each observation by its actual date.
library(xts)
x.xts <- xts(x.sum[,2:3], order.by=x.sum$Dates)
plot(x.xts) # Correct results.