Resampling a DataFrame to hourly, 15min and 5min periods in Julia


Question

I'm quite new to Julia but I'm giving it a try since the benchmarks claim it to be much faster than Python.

I'm trying to use some stock tick data in the format ["unixtime", "price", "amount"].

I managed to load the data and convert the unixtime to a date in Julia, but now I need to resample the data to a specific period (hourly, 15 min, 5 min, etc.), using OHLC (open, high, low, close) for the price and the sum for the amount:

julia> head(btc_raw_data)
6x3 DataFrame:
                           date price  amount
[1,]    2011-09-13T13:53:36 UTC   5.8     1.0
[2,]    2011-09-13T13:53:44 UTC  5.83     3.0
[3,]    2011-09-13T13:53:49 UTC   5.9     1.0
[4,]    2011-09-13T13:53:54 UTC   6.0    20.0
[5,]    2011-09-13T14:32:53 UTC  5.95 12.4521
[6,]    2011-09-13T14:35:04 UTC  5.88   7.458
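
For readers following along, the unixtime conversion described above is available in Julia's Dates standard library as unix2datetime. A minimal sketch, assuming the raw timestamps are seconds since the Unix epoch in UTC (the variable names are illustrative, not taken from the original data set):

```julia
using Dates

# Assumed raw timestamps, in seconds since 1970-01-01T00:00:00 UTC
unixtimes = [1315922016, 1315922024]

# unix2datetime returns a DateTime without a time zone; the values are UTC
dates = Dates.unix2datetime.(unixtimes)
println(dates)
```

These two values correspond to the first two rows of the table above.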

I see there is a package called Resampling, but it doesn't seem to accept a time period, only the number of rows I want the output data to have.

Are there any other options?

Recommended answer

You can convert the DataFrame to a TimeArray using TimeSeriesIO https://github.com/femtotrader/TimeSeriesIO.jl

using TimeSeriesIO: TimeArray
ta = TimeArray(df, colnames=[:price], timestamp=:date)

You can resample timeseries (TimeArray from TimeSeries.jl) using TimeSeriesResampler https://github.com/femtotrader/TimeSeriesResampler.jl and TimeFrames https://github.com/femtotrader/TimeFrames.jl

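using Dates  # standard library; needs an explicit import on Julia 0.7 and later
using TimeSeries: TimeArray
using TimeSeriesResampler: resample, mean, ohlc, sum, TimeFrame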

# Define a sample timeseries (prices for example)
idx = DateTime(2010,1,1):Dates.Minute(1):DateTime(2011,1,1)
idx = idx[1:end-1]
N = length(idx)
y = rand(-1.0:0.01:1.0, N)
y = 1000 .+ cumsum(y)  # broadcast the scalar over the vector (required on Julia 0.7 and later)
#df = DataFrame(Date=idx, y=y)
ta = TimeArray(collect(idx), y, ["y"])
println("ta=")
println(ta)

# Define how datetime should be grouped (timeframe)
tf = TimeFrame(dt -> floor(dt, Dates.Minute(15)))

# resample using OHLC values
ta_ohlc = ohlc(resample(ta, tf))
println("ta_ohlc=")
println(ta_ohlc)

# resample using mean values
ta_mean = mean(resample(ta, tf))
println("ta_mean=")
println(ta_mean)

# Define another sample timeseries (volume for example)
vol = rand(0:0.01:1.0, N)
ta_vol = TimeArray(collect(idx), vol, ["vol"])
println("ta_vol=")
println(ta_vol)

# resample using sum values
ta_vol_sum = sum(resample(ta_vol, tf))
println("ta_vol_sum=")
println(ta_vol_sum)

You should get:

julia> ta
525600x1 TimeSeries.TimeArray{Float64,1,DateTime,Array{Float64,1}} 2010-01-01T00:00:00 to 2010-12-31T23:59:00

                      y
2010-01-01T00:00:00 | 1000.16
2010-01-01T00:01:00 | 1000.1
2010-01-01T00:02:00 | 1000.98
2010-01-01T00:03:00 | 1001.38
⋮
2010-12-31T23:56:00 | 972.3
2010-12-31T23:57:00 | 972.85
2010-12-31T23:58:00 | 973.74
2010-12-31T23:59:00 | 972.8


julia> ta_ohlc
35040x4 TimeSeries.TimeArray{Float64,2,DateTime,Array{Float64,2}} 2010-01-01T00:00:00 to 2010-12-31T23:45:00

                      Open       High       Low        Close
2010-01-01T00:00:00 | 1000.16    1002.5     1000.1     1001.54
2010-01-01T00:15:00 | 1001.57    1002.64    999.38     999.38
2010-01-01T00:30:00 | 999.13     1000.91    998.91     1000.91
2010-01-01T00:45:00 | 1001.0     1006.42    1001.0     1006.42
⋮
2010-12-31T23:00:00 | 980.84     981.56     976.53     976.53
2010-12-31T23:15:00 | 975.74     977.46     974.71     975.31
2010-12-31T23:30:00 | 974.72     974.9      971.73     972.07
2010-12-31T23:45:00 | 972.33     973.74     971.49     972.8


julia> ta_mean
35040x1 TimeSeries.TimeArray{Float64,1,DateTime,Array{Float64,1}} 2010-01-01T00:00:00 to 2010-12-31T23:45:00

                      y
2010-01-01T00:00:00 | 1001.1047
2010-01-01T00:15:00 | 1001.686
2010-01-01T00:30:00 | 999.628
2010-01-01T00:45:00 | 1003.5267
⋮
2010-12-31T23:00:00 | 979.1773
2010-12-31T23:15:00 | 975.746
2010-12-31T23:30:00 | 973.482
2010-12-31T23:45:00 | 972.3427

julia> ta_vol
525600x1 TimeSeries.TimeArray{Float64,1,DateTime,Array{Float64,1}} 2010-01-01T00:00:00 to 2010-12-31T23:59:00

                      vol
2010-01-01T00:00:00 | 0.37
2010-01-01T00:01:00 | 0.67
2010-01-01T00:02:00 | 0.29
2010-01-01T00:03:00 | 0.28
⋮
2010-12-31T23:56:00 | 0.74
2010-12-31T23:57:00 | 0.66
2010-12-31T23:58:00 | 0.22
2010-12-31T23:59:00 | 0.47


julia> ta_vol_sum
35040x1 TimeSeries.TimeArray{Float64,1,DateTime,Array{Float64,1}} 2010-01-01T00:00:00 to 2010-12-31T23:45:00

                      vol
2010-01-01T00:00:00 | 7.13
2010-01-01T00:15:00 | 6.99
2010-01-01T00:30:00 | 8.73
2010-01-01T00:45:00 | 8.27
⋮
2010-12-31T23:00:00 | 6.11
2010-12-31T23:15:00 | 7.49
2010-12-31T23:30:00 | 5.75
2010-12-31T23:45:00 | 8.36
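
The grouping idea behind the timeframe above (flooring each timestamp to the start of its period) can also be sketched with only the Dates standard library. This is not how TimeSeriesResampler is implemented; it is a minimal illustration that assumes the ticks are already sorted by time, and the helper name ohlc_bars is hypothetical. The sample rows are the six ticks from the question.

```julia
using Dates

# Tick data from the question: (timestamp, price, amount)
ticks = [
    (DateTime(2011, 9, 13, 13, 53, 36), 5.80, 1.0),
    (DateTime(2011, 9, 13, 13, 53, 44), 5.83, 3.0),
    (DateTime(2011, 9, 13, 13, 53, 49), 5.90, 1.0),
    (DateTime(2011, 9, 13, 13, 53, 54), 6.00, 20.0),
    (DateTime(2011, 9, 13, 14, 32, 53), 5.95, 12.4521),
    (DateTime(2011, 9, 13, 14, 35, 4),  5.88, 7.458),
]

# Hypothetical helper: OHLC of the price and sum of the amount per period.
# Assumes `ticks` is sorted by timestamp; periods without ticks yield no bar.
function ohlc_bars(ticks, period::Dates.Period)
    Bar = NamedTuple{(:time, :open, :high, :low, :close, :volume),
                     Tuple{DateTime,Float64,Float64,Float64,Float64,Float64}}
    bars = Bar[]
    for (t, price, amount) in ticks
        bucket = floor(t, period)  # start of the period containing t
        if isempty(bars) || bars[end].time != bucket
            # First tick of a new period opens a new bar
            push!(bars, (time=bucket, open=price, high=price, low=price,
                         close=price, volume=amount))
        else
            # Later ticks update high, low, close and the running volume
            b = bars[end]
            bars[end] = (time=b.time, open=b.open, high=max(b.high, price),
                         low=min(b.low, price), close=price,
                         volume=b.volume + amount)
        end
    end
    return bars
end

for period in (Hour(1), Minute(15), Minute(5))
    println(period, " => ", ohlc_bars(ticks, period))
end
```

Changing the period argument (Hour(1), Minute(15), Minute(5)) is all that is needed to switch between the resolutions asked about in the question.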
