Python: Bar chart - plot sum of values by a) year and b) quarter across all years


Question

I have daily time series data with the columns date (YYYY-MM-DD), returns, pnl and number of trades:

date             returns       pnl      no_trades
1998-01-01         0.01        0.05         5
1998-01-02        -0.04        0.12         2
...
2010-12-31         0.05        0.25         3

Now I would like to show horizontal bar charts of a) the average of the returns and b) the sum of the pnl, grouped by:

1) year, i.e. 1998, 1999, ..., 2010

2) quarter across all years, i.e. Q1 (YYYY-01-01 to YYYY-03-31), Q2, ..., Q4

Additionally, the sum of the number of trades for each group in 1) and 2) should be shown as a number next to each horizontal bar.

So in my opinion there need to be two separate steps:

1) Get the data into the right format

2) Feed the data to the plot, overlaying multiple plots where needed.

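Sample data:

from datetime import datetime

import numpy as np
import pandas as pd

start = datetime(1998, 1, 1)
end = datetime(2001, 12, 31)
dates = pd.date_range(start, end, freq='D')

# Random placeholder values, indexed by calendar day
df = pd.DataFrame(np.random.randn(len(dates), 3), index=dates,
                  columns=['returns', 'pnl', 'no_trades'])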

So that could be two horizontal bar charts each for year and quarter:

1) one for returns: bar chart, the value in the middle of the bar, sum of no_trades at the end of the bar

2) one for pnl: bar chart, the value in the middle of the bar, sum of no_trades at the end of the bar

Plus a dotted vertical line running across the bars that shows the average returns and pnl.

I could do it in Excel (which amounts to adding columns for the respective view and then building a pivot chart), but I would prefer an automated, reproducible way via Python (or at least to understand how it's done).

Edit: as discussed in the comment below, this is how far I've got; however, I am not sure whether this is the fastest approach for 1). I am currently working on 2).

# Assumes df has a datetime64 column named 'date' (not only a DatetimeIndex)
df_ret_year = df[['date', 'returns']].groupby(df['date'].dt.year).mean()
df_ret_quarter = df[['date', 'returns']].groupby(df['date'].dt.quarter).mean()

df_pnl_year = df[['date', 'pnl']].groupby(df['date'].dt.year).sum()
df_pnl_quarter = df[['date', 'pnl']].groupby(df['date'].dt.quarter).sum()

# Trade counts come from 'no_trades', not 'pnl'
df_trades_year = df[['date', 'no_trades']].groupby(df['date'].dt.year).sum()
df_trades_quarter = df[['date', 'no_trades']].groupby(df['date'].dt.quarter).sum()

Recommended answer

from datetime import datetime
import numpy as np
import pandas as pd

start = datetime(1998, 1, 1)
end = datetime(2001, 12, 31)
dates = pd.date_range(start, end, freq='D')

Create the DataFrame with a MultiIndex of (year, quarter):

index = pd.MultiIndex.from_tuples([(thing.year, thing.quarter) for thing in dates])
df = pd.DataFrame(np.random.randn(len(dates), 3), index = index, 
                  columns = ['returns', 'pnl', 'no_trades'])

Then you can group by year, by quarter, or by year and quarter:

gb_yr = df.groupby(level=0)
gb_qtr = df.groupby(level=1)
gb_yr_qtr = df.groupby(level=[0, 1])

>>> 
>>> # yearly means
>>> gb_yr.mean()
       returns       pnl  no_trades
1998  0.080989 -0.019115   0.142576
1999 -0.040881 -0.005331   0.029815
2000 -0.036227 -0.100028  -0.009175
2001  0.097230 -0.019342  -0.089498
>>> 
>>> # quarterly means across all years
>>> gb_qtr.mean()
    returns       pnl  no_trades
1  0.036992  0.023923   0.048497
2  0.053445 -0.039583   0.076721
3  0.003891 -0.016180   0.004619
4  0.007145 -0.111050  -0.054988
>>> 
>>> # means by year and quarter
>>> gb_yr_qtr.mean()
         returns       pnl  no_trades
1998 1 -0.062570  0.139856   0.105288
     2  0.044946 -0.008685   0.200393
     3  0.152209  0.007341   0.119093
     4  0.185858 -0.211401   0.145347
1999 1  0.085799  0.072655   0.054060
     2  0.111595  0.002972   0.068792
     3 -0.194506 -0.093435   0.107210
     4 -0.161999 -0.001732  -0.109851
2000 1  0.001543 -0.083488   0.174226
     2 -0.064343 -0.158431  -0.071415
     3 -0.036334 -0.037008  -0.068717
     4 -0.045669 -0.121640  -0.069474
2001 1  0.123592 -0.032138  -0.140982
     2  0.121582  0.005810   0.109115
     3  0.094194  0.058382  -0.139110
     4  0.050388 -0.109429  -0.185975
>>>
>>> # operate on single columns
>>> gb_yr['pnl'].sum()
1998    -6.976917
1999    -1.945935
2000   -36.610206
2001    -7.060010
Name: pnl, dtype: float64
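
The question asks for a different statistic per column (mean of returns, sum of pnl, sum of no_trades). One way to get all three in a single table per grouping is to pass a column-to-function mapping to `agg`. The following is only a sketch under assumptions: the sample data, the level names `year` and `quarter`, and the variable names `spec`, `by_year` and `by_quarter` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the answer's DataFrame,
# but with named index levels and integer trade counts.
dates = pd.date_range("1998-01-01", "2001-12-31", freq="D")
index = pd.MultiIndex.from_arrays(
    [dates.year, dates.quarter], names=["year", "quarter"]
)
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "returns": rng.normal(size=len(dates)),
        "pnl": rng.normal(size=len(dates)),
        "no_trades": rng.integers(0, 10, size=len(dates)),
    },
    index=index,
)

# One statistic per column: mean of returns, sum of pnl, sum of no_trades.
spec = {"returns": "mean", "pnl": "sum", "no_trades": "sum"}
by_year = df.groupby(level="year").agg(spec)        # one row per year
by_quarter = df.groupby(level="quarter").agg(spec)  # one row per quarter, all years pooled

print(by_year)
print(by_quarter)
```

Each resulting table has one row per group, which is the shape a bar chart with one bar per year or per quarter needs.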

>>> # plotting
>>> from matplotlib import pyplot as plt
>>> gb_yr.mean().plot()
<matplotlib.axes._subplots.AxesSubplot object at 0x000000000C04BF28>
>>> plt.show()
>>> plt.close()
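
The plot above is a line chart, whereas the question asks for horizontal bars annotated with the value and the trade count, plus a dotted vertical line for the average. A possible sketch with matplotlib follows. The helper `barh_with_counts`, the sample data and the label format `n=...` are assumptions made for illustration; the dotted line is drawn at the mean of the plotted bar values, which is one of several reasonable readings of "average" in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); remove to open a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample data indexed by calendar day.
dates = pd.date_range("1998-01-01", "2001-12-31", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "returns": rng.normal(size=len(dates)),
        "pnl": rng.normal(size=len(dates)),
        "no_trades": rng.integers(0, 10, size=len(dates)),
    },
    index=dates,
)
spec = {"returns": "mean", "pnl": "sum", "no_trades": "sum"}
by_year = df.groupby(df.index.year).agg(spec)
by_quarter = df.groupby(df.index.quarter).agg(spec)


def barh_with_counts(values, counts, title):
    """Horizontal bars for `values`, each labelled with its value and count."""
    fig, ax = plt.subplots()
    labels = [str(key) for key in values.index]
    bars = ax.barh(labels, values.to_numpy())
    for bar, val, n in zip(bars, values.to_numpy(), counts.to_numpy()):
        y = bar.get_y() + bar.get_height() / 2
        # Bar value, centred inside the bar.
        ax.annotate(f"{val:.3f}", (val / 2, y), ha="center", va="center")
        # Trade count, placed just beyond the end of the bar.
        ax.annotate(
            f"n={n}",
            (val, y),
            xytext=(4 if val >= 0 else -4, 0),
            textcoords="offset points",
            ha="left" if val >= 0 else "right",
            va="center",
        )
    # Dotted vertical line at the mean of the plotted bar values.
    ax.axvline(values.mean(), linestyle=":", color="black")
    ax.set_title(title)
    return ax


ax_ret = barh_with_counts(by_year["returns"], by_year["no_trades"], "Mean returns by year")
ax_pnl = barh_with_counts(by_quarter["pnl"], by_quarter["no_trades"], "Total pnl by quarter")
```

Calling `ax_ret.figure.savefig("returns_by_year.png")` writes a chart to disk; with an interactive backend, `plt.show()` displays both figures instead.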
