Seaborn timeseries plot with multiple series


Question

I'm trying to make a time series plot with seaborn from a dataframe that has multiple series.

From this post: seaborn time series from pandas dataframe

I gather that tsplot isn't going to work, as it is meant to plot uncertainty.

So is there another Seaborn method that is meant for line charts with multiple series?

My dataframe looks like this:

print(df.info())
print(df.describe())
print(df.values)
print(df.index)

Output:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 253 entries, 2013-01-03 to 2014-01-03
Data columns (total 5 columns):
Equity(24 [AAPL])      253 non-null float64
Equity(3766 [IBM])     253 non-null float64
Equity(5061 [MSFT])    253 non-null float64
Equity(6683 [SBUX])    253 non-null float64
Equity(8554 [SPY])     253 non-null float64
dtypes: float64(5)
memory usage: 11.9 KB
None
       Equity(24 [AAPL])  Equity(3766 [IBM])  Equity(5061 [MSFT])  \
count         253.000000          253.000000           253.000000   
mean           67.560593          194.075383            32.547436   
std             6.435356           11.175226             3.457613   
min            55.811000          172.820000            26.480000   
25%            62.538000          184.690000            28.680000   
50%            65.877000          193.880000            33.030000   
75%            72.299000          203.490000            34.990000   
max            81.463000          215.780000            38.970000   

       Equity(6683 [SBUX])  Equity(8554 [SPY])  
count           253.000000          253.000000  
mean             33.773277          164.690180  
std               4.597291           10.038221  
min              26.610000          145.540000  
25%              29.085000          156.130000  
50%              33.650000          165.310000  
75%              38.280000          170.310000  
max              40.995000          184.560000  
[[  77.484  195.24    27.28    27.685  145.77 ]
 [  75.289  193.989   26.76    27.85   146.38 ]
 [  74.854  193.2     26.71    27.875  145.965]
 ..., 
 [  80.167  187.51    37.43    39.195  184.56 ]
 [  79.034  185.52    37.145   38.595  182.95 ]
 [  77.284  186.66    36.92    38.475  182.8  ]]
DatetimeIndex(['2013-01-03', '2013-01-04', '2013-01-07', '2013-01-08',
               '2013-01-09', '2013-01-10', '2013-01-11', '2013-01-14',
               '2013-01-15', '2013-01-16', 
               ...
               '2013-12-19', '2013-12-20', '2013-12-23', '2013-12-24',
               '2013-12-26', '2013-12-27', '2013-12-30', '2013-12-31',
               '2014-01-02', '2014-01-03'],
              dtype='datetime64[ns]', length=253, freq=None, tz='UTC')

This works (but I want to get my hands dirty with Seaborn):

df.plot()

Output: (plot image not included)

Thank you for your time!

Update 1:

df.to_dict() returns: https://gist.github.com/anonymous/2bdc1ce0f9d0b6ccd6675ab4f7313a5f

Update 2:

Using @knagaev's sample code, I've narrowed it down to this difference:

Current dataframe (output of print(current_df)):

                           Equity(24 [AAPL])  Equity(3766 [IBM])  \
2013-01-03 00:00:00+00:00             77.484            195.2400   
2013-01-04 00:00:00+00:00             75.289            193.9890   
2013-01-07 00:00:00+00:00             74.854            193.2000   
2013-01-08 00:00:00+00:00             75.029            192.8200   
2013-01-09 00:00:00+00:00             73.873            192.3800   

Desired dataframe (output of print(desired_df)):

           Date Company       Kind            Price
0    2014-01-02     IBM       Open       187.210007
1    2014-01-02     IBM       High       187.399994
2    2014-01-02     IBM        Low       185.199997
3    2014-01-02     IBM      Close       185.529999
4    2014-01-02     IBM     Volume   4546500.000000
5    2014-01-02     IBM  Adj Close       171.971090
6    2014-01-02    MSFT       Open        37.349998
7    2014-01-02    MSFT       High        37.400002
8    2014-01-02    MSFT        Low        37.099998
9    2014-01-02    MSFT      Close        37.160000
10   2014-01-02    MSFT     Volume  30632200.000000
11   2014-01-02    MSFT  Adj Close        34.960000
12   2014-01-02    ORCL       Open        37.779999
13   2014-01-02    ORCL       High        38.029999
14   2014-01-02    ORCL        Low        37.549999
15   2014-01-02    ORCL      Close        37.840000
16   2014-01-02    ORCL     Volume  18162100.000000

What's the best way to reorganize current_df into desired_df?

Update 3: I finally got it working with the help of @knagaev:

I had to add a dummy column as well as finesse the index:

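import pandas as pd
import seaborn as sns

# Copy the DatetimeIndex into a column so melt can use it as the id variable
df['Datetime'] = df.index
melted_df = pd.melt(df, id_vars='Datetime', var_name='Security', value_name='Price')
melted_df['Dummy'] = 0  # constant unit: one observation per time/condition

# ax is an existing matplotlib Axes
sns.tsplot(melted_df, time='Datetime', unit='Dummy', condition='Security', value='Price', ax=ax)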

This produces the plot (image not included).
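The wide-to-long reshape used in Update 3 can be tried on a tiny frame first. The following is a minimal sketch, not the asker's exact code: the two-row `current_df` is a hypothetical miniature built from the first rows shown above (with a timezone-naive index for simplicity), and the column names `Datetime`, `Security` and `Price` follow the ones chosen in Update 3.

```python
import pandas as pd

# Hypothetical miniature of current_df: a DatetimeIndex and one column per security.
idx = pd.to_datetime(["2013-01-03", "2013-01-04"])
current_df = pd.DataFrame(
    {
        "Equity(24 [AAPL])": [77.484, 75.289],
        "Equity(3766 [IBM])": [195.24, 193.989],
    },
    index=idx,
)

# reset_index turns the index into a regular column; melt then stacks the
# security columns into (Security, Price) pairs, one row per observation.
long_df = (
    current_df.rename_axis("Datetime")
    .reset_index()
    .melt(id_vars="Datetime", var_name="Security", value_name="Price")
)
print(long_df)
```

Each original column contributes one block of rows, so a frame with 253 dates and 5 securities would become 1265 long-form rows.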

Recommended answer

You can try using tsplot. It draws line charts with standard errors ("statistical additions").

I tried to simulate your dataset. Here is the result:

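# Note: pandas.io.data was later moved out of pandas into the separate
# pandas-datareader package; this snippet targets the older API.
import pandas as pd
import pandas.io.data as web
from datetime import datetime
import seaborn as sns

stocks = ['ORCL', 'TSLA', 'IBM', 'YELP', 'MSFT']
start = datetime(2014, 1, 1)
end = datetime(2014, 3, 28)
f = web.DataReader(stocks, 'yahoo', start, end)

# Flatten the downloaded panel into long form: one row per (date, company, kind)
df = pd.DataFrame(f.to_frame().stack()).reset_index()
df.columns = ['Date', 'Company', 'Kind', 'Price']

sns.tsplot(df, time='Date', unit='Kind', condition='Company', value='Price')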

By the way, this sample only imitates your data. The "unit" parameter is the "Field in the data DataFrame identifying the sampling unit (e.g. subject, neuron, etc.). The error representation will collapse over units at each time/condition observation." (from the documentation). So I used the 'Kind' field for illustrative purposes.
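To see what "collapse over units" implies for the choice of `unit`, the averaging step can be imitated with a plain pandas groupby. This is only a sketch under assumptions: it uses a hypothetical three-row excerpt shaped like desired_df, and the groupby mean stands in for tsplot's default mean estimator rather than reproducing its internals.

```python
import pandas as pd

# Hypothetical long-form rows for one company on one date, shaped like desired_df.
rows = pd.DataFrame(
    {
        "Date": ["2014-01-02"] * 3,
        "Company": ["IBM"] * 3,
        "Kind": ["Open", "Close", "Volume"],
        "Price": [187.21, 185.53, 4546500.0],
    }
)

# With unit='Kind', each Kind is one sampling unit, and the plotted line is an
# average across units at each (Date, Company). A groupby mean imitates that.
collapsed = rows.groupby(["Date", "Company"])["Price"].mean()

# With a single Kind and a constant dummy unit, each (Date, Company) has one
# observation, so the "average" is just the value itself.
open_only = rows[rows["Kind"] == "Open"]
single = open_only.groupby(["Date", "Company"])["Price"].mean()

print(collapsed)
print(single)
```

Averaging prices together with volume gives a number dominated by the volume, which is why the unit should identify genuine repeated measurements, or be a constant dummy when there is only one measurement per time point.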

OK, I made an example for your dataframe. It has a dummy field for "noise cleaning" :)

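# Note: pandas.io.data was later moved out of pandas into the separate
# pandas-datareader package; this snippet targets the older API.
import pandas as pd
import pandas.io.data as web
from datetime import datetime
import seaborn as sns

stocks = ['ORCL', 'TSLA', 'IBM', 'YELP', 'MSFT']
start = datetime(2010, 1, 1)
end = datetime(2015, 12, 31)
f = web.DataReader(stocks, 'yahoo', start, end)

# Flatten the downloaded panel into long form: one row per (date, company, kind)
df = pd.DataFrame(f.to_frame().stack()).reset_index()
df.columns = ['Date', 'Company', 'Kind', 'Price']

# Keep only the opening prices and add a constant unit so nothing is averaged
df_open = df[df['Kind'] == 'Open'].copy()
df_open['Dummy'] = 0

sns.tsplot(df_open, time='Date', unit='Dummy', condition='Company', value='Price')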

P.S. Thanks to @VanPeer: you can now use seaborn.lineplot for this problem.
