Pandas: comparing hourly multi-year data in one plot

Problem description

So I have a pandas DataFrame, called year, in this form:

                           discharge (m^3/s)  
date                                                                   
2016-01-01 00:00:00           17.6930
2016-01-01 01:00:00           17.3247
2016-01-01 02:00:00           17.2436
2016-01-01 03:00:00           17.5696
2016-01-01 04:00:00           16.4074
2016-01-01 05:00:00           17.5696
2016-01-01 06:00:00           17.0420            
....
2017-12-31 20:00:00           10.5911           
2017-12-31 21:00:00           10.5620          
2017-12-31 22:00:00           10.7374          
2017-12-31 23:00:00           10.5620 

The dataset contains discharge data for several years, and I want to make a plot comparing, for example, the month of January for the years 2016 and 2017.

My attempts so far have been to extract the wanted months and simply plot them on top of each other. But this does not work, as you can see in this picture:

[Figure: attempt 1]

My code is:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Comparison full months
def plotmonthdischarge(month, years, number_of_years):
    # Raw string, so the backslash in the path is never read as an escape
    df = pd.read_csv(r'resources\FinVannføringEidsfjordvatn.csv', encoding='ISO-8859-1', sep=';')
    df['date'] = pd.to_datetime(df['date'], dayfirst=True)
    df = df.set_index(df['date'])
    df['Day Of Year'] = df['date'].dt.dayofyear
    df = df.drop(['date'], axis=1)
    # np.nan instead of np.NaN (the latter was removed in NumPy 2.0)
    df = df.replace(to_replace='-9999', value=np.nan)

    fig, ax = plt.subplots()

    # For a starting year 2016 and 1 following year
    # Call example:
    # plotmonthdischarge(1,[2016],2)
    if len(years) == 1:
        start_year = years[0]
        for i in range(number_of_years):
            # Partial-string indexing selects the whole month, whatever its length
            year = df.loc['{0}-{1:02d}'.format(start_year+i, month)]
            ax.plot(year['discharge (m^3/s)'], label='Year {}'.format(start_year+i))

        # Just for plotting (ignore); kept inside the branch because it needs start_year
        formatted_list = ['{:>3}' for i in range(number_of_years)]
        string_of_years = ', '.join(formatted_list).format(*[start_year+i for i in range(number_of_years)])
        plt.title('Comparison plot of years {}'.format(string_of_years))

    # Specific years 2006 and 2017
    # Call example:
    # plotmonthdischarge(1,[2006,2017],1)
    if len(years) > 1:
        number_of_years = 1
        for item in years:
            year = df.loc['{0}-{1:02d}'.format(item, month)]
            ax.plot(year['Day Of Year'], year['discharge (m^3/s)'], label='Year {}'.format(item))

        # Just for plotting (ignore)
        formatted_list = ['{:>3}' for item in years]
        string_of_years = ', '.join(formatted_list).format(*years)
        plt.title('Comparison plot of years {}'.format(string_of_years))

    print(year)

    plt.suptitle(r'Discharge $m^{3}s^{-1}$')
    plt.ylabel(r'Discharge $m^{3}s^{-1}$')
    plt.legend()
    plt.grid(True)

plotmonthdischarge(1,[2015,2016],1)

My next attempt used something I found in other posts:

df['Day Of Year'] = df['date'].dt.dayofyear

and then plotting over all the days in the month:

 ax.plot(year['Day Of Year'],year['discharge (m^3/s)'], label = 'Year {}'.format(item))

This worked okay, except that it seems only one or so point per day gets registered, which is bad since I'm working with hourly data.

[Figure: attempt 2]
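
A likely reason for the "one point per day" look is that dayofyear is an integer, so all 24 hourly samples of a day share the same x value. The sketch below illustrates this on synthetic data (the date range and values are made up, and frac_day_of_year is a hypothetical column name, not something from the original code):

```python
import numpy as np
import pandas as pd

# Synthetic hourly series standing in for the discharge data (assumption).
idx = pd.date_range("2016-01-01", "2016-01-03 23:00", freq="h")
df = pd.DataFrame({"discharge (m^3/s)": np.linspace(17.0, 16.0, len(idx))}, index=idx)

# Integer day of year: every hour of a given day maps to the same x value.
df["day_of_year"] = df.index.dayofyear
points_per_x = df.groupby("day_of_year").size()

# Fractional day of year (hypothetical column): each hourly sample gets its own x value.
df["frac_day_of_year"] = df.index.dayofyear + df.index.hour / 24
```

Plotting against a fractional column like this, instead of the integer one, would spread the hourly points along the x-axis.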

I also tried removing the year from the datetime (my index) and plotting over a datetime index with only month, day and hour, but with no real success.
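
One way the year might be taken out of the index is to move every timestamp onto a shared placeholder year, so that all years line up on the same month/day/hour axis. This is only a sketch on synthetic data: month_by_year and ref_year are hypothetical names, and 2000 is chosen as the placeholder simply because it is a leap year, so 29 February remains a valid date.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data for 2016 and 2017 (assumption, not the real dataset).
idx = pd.date_range("2016-01-01", "2017-12-31 23:00", freq="h")
rng = np.random.default_rng(0)
df = pd.DataFrame({"discharge (m^3/s)": rng.uniform(10, 18, len(idx))}, index=idx)

def month_by_year(frame, column, month=1, ref_year=2000):
    """Return one column per year for the given month, indexed by a placeholder-year datetime."""
    sel = frame.loc[frame.index.month == month, column]
    pieces = {}
    for yr, s in sel.groupby(sel.index.year):
        s = s.copy()
        # Replace the real year with the placeholder year; month, day and hour are kept.
        s.index = pd.DatetimeIndex([ts.replace(year=ref_year) for ts in s.index])
        pieces[yr] = s
    # Align the per-year series on the shared index; the dict keys become column labels.
    return pd.concat(pieces, axis=1)

wide = month_by_year(df, "discharge (m^3/s)")
```

Calling wide.plot() would then draw one line per year; a matplotlib date formatter such as mdates.DateFormatter('%d %b') can hide the placeholder year on the tick labels.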

Edit:

Example of how the plot of a single year (January 2015) would look:

[Figure: correct plot for a single year]

Recommended answer

If your data has no missing values (NaN), I'd suggest slicing the desired years out of the DataFrame with .loc and plotting the underlying NumPy arrays with .values:

fig, ax = plt.subplots()
for yr in ['2016', '2017']:
    ax.plot(df.loc[yr].values, label = 'Year {}'.format(yr))
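
To compare a single month rather than whole years, the same idea could be narrowed with partial-string indexing on the DatetimeIndex. The following is a sketch on synthetic data (the values are simply a running counter, and the january dictionary is a hypothetical name):

```python
import numpy as np
import pandas as pd

# Synthetic hourly data for 2016 and 2017 (assumption); values count the hours from the start.
idx = pd.date_range("2016-01-01", "2017-12-31 23:00", freq="h")
df = pd.DataFrame({"discharge (m^3/s)": np.arange(len(idx), dtype=float)}, index=idx)

# 'YYYY-01' selects every row in January of that year; to_numpy() drops the datetime index.
january = {
    yr: df.loc[f"{yr}-01", "discharge (m^3/s)"].to_numpy()
    for yr in (2016, 2017)
}
```

Each array can then be passed to ax.plot(values, label=...), which puts the hours since the start of the month on the x-axis. As with the snippet above, this alignment is purely positional, so it only holds when no hourly rows are missing.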

A more flexible way is to manually compute the hour of the year, rather than the day of the year, and go from there:

df['hourofyear'] = 24 * (df.index.dayofyear - 1) + df.index.hour
fig, ax = plt.subplots()
for yr, g in df.groupby(df.index.year):
    g.plot('hourofyear', 'discharge (m^3/s)', label='Year {}'.format(yr), ax=ax)
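
As a self-contained variant of the hour-of-year idea, the data could also be pivoted into one column per year. This sketch uses synthetic data, and the year column and the wide name are assumptions added for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic hourly data for 2016 and 2017 (assumption, not the real dataset).
idx = pd.date_range("2016-01-01", "2017-12-31 23:00", freq="h")
rng = np.random.default_rng(1)
df = pd.DataFrame({"discharge (m^3/s)": rng.uniform(10, 18, len(idx))}, index=idx)

# Hour of the year, starting at 0 for 1 January 00:00.
df["hourofyear"] = 24 * (df.index.dayofyear - 1) + df.index.hour
df["year"] = df.index.year

# One row per hour of the year, one column per calendar year.
wide = df.pivot(index="hourofyear", columns="year", values="discharge (m^3/s)")
```

wide.plot() would then draw one line per year. Note that 2016 is a leap year, so from 29 February onward its hour of the year runs 24 hours ahead of the same calendar date in 2017, and the 2017 column ends with 24 missing values.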
