How to create a min-max plot by month with fill_between


Problem description




      I have to show month names as xticks, but when I plot the figure and pass the month names as x, it plots incorrectly. I also have to overlay a scatter plot on the line graph.

      I cannot paste the full code here because it is a MOOC assignment; I am just looking for what I am doing wrong.

      plt.figure(figsize=(8,5))
      
      plt.plot(mint['Mean'], linewidth=1, label='Minimum')
      plt.plot(maxt['Mean'],linewidth = 1, label = 'Maximum')
      
      plt.scatter(broken_low,mint15.iloc[broken_low]['Mean'],alpha = 0.75)
      plt.scatter(broken_high,maxt15.iloc[broken_high]['Mean'],alpha = .75)
      

      Full Code Here: https://pastebin.com/N5PypMFH

      Dataset link here : https://drive.google.com/file/d/1qJnnHDK_0ghmHQl4OuyKDr-0K5ETo7Td/view?usp=sharing

      It should look like this, with the area between the lines filled, months on the x-axis, and degrees Celsius on the y-axis.

      Solution

      Update Using Data from OP

      • The issue with the first method is that it requires the x-axis to be a datetime format.
      • The data in the question is being grouped and plotted against a string, which is a combination of the month and day
      • The x-axis represents 365 days; leap days have been removed.
        • Place ticks at the appropriate location for the beginning of each month
        • Add a label to the tick
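One thing to watch when grouping on a month-day string: groupby returns its keys sorted as text, so an unpadded key such as '1-10' sorts before '1-2', and month 10 sorts before month 2. The sketch below uses a few hypothetical dates (not the OP's data) to compare an unpadded key with a zero-padded one built with strftime.

```python
import pandas as pd

# Hypothetical sample dates; any datetime column behaves the same way.
dates = pd.Series(pd.to_datetime(['2014-01-02', '2014-01-10', '2014-02-01', '2014-10-01']))

# Unpadded keys are compared character by character, so they do not
# come out in calendar order when sorted.
unpadded = dates.dt.month.astype(str) + '-' + dates.dt.day.astype(str)

# Zero-padded keys ('01-02', '10-01', ...) sort in calendar order.
padded = dates.dt.strftime('%m-%d')

unpadded_sorted = sorted(unpadded)
padded_sorted = sorted(padded)
print(unpadded_sorted)
print(padded_sorted)
```

With a zero-padded key, row i of the grouped result corresponds to day i of a 365-day year, which is what positional tick locations rely on.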

      import pandas as pd
      import matplotlib.pyplot as plt
      import calendar
      
      # load the data
      df = pd.read_csv('data/so_data/62929123/data.csv', parse_dates=['Date'])
      
      # remove leap day
      df = df[~((df.Date.dt.month == 2) & (df.Date.dt.day == 29))]
      
      # add a year column
      df['Year'] = df.Date.dt.year
      
      # add a zero-padded month-day column to use for groupby, so the groups sort in calendar order
      df['Month-Day'] = df.Date.dt.strftime('%m-%d')
      
      # select 2015 data
      df_15 = df[df.Year == 2015].reset_index()
      
      # select data before 2015
      df_14 = df[df.Year < 2015].reset_index()
      
      # filter data to either max or min and groupby month-day
      max_14 = df_14[df_14.Element == 'TMAX'].groupby(['Month-Day']).agg({'Data_Value': 'max'}).reset_index().rename(columns={'Data_Value': 'Daily_Max'})
      min_14 = df_14[df_14.Element == 'TMIN'].groupby(['Month-Day']).agg({'Data_Value': 'min'}).reset_index().rename(columns={'Data_Value': 'Daily_Min'})
      max_15 = df_15[df_15.Element == 'TMAX'].groupby(['Month-Day']).agg({'Data_Value': 'max'}).reset_index().rename(columns={'Data_Value': 'Daily_Max'})
      min_15 = df_15[df_15.Element == 'TMIN'].groupby(['Month-Day']).agg({'Data_Value': 'min'}).reset_index().rename(columns={'Data_Value': 'Daily_Min'})
      
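      # select max values from 2015 that are greater than the recorded max
      # (both frames are assumed to hold the same 365 month-day rows in the same order)
      higher_14 = max_15[max_15.Daily_Max > max_14.Daily_Max]
      
      # select min values from 2015 that are less than the recorded min
      lower_14 = min_15[min_15.Daily_Min < min_14.Daily_Min]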
      
      # plot the min and max lines
      ax = max_14.plot(y='Daily_Max', label='Max Recorded', color='tab:red')
      min_14.plot(y='Daily_Min', ax=ax, label='Min Recorded', color='tab:blue')
      
      # add the fill, between min and max
      plt.fill_between(max_14.index, max_14.Daily_Max, min_14.Daily_Min, alpha=0.10, color='tab:orange')
      
      # add points greater than max or less than min
      plt.scatter(higher_14.index, higher_14.Daily_Max, label='2015 Max > Record', alpha=0.75, color='tab:red')
      plt.scatter(lower_14.index, lower_14.Daily_Min, label='2015 Min < Record', alpha=0.75, color='tab:blue')
      
      # set plot xlim
      plt.xlim(-5, 370)
      
      # tick locations
      ticks=[-5, 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365, 370]
      
      # tick labels
      labels = list(calendar.month_abbr)  # list of months
      labels.extend(['Jan', ''])
      
      # add the custom ticks and labels
      plt.xticks(ticks=ticks, labels=labels)
      
      # plot cosmetics
      plt.legend()
      plt.xlabel('Day of Year: 0-365 Displaying Start of Month')
      plt.ylabel('Temperature °C')
      plt.title('Daily Max and Min: 2009 - 2014\nRecorded Max < 2015 Temperatures < Recorded Min')
      plt.tight_layout()
      plt.show()
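The month-start tick locations do not have to be typed by hand. As a minimal sketch, they can be accumulated from the month lengths that the standard-library calendar module reports; 2015 is used here only because it is a non-leap year, which matches the 365-day axis.

```python
import calendar

# Any non-leap year gives the same month lengths; 2015 is an arbitrary choice.
year = 2015

month_starts = []  # zero-based day of year on which each month begins
day_of_year = 0
for month in range(1, 13):
    month_starts.append(day_of_year)
    # monthrange returns (weekday of the first day, number of days in the month)
    day_of_year += calendar.monthrange(year, month)[1]

# month_abbr[0] is an empty string, so skip it to get the 12 month labels.
month_labels = list(calendar.month_abbr)[1:]

print(month_starts)
print(day_of_year)
```

These positions could then be passed as plt.xticks(ticks=month_starts, labels=month_labels). The abbreviations returned by calendar.month_abbr follow the current locale.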
      

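      Original Answer

      • It was initially unclear that the x-axis values were not datetimes.
        • The dataset was not initially available.
      • The reproducible data, and the code that reshapes it, are at the bottom of this answer, but they are not essential to putting months on the x-axis.
      • Given the DataFrames max_15 and min_15, which hold the maximum and minimum temperatures for Portland, Oregon in 2015:
        • The important detail is to convert date to a datetime format with pd.to_datetime and then set it as the index.
        • v is a column of floats.
        • The MIN and MAX rows are kept in separate DataFrames.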

      import pandas as pd
      import matplotlib.pyplot as plt
      import matplotlib.dates as mdates
      
      # plot styling parameters
      plt.style.use('seaborn-v0_8')  # this style was named 'seaborn' before matplotlib 3.6
      plt.rcParams['figure.figsize'] = (16.0, 10.0)
      plt.rcParams["patch.force_edgecolor"] = True
      
      # locate the Month and format the label
      months = mdates.MonthLocator()  # every month
      months_fmt = mdates.DateFormatter('%b')
      
      # plot the data
      fig, ax = plt.subplots()
      ax.plot(max_15.index, 'rolling', data=max_15, label='max rolling mean')
      ax.scatter(x=max_15.index, y='v', data=max_15, alpha=0.75, label='MAX')
      
      ax.plot(min_15.index, 'rolling', data=min_15, label='min rolling mean')
      ax.scatter(x=min_15.index, y='v', data=min_15, alpha=0.75, label='MIN')
      ax.legend()
      
      # format the ticks
      ax.xaxis.set_major_locator(months)
      ax.xaxis.set_major_formatter(months_fmt)
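The code above draws lines and points but not the shaded band the question asks for. When the index is a datetime, fill_between can be combined with the same MonthLocator and DateFormatter. The following is a sketch on synthetic data (the seasonal curve and the names daily_max and daily_min are made up for illustration, not taken from the Portland data).

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Synthetic daily series for one non-leap year (an assumption for illustration).
days = pd.date_range('2015-01-01', '2015-12-31', freq='D')
seasonal = 15 - 10 * np.cos(2 * np.pi * np.arange(len(days)) / len(days))
daily_max = pd.Series(seasonal + 5, index=days)
daily_min = pd.Series(seasonal - 5, index=days)

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(daily_max.index, daily_max, color='tab:red', label='max')
ax.plot(daily_min.index, daily_min, color='tab:blue', label='min')

# Shade the area between the lower and upper curves.
band = ax.fill_between(days, daily_min, daily_max, alpha=0.10, color='tab:orange')

# One major tick per month, labelled with the abbreviated month name.
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
ax.legend()
fig.tight_layout()
```

To view the result, drop the Agg line and call plt.show(), or write it to disk with fig.savefig.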
      

      Reproducible Data

      import numpy as np
      import pandas as pd
      
      # download data into dataframe, it's in a wide format
      pdx_19 = pd.read_csv('http://www.weather.gov/source/pqr/climate/webdata/Portland_dailyclimatedata.csv', header=6)
      
      # clean and label data
      pdx_19.drop(columns=['AVG or Total'], inplace=True)
      pdx_19.columns = list(pdx_19.columns[:3]) + [f'v_{day}' for day in pdx_19.columns[3:]]
      pdx_19.rename(columns={'Unnamed: 2': 'TYPE'}, inplace=True)
      pdx_19 = pdx_19[pdx_19.TYPE.isin(['TX', 'TN', 'PR'])]
      
      # convert to long format
      pdx = pd.wide_to_long(pdx_19, stubnames='v', sep='_', i=['YR', 'MO', 'TYPE'], j='day').reset_index()
      
      # additional cleaning
      pdx.TYPE = pdx.TYPE.map({'TX': 'MAX', 'TN': 'MIN', 'PR': 'PRE'})
      pdx.rename(columns={'YR': 'year', 'MO': 'month'}, inplace=True)
      pdx = pdx[pdx.v != '-'].copy()
      pdx['date'] = pd.to_datetime(pdx[['year', 'month', 'day']])
      pdx.drop(columns=['year', 'month', 'day'], inplace=True)
      pdx['v'] = pdx['v'].replace({'M': np.nan, 'T': np.nan})
      pdx['v'] = pdx['v'].astype('float')
      
      # select on 2015
      pdx_2015 = pdx[pdx.date.dt.year == 2015].reset_index(drop=True).set_index('date')
      
      # select only MAX temps
      max_15 = pdx_2015[pdx_2015.TYPE == 'MAX'].copy()
      
      # select only MIN temps
      min_15 = pdx_2015[pdx_2015.TYPE == 'MIN'].copy()
      
      # calculate rolling mean
      max_15['rolling'] = max_15.v.rolling(7).mean()
      min_15['rolling'] = min_15.v.rolling(7).mean()
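The reshaping step may be easier to follow on a tiny frame. This sketch applies the same wide_to_long call to a hypothetical two-day excerpt (the values and the name tidy are illustrative) and then builds the date column from the year, month and day parts.

```python
import pandas as pd

# Hypothetical excerpt in the same wide layout: one row per year/month/type,
# one column per day of the month.
wide = pd.DataFrame({
    'YR': [2015, 2015],
    'MO': [1, 1],
    'TYPE': ['TX', 'TN'],
    'v_1': [39, 24],
    'v_2': [41, 26],
})

# Each v_<day> column becomes its own row, with the numeric suffix stored in 'day'.
tidy = pd.wide_to_long(wide, stubnames='v', sep='_', i=['YR', 'MO', 'TYPE'], j='day').reset_index()

# pd.to_datetime can assemble dates from columns named year, month and day.
tidy = tidy.rename(columns={'YR': 'year', 'MO': 'month'})
tidy['date'] = pd.to_datetime(tidy[['year', 'month', 'day']])
print(tidy)
```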
      

      max_15

                 TYPE     v    rolling
      date                            
      2015-01-01  MAX  39.0        NaN
      2015-01-02  MAX  41.0        NaN
      2015-01-03  MAX  41.0        NaN
      2015-01-04  MAX  53.0        NaN
      2015-01-05  MAX  57.0        NaN
      2015-01-06  MAX  47.0        NaN
      2015-01-07  MAX  51.0  47.000000
      2015-01-08  MAX  45.0  47.857143
      2015-01-09  MAX  50.0  49.142857
      2015-01-10  MAX  42.0  49.285714
      

      min_15

                 TYPE     v    rolling
      date                            
      2015-01-01  MIN  24.0        NaN
      2015-01-02  MIN  26.0        NaN
      2015-01-03  MIN  35.0        NaN
      2015-01-04  MIN  38.0        NaN
      2015-01-05  MIN  42.0        NaN
      2015-01-06  MIN  38.0        NaN
      2015-01-07  MIN  34.0  33.857143
      2015-01-08  MIN  35.0  35.428571
      2015-01-09  MIN  37.0  37.000000
      2015-01-10  MIN  36.0  37.142857
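The NaN values in the first six rows of the rolling column come from the 7-day window: rolling(7) yields a value only once seven observations are available. A short sketch using the first ten MAX values from the max_15 table above shows this, along with the min_periods option, which is an alternative not used in the answer.

```python
import pandas as pd

# First ten 2015 MAX values, copied from the max_15 table.
v = pd.Series(
    [39.0, 41.0, 41.0, 53.0, 57.0, 47.0, 51.0, 45.0, 50.0, 42.0],
    index=pd.date_range('2015-01-01', periods=10, freq='D'),
)

# Default behaviour: NaN until the window holds 7 values.
rolling = v.rolling(7).mean()

# With min_periods=1 the mean is taken over however many values are available so far.
rolling_partial = v.rolling(7, min_periods=1).mean()

print(rolling)
print(rolling_partial)
```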
      
