How to create a min-max plot by month with fill_between
Problem Description
I have to show month names as xticks, but when I plot the figure and pass the month names as x, it plots incorrectly. I also have to overlay a scatter plot on the line graph.
I cannot paste the full code here as it is a MOOC assignment; I am just looking for what I am doing wrong here.
    plt.figure(figsize=(8, 5))
    plt.plot(mint['Mean'], linewidth=1, label='Minimum')
    plt.plot(maxt['Mean'], linewidth=1, label='Maximum')
    plt.scatter(broken_low, mint15.iloc[broken_low]['Mean'], alpha=0.75)
    plt.scatter(broken_high, maxt15.iloc[broken_high]['Mean'], alpha=0.75)
Full Code Here: https://pastebin.com/N5PypMFH
Dataset link here : https://drive.google.com/file/d/1qJnnHDK_0ghmHQl4OuyKDr-0K5ETo7Td/view?usp=sharing
It should look like this, with the area between the lines filled, months on the x-axis, and degrees Celsius on the y-axis.
Solution
Update Using Data from OP
- The issue with the first method is that it requires the x-axis to be a datetime format.
- The data in the question is grouped and plotted against a string that combines the month and day.
- The x-axis represents 365 days; leap days have been removed.
- Place ticks at the appropriate location for the beginning of each month
- Add a label to the tick
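The month-start tick positions described in the steps above do not have to be typed by hand. As a minimal sketch (the names `days_in_month`, `month_starts`, and `month_labels` are illustrative and not part of the original answer), they can be derived from the standard library for any non-leap year:

```python
import calendar
from itertools import accumulate

# number of days in each month of a non-leap year (2015 is only an example)
days_in_month = [calendar.monthrange(2015, m)[1] for m in range(1, 13)]

# zero-based day of the year on which each month starts: 0, 31, 59, ...
month_starts = [0] + list(accumulate(days_in_month))[:-1]

# matching abbreviated month names, skipping the empty first entry
month_labels = list(calendar.month_abbr)[1:]

print(month_starts)
```

These two lists could be passed to `plt.xticks(ticks=month_starts, labels=month_labels)` when the x-axis is a plain 0-364 day index.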
    import calendar

    import matplotlib.pyplot as plt
    import pandas as pd

    # load the data
    df = pd.read_csv('data/so_data/62929123/data.csv', parse_dates=['Date'])

    # remove leap day
    df = df[~((df.Date.dt.month == 2) & (df.Date.dt.day == 29))]

    # add a year column
    df['Year'] = df.Date.dt.year

    # add a zero-padded month-day column to use for groupby;
    # zero-padding keeps the sorted groups in calendar order
    df['Month-Day'] = df.Date.dt.strftime('%m-%d')

    # select 2015 data
    df_15 = df[df.Year == 2015].reset_index()

    # select data before 2015
    df_14 = df[df.Year < 2015].reset_index()

    # filter data to either max or min and groupby month-day
    max_14 = df_14[df_14.Element == 'TMAX'].groupby(['Month-Day']).agg({'Data_Value': 'max'}).reset_index().rename(columns={'Data_Value': 'Daily_Max'})
    min_14 = df_14[df_14.Element == 'TMIN'].groupby(['Month-Day']).agg({'Data_Value': 'min'}).reset_index().rename(columns={'Data_Value': 'Daily_Min'})
    max_15 = df_15[df_15.Element == 'TMAX'].groupby(['Month-Day']).agg({'Data_Value': 'max'}).reset_index().rename(columns={'Data_Value': 'Daily_Max'})
    min_15 = df_15[df_15.Element == 'TMIN'].groupby(['Month-Day']).agg({'Data_Value': 'min'}).reset_index().rename(columns={'Data_Value': 'Daily_Min'})

    # select max values from 2015 that are greater than the recorded max
    # (assumes both frames cover the same 365 days, so the indexes align)
    higher_14 = max_15[max_15.Daily_Max > max_14.Daily_Max]

    # select min values from 2015 that are less than the recorded min
    lower_14 = min_15[min_15.Daily_Min < min_14.Daily_Min]

    # plot the min and max lines
    ax = max_14.plot(y='Daily_Max', label='Max Recorded', color='tab:red')
    min_14.plot(y='Daily_Min', ax=ax, label='Min Recorded', color='tab:blue')

    # add the fill, between min and max
    plt.fill_between(max_14.index, max_14.Daily_Max, min_14.Daily_Min, alpha=0.10, color='tab:orange')

    # add points greater than max or less than min
    plt.scatter(higher_14.index, higher_14.Daily_Max, label='2015 Max > Record', alpha=0.75, color='tab:red')
    plt.scatter(lower_14.index, lower_14.Daily_Min, label='2015 Min < Record', alpha=0.75, color='tab:blue')

    # set plot xlim
    plt.xlim(-5, 370)

    # tick locations
    ticks = [-5, 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365, 370]

    # tick labels
    labels = list(calendar.month_abbr)  # '' followed by the 12 month names
    labels.extend(['Jan', ''])

    # add the custom ticks and labels
    plt.xticks(ticks=ticks, labels=labels)

    # plot cosmetics
    plt.legend()
    plt.xlabel('Day of Year: 0-365 Displaying Start of Month')
    plt.ylabel('Temperature °C')
    plt.title('Daily Max and Min: 2009 - 2014\nRecorded Max < 2015 Temperatures < Recorded Min')
    plt.tight_layout()
    plt.show()
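One caveat when grouping on a month-day string: `groupby` sorts its keys as text by default, so the row order only matches the calendar if the month and day are zero-padded. The following sketch, using a handful of made-up (month, day) pairs, illustrates the difference:

```python
# a few (month, day) pairs in calendar order, chosen only for illustration
pairs = [(1, 1), (1, 2), (1, 10), (2, 1), (10, 1)]

# unpadded keys, e.g. '1-10', and zero-padded keys, e.g. '01-10'
raw = [f'{m}-{d}' for m, d in pairs]
padded = [f'{m:02d}-{d:02d}' for m, d in pairs]

# text sorting scrambles the unpadded keys but preserves the padded ones
print(sorted(raw))
print(sorted(padded))
```

With unpadded keys, October sorts ahead of February, so hard-coded day-of-year tick positions would no longer line up with the plotted rows.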
Original Answer
- It was not originally clear that the x-axis values were not datetimes.
- The dataset was not originally available.
- The reproducible data, and the code that shapes it, is at the bottom of this answer, but it is not integral to adding months to the x-axis.
- Given the dataframes max_15 and min_15, which are the maximum and minimum temperatures for Portland, OR in 2015:
  - The important detail is that date was converted to a datetime format with pd.to_datetime and then set as the index.
  - v is a column of floats.
  - Separate MIN & MAX values into separate dataframes with Pandas: Boolean Indexing, which is also shown below in the data cleaning.
- Reference Matplotlib: Date tick labels & Formatting date ticks using ConciseDateFormatter
    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates

    # plot styling parameters
    # (newer Matplotlib releases name this style 'seaborn-v0_8')
    plt.style.use('seaborn')
    plt.rcParams['figure.figsize'] = (16.0, 10.0)
    plt.rcParams["patch.force_edgecolor"] = True

    # locate the Month and format the label
    months = mdates.MonthLocator()  # every month
    months_fmt = mdates.DateFormatter('%b')

    # plot the data
    fig, ax = plt.subplots()
    ax.plot(max_15.index, 'rolling', data=max_15, label='max rolling mean')
    ax.scatter(x=max_15.index, y='v', data=max_15, alpha=0.75, label='MAX')
    ax.plot(min_15.index, 'rolling', data=min_15, label='min rolling mean')
    ax.scatter(x=min_15.index, y='v', data=min_15, alpha=0.75, label='MIN')
    ax.legend()

    # format the ticks
    ax.xaxis.set_major_locator(months)
    ax.xaxis.set_major_formatter(months_fmt)
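The plot above does not include the filled band from the question. With a datetime index, `fill_between` can be combined with the same `MonthLocator` and `DateFormatter` calls. The sketch below uses synthetic temperatures (the `demo` dataframe and its `low` and `high` columns are made up for illustration and are not the Portland data):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# synthetic daily temperatures for 2015 (made-up values, illustration only)
dates = pd.date_range('2015-01-01', '2015-12-31', freq='D')
seasonal = 15 - 10 * np.cos(2 * np.pi * np.arange(len(dates)) / len(dates))
demo = pd.DataFrame({'low': seasonal - 5, 'high': seasonal + 5}, index=dates)

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(demo.index, demo['high'], color='tab:red', label='high')
ax.plot(demo.index, demo['low'], color='tab:blue', label='low')

# shade the area between the two lines
ax.fill_between(demo.index, demo['low'], demo['high'], alpha=0.10, color='tab:orange')

# one tick per month, labelled with the abbreviated month name
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
ax.set_ylabel('Temperature °C')
ax.legend()
```

Calling `plt.show()` or `fig.savefig(...)` afterwards displays or saves the figure.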
Reproducible Data
- This part isn't important to formatting the x-axis
- This is just cleaning the data, in case anyone wants to experiment
- See Weather Visualization for Portland, OR: 1940 - 2020
    import numpy as np
    import pandas as pd

    # download data into dataframe, it's in a wide format
    pdx_19 = pd.read_csv('http://www.weather.gov/source/pqr/climate/webdata/Portland_dailyclimatedata.csv', header=6)

    # clean and label data
    pdx_19.drop(columns=['AVG or Total'], inplace=True)
    pdx_19.columns = list(pdx_19.columns[:3]) + [f'v_{day}' for day in pdx_19.columns[3:]]
    pdx_19.rename(columns={'Unnamed: 2': 'TYPE'}, inplace=True)
    pdx_19 = pdx_19[pdx_19.TYPE.isin(['TX', 'TN', 'PR'])]

    # convert to long format
    pdx = pd.wide_to_long(pdx_19, stubnames='v', sep='_', i=['YR', 'MO', 'TYPE'], j='day').reset_index()

    # additional cleaning
    pdx.TYPE = pdx.TYPE.map({'TX': 'MAX', 'TN': 'MIN', 'PR': 'PRE'})
    pdx.rename(columns={'YR': 'year', 'MO': 'month'}, inplace=True)
    pdx = pdx[pdx.v != '-'].copy()
    pdx['date'] = pd.to_datetime(pdx[['year', 'month', 'day']])
    pdx.drop(columns=['year', 'month', 'day'], inplace=True)
    pdx['v'] = pdx['v'].replace({'M': np.nan, 'T': np.nan})
    pdx['v'] = pdx['v'].astype('float')

    # select on 2015
    pdx_2015 = pdx[pdx.date.dt.year == 2015].reset_index(drop=True).set_index('date')

    # select only MAX temps
    max_15 = pdx_2015[pdx_2015.TYPE == 'MAX'].copy()

    # select only MIN temps
    min_15 = pdx_2015[pdx_2015.TYPE == 'MIN'].copy()

    # calculate rolling mean
    max_15['rolling'] = max_15.v.rolling(7).mean()
    min_15['rolling'] = min_15.v.rolling(7).mean()
max_15
               TYPE     v    rolling
    date
    2015-01-01  MAX  39.0        NaN
    2015-01-02  MAX  41.0        NaN
    2015-01-03  MAX  41.0        NaN
    2015-01-04  MAX  53.0        NaN
    2015-01-05  MAX  57.0        NaN
    2015-01-06  MAX  47.0        NaN
    2015-01-07  MAX  51.0  47.000000
    2015-01-08  MAX  45.0  47.857143
    2015-01-09  MAX  50.0  49.142857
    2015-01-10  MAX  42.0  49.285714
min_15
               TYPE     v    rolling
    date
    2015-01-01  MIN  24.0        NaN
    2015-01-02  MIN  26.0        NaN
    2015-01-03  MIN  35.0        NaN
    2015-01-04  MIN  38.0        NaN
    2015-01-05  MIN  42.0        NaN
    2015-01-06  MIN  38.0        NaN
    2015-01-07  MIN  34.0  33.857143
    2015-01-08  MIN  35.0  35.428571
    2015-01-09  MIN  37.0  37.000000
    2015-01-10  MIN  36.0  37.142857