Highlighting weekends in small multiples


Problem description


How can I highlight weekends in small multiples?

I've read different threads (e.g. (1) and (2)) but couldn't figure out how to apply them to my case, since I work with small multiples where I iterate through the DatetimeIndex month by month (see code below figure). My data, Profiles, is in this case a time series of 2 years with an interval of 15 min (i.e. 70080 data points).

However, weekend days occurring at the end of a month generate an error; in this case: IndexError: index 2972 is out of bounds for axis 0 with size 2972

My attempt: [Edited - with suggestions by @Patrick FitzGerald]

class highlightWeekend:
    '''Class object to highlight weekends'''
    def __init__(self, period):
        self.ranges= period.index.dayofweek >= 5
        self.res = [x for x, (i , j) in enumerate(zip( [2] + list(self.ranges), list(self.ranges) + [2])) if i != j]
        if self.res[0] == 0 and self.ranges[0] == False:
            del self.res[0]
        if self.res[-1] == len(self.ranges) and self.ranges[-1] == False:
            del self.res[-1]

months= Profiles.loc['2018'].groupby(lambda x: x.month)
fig, axs= plt.subplots(4,3, figsize= (16, 12), sharey=True)
axs= axs.flatten()
for i, j in months:
    axs[i-1].plot(j.index, j)
    if i < len(months):
        k= 0
        while k < len(highlightWeekend(j).res):
            axs[i-1].axvspan(j.index[highlightWeekend(j).res[k]], j.index[highlightWeekend(j).res[k+1]], alpha=.2)
            k+=2
    i+=1
plt.show()


Question: How can I solve the issue of weekend days occurring at the end of the month?

Solution

TL;DR Skip to Solution for method 2 to see the optimal solution, or skip to the last example for a solution with a single pandas line plot. In all three examples, weekends are highlighted using just 4-6 lines of code; the rest is for formatting and reproducibility.



Methods and tools

I am aware of two methods to highlight weekends on plots of time series, which can be applied both to single plots and to small multiples by looping over the array of subplots. This answer presents solutions for highlighting weekends, but they can easily be adjusted to work for any recurring period of time.


Method 1: highlight based on the dataframe index

This method follows the logic of the code in the question and in the answers in the linked threads. Unfortunately, a problem arises when a weekend day occurs at the end of the month: the index number needed to draw the full span of the weekend exceeds the index range, which produces an error. This issue is solved in the solution shown further below by computing the time difference between two timestamps and adding it to each timestamp of the DatetimeIndex when looping over them to highlight the weekends.

But two issues remain: i) this method does not work for time series with a frequency lower than daily (i.e. more than one day between data points), and ii) time series with a frequency higher than hourly (e.g. one point every 15 minutes) require drawing many polygons, which hurts performance. For these reasons, this method is presented here for the purpose of documentation and I suggest using method 2 instead.
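To make the index problem concrete, here is a minimal sketch (assuming pandas is installed; the names `idx`, `is_weekend`, `bounds` and `step` are illustrative and not taken from the question's data). It rebuilds the run boundaries the way the question does and shows that the last boundary points one position past the end of the index, then shows the timedelta-based alternative that avoids positional indexing altogether.

```python
import pandas as pd

# June 2018 ends on a Saturday, so the last weekend run touches the end of the index
idx = pd.date_range('2018-06-01', '2018-06-30 18:00', freq=pd.Timedelta(hours=6))
is_weekend = idx.weekday >= 5

# Boundaries of the True/False runs, computed as in the question. The final
# boundary is always len(idx); the question only drops it when the month ends
# on a weekday, so a month ending on a weekend keeps an out-of-range position.
flags = list(is_weekend)
bounds = [pos for pos, (a, b) in enumerate(zip([2] + flags, flags + [2])) if a != b]
print(bounds[-1], len(idx))

# Alternative used in method 1: take a weekend timestamp and add one sampling
# interval to get the right edge of its span, with no lookup at position + 1
step = idx[1] - idx[0]
last_weekend_stamp = idx[is_weekend][-1]
last_span = (last_weekend_stamp, last_weekend_stamp + step)
print(last_span)
```

The right edge of the last span is a timestamp that does not need to exist in the index, which is why this approach does not raise an IndexError.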


Method 2: highlight based on the x-axis units

This method uses the x-axis units, that is, the number of days since the time origin (1970-01-01), to identify the weekends independently of the time series data being plotted, which makes it much more flexible than method 1. The highlights are drawn for full weekend days only, making this about two times faster than method 1 for the examples presented below (according to a %%timeit test in Jupyter Notebook). This is the method I recommend using.
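As a quick illustration of what "x-axis units" means here, the following sketch uses only the standard library to map day numbers counted from 1970-01-01 to weekdays. It assumes the time origin is 1970-01-01, as stated above; in actual plotting code, `mdates.num2date` performs this conversion and follows whatever epoch matplotlib is configured with. The helper name `is_weekend_day` is made up for this example.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)  # assumed time origin of the x-axis units

def is_weekend_day(day_number):
    """Return True if the day starting at this day number is a Saturday or Sunday."""
    return (EPOCH + timedelta(days=int(day_number))).weekday() >= 5

# 2018-03-31 was a Saturday: convert it to a day number and test three days
n = (date(2018, 3, 31) - EPOCH).days
print(n, is_weekend_day(n), is_weekend_day(n + 1), is_weekend_day(n + 2))
```

Because the test depends only on the day number, it works for any position on the x-axis, whether or not a data point exists there.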


Tools in matplotlib that can be used to implement both methods

axvspan (used in Solution for method 1)

broken_barh

fill_between (used in Solution for method 2)

BrokenBarHCollection.span_where

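To me, it seems that fill_between and BrokenBarHCollection.span_where are essentially the same. Both provide the handy where argument, which is used in the solution for method 2 presented further below. Note that BrokenBarHCollection was deprecated in matplotlib 3.7, so fill_between is the safer choice on recent versions.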
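The behavior of the where argument can be seen on a toy example. This is a minimal sketch with made-up data (`x`, `mask`), assuming numpy and matplotlib are installed; the Agg backend is selected only so that it runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = np.arange(10)
mask = (x >= 3) & (x <= 5)  # True at x = 3, 4 and 5

# Shade between y1=0 and y2=1 wherever the mask is True
band = ax.fill_between(x, 0, 1, where=mask, facecolor='k', alpha=.1)

# Horizontal extent of the shaded polygon: it stops at the last True point
# instead of extending to the next x value
xs = band.get_paths()[0].vertices[:, 0]
print(xs.min(), xs.max())
```

The shaded region ends at the last x value flagged True, which is the reason the weekend mask in method 2 also flags Mondays: the highlight then runs up to Monday 00:00.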



Solutions

Here is a reproducible sample dataset used to illustrate both methods, using a frequency of 6 hours. Note that the dataframe contains data only for one year which makes it possible to select the monthly data simply with df[df.index.month == month] to draw each subplot. You will need to adjust this if you are dealing with a multi-year DatetimeIndex.

Import packages used for all 3 examples and create the dataset for the first 2 examples

import numpy as np                   # v 1.19.2
import pandas as pd                  # v 1.1.3
import matplotlib.pyplot as plt      # v 3.3.2
import matplotlib.dates as mdates  # used only for method 2

# Create sample dataset
rng = np.random.default_rng(seed=1) # random number generator
dti = pd.date_range('2018-01-01 00:00', '2018-12-31 23:59', freq='6H')
consumption = rng.integers(1000, 2000, size=dti.size)
df = pd.DataFrame(dict(consumption=consumption), index=dti)
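For the multi-year case mentioned above, one possible adjustment is to group by year and month so that, for example, January 2018 and January 2019 end up in separate groups. This is a sketch rather than part of the original answer; `df_multi` and `monthly` are made-up names, and the frequency is passed as a Timedelta only to avoid depending on a particular frequency alias.

```python
import numpy as np
import pandas as pd

# Two years of made-up data with one value every 6 hours
rng = np.random.default_rng(seed=1)
dti = pd.date_range('2018-01-01', '2019-12-31 18:00', freq=pd.Timedelta(hours=6))
df_multi = pd.DataFrame(dict(consumption=rng.integers(1000, 2000, size=dti.size)),
                        index=dti)

# Group by (year, month) so that each group holds exactly one calendar month
groups = df_multi.groupby([df_multi.index.year, df_multi.index.month])
monthly = {key: grp for key, grp in groups}

print(len(monthly))                        # number of calendar months in the data
print(monthly[(2019, 2)].index[[0, -1]])   # first and last timestamp of Feb 2019
```

Each value of `monthly` can then be plotted on its own subplot in place of `df[df.index.month == month]`.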

Solution for method 1: highlight based on the dataframe index

In this solution, the weekends are highlighted using axvspan and the DatetimeIndex of the monthly dataframes df_month. The weekend timestamps are selected with df_month.index[df_month.index.weekday>=5].to_series() and the problem of exceeding the index range is solved by computing the timedelta from the frequency of the DatetimeIndex and adding it to each timestamp.

Of course, axvspan could also be used in a smarter way than shown here so that each weekend highlight is drawn in a single go, but I believe this would result in a less flexible solution and more code than what is presented in Solution for method 2.

# Draw and format subplots by looping through months and flattened array of axes
fig, axs = plt.subplots(4, 3, figsize=(10, 9), sharey=True)
for month, ax in zip(df.index.month.unique(), axs.flat):
    # Select monthly data and plot it
    df_month = df[df.index.month == month]
    ax.plot(df_month.index, df_month['consumption'])
    ax.set_ylim(0, 2500) # set limit similar to plot shown in question
    
    # Draw vertical spans for weekends: computing the timedelta and adding it
    # to the date solves the problem of exceeding the df_month.index
    timedelta = pd.to_timedelta(df_month.index.freq)
    weekends = df_month.index[df_month.index.weekday>=5].to_series()
    for date in weekends:
        ax.axvspan(date, date+timedelta, facecolor='k', edgecolor=None, alpha=.1)
    
    # Format tick labels
    ax.set_xticks(ax.get_xticks())
    tk_labels = [pd.to_datetime(tk, unit='D').strftime('%d') for tk in ax.get_xticks()]
    ax.set_xticklabels(tk_labels, rotation=0, ha='center')
    
    # Add x labels for months
    ax.set_xlabel(df_month.index[0].month_name().upper(), labelpad=5)
    ax.xaxis.set_label_position('top')

# Add title and edit spaces between subplots
year = df.index[0].year
freq = df_month.index.freqstr
title = f'{year} consumption displayed for each month with a {freq} frequency'
fig.suptitle(title.upper(), y=0.95, fontsize=12)
fig.subplots_adjust(wspace=0.1, hspace=0.5)

fig.text(0.5, 0.99, 'Weekends are highlighted by using the DatetimeIndex',
         ha='center', fontsize=14, weight='semibold');

As you can see, the weekend highlights end where the data ends as illustrated with the month of March. This is of course not noticeable if the DatetimeIndex is used to set the x-axis limits.
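For comparison, here is one way the remark about drawing each weekend "in a single go" with axvspan might look. It is only a sketch under assumptions: the data is a made-up single month, the names `weekend_days`, `run_id` and `spans` are illustrative, and spans are extended to the end of the last weekend day rather than to the last data point.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up data for March 2018, one value every 6 hours
rng = np.random.default_rng(seed=1)
dti = pd.date_range('2018-03-01', '2018-03-31 18:00', freq=pd.Timedelta(hours=6))
df_month = pd.DataFrame(dict(consumption=rng.integers(1000, 2000, size=dti.size)),
                        index=dti)

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(df_month.index, df_month['consumption'])

# Unique calendar days present in the data that fall on a Saturday or Sunday
days = df_month.index.normalize().unique()
weekend_days = days[days.weekday >= 5].to_series()

# Consecutive weekend days share a run id; a new run starts when the gap to
# the previous weekend day is not exactly one day
run_id = (weekend_days.diff() != pd.Timedelta(days=1)).cumsum()
spans = [(run.iloc[0], run.iloc[-1] + pd.Timedelta(days=1))
         for _, run in weekend_days.groupby(run_id)]

# One polygon per weekend instead of one per timestamp
for start, end in spans:
    ax.axvspan(start, end, facecolor='k', edgecolor=None, alpha=.1)

print(len(spans))
```

This draws far fewer polygons than one span per timestamp, but it still depends on the DatetimeIndex, so it shares the first limitation of method 1.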



Solution for method 2: highlight based on the x-axis units

This solution uses the x-axis limits to compute the range of time covered by the plot in terms of days, which is the unit used for matplotlib dates. A weekend mask is computed and then passed to the where argument of the fill_between plotting function. The True values of the mask are processed as right-exclusive, so in this case Mondays must be included for the highlights to be drawn up to Mondays 00:00. Because plotting these highlights can alter the x-axis limits when weekends occur near the limits, the x-axis limits are set back to the original values after plotting.

Note that with fill_between the y1 and y2 arguments must be given. For some reason using the default y-axis limits leaves a small gap between the plot frame and the tops and bottoms of the weekend highlights. Here, the y limits are set to 0 and 2500 just to create an example similar to the one in the question but the following should be used instead for general cases: ax.set_ylim(*ax.get_ylim()).

# Draw and format subplots by looping through months and flattened array of axes
fig, axs = plt.subplots(4, 3, figsize=(10, 9), sharey=True)
for month, ax in zip(df.index.month.unique(), axs.flat):
    # Select monthly data and plot it
    df_month = df[df.index.month == month]
    ax.plot(df_month.index, df_month['consumption'])
    ax.set_ylim(0, 2500) # set limit like plot shown in question, or use next line
#     ax.set_ylim(*ax.get_ylim())
    
    # Highlight weekends based on the x-axis units, regardless of the DatetimeIndex
    xmin, xmax = ax.get_xlim()
    days = np.arange(np.floor(xmin), np.ceil(xmax)+2)
    weekends = [(dt.weekday()>=5)|(dt.weekday()==0) for dt in mdates.num2date(days)]
    ax.fill_between(days, *ax.get_ylim(), where=weekends, facecolor='k', alpha=.1)
    ax.set_xlim(xmin, xmax) # set limits back to default values
     
    # Create appropriate ticks with matplotlib date tick locator and formatter
    tick_loc = mdates.MonthLocator(bymonthday=np.arange(1, 31, step=5))
    ax.xaxis.set_major_locator(tick_loc)
    tick_fmt = mdates.DateFormatter('%d')
    ax.xaxis.set_major_formatter(tick_fmt)
    
    # Add x labels for months
    ax.set_xlabel(df_month.index[0].month_name().upper(), labelpad=5)
    ax.xaxis.set_label_position('top')

# Add title and edit spaces between subplots
year = df.index[0].year
freq = df_month.index.freqstr
title = f'{year} consumption displayed for each month with a {freq} frequency'
fig.suptitle(title.upper(), y=0.95, fontsize=12)
fig.subplots_adjust(wspace=0.1, hspace=0.5)
fig.text(0.5, 0.99, 'Weekends are highlighted by using the x-axis units',
         ha='center', fontsize=14, weight='semibold');

As you can see, the weekends are always highlighted to the full extent, regardless of where the data starts and ends.
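If the same highlighting is needed on many axes, the few lines used above could be wrapped in a small helper. The function below is a hypothetical convenience wrapper (the name `highlight_weekends` and its keyword handling are not part of matplotlib or of the original answer); it assumes the x-axis already uses matplotlib date units, i.e. that dates have been plotted on the axes before it is called.

```python
from datetime import datetime, timedelta

import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

def highlight_weekends(ax, **kwargs):
    """Shade full weekend days on an axes whose x-axis uses matplotlib date units."""
    style = dict(facecolor='k', alpha=.1)
    style.update(kwargs)
    xmin, xmax = ax.get_xlim()
    ymin, ymax = ax.get_ylim()
    # One value per day covering the visible range, plus a margin at the end
    days = np.arange(np.floor(xmin), np.ceil(xmax) + 2)
    # Mondays are flagged too so that each highlight runs up to Monday 00:00
    mask = [dt.weekday() >= 5 or dt.weekday() == 0 for dt in mdates.num2date(days)]
    band = ax.fill_between(days, ymin, ymax, where=mask, **style)
    # Restore the limits, which the highlights may have changed
    ax.set_xlim(xmin, xmax)
    ax.set_ylim(ymin, ymax)
    return band

# Usage on a small made-up series: hourly values for two weeks
start = datetime(2018, 3, 19)
x = [start + timedelta(hours=h) for h in range(14 * 24)]
y = np.sin(np.arange(len(x)) / 12)
fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(x, y)
limits_before = ax.get_xlim()
band = highlight_weekends(ax)
```

In the small-multiples loop, calling such a helper once per subplot after plotting the monthly data would replace the five highlighting lines.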



Additional example of a solution for method 2 with a monthly time series and a pandas plot

This plot may not make much sense but it serves to illustrate the flexibility of method 2 and how to make it compatible with a pandas line plot. Note that the sample dataset uses a month start frequency so that the default ticks are aligned with the data points.

# Create sample dataset with a month start frequency
rng = np.random.default_rng(seed=1) # random number generator
dti = pd.date_range('2018-01-01 00:00', '2018-06-30 23:59', freq='MS')
consumption = rng.integers(1000, 2000, size=dti.size)
df = pd.DataFrame(dict(consumption=consumption), index=dti)

# Draw pandas plot: x_compat=True converts the pandas x-axis units to matplotlib
# date units
ax = df.plot(x_compat=True, figsize=(10, 4), legend=None)
ax.set_ylim(0, 2500) # set limit similar to plot shown in question, or use next line
# ax.set_ylim(*ax.get_ylim())
    
# Highlight weekends based on the x-axis units, regardless of the DatetimeIndex
xmin, xmax = ax.get_xlim()
days = np.arange(np.floor(xmin), np.ceil(xmax)+2)
weekends = [(dt.weekday()>=5)|(dt.weekday()==0) for dt in mdates.num2date(days)]
ax.fill_between(days, *ax.get_ylim(), where=weekends, facecolor='k', alpha=.1)
ax.set_xlim(xmin, xmax) # set limits back to default values

# Additional formatting
ax.figure.autofmt_xdate(rotation=0, ha='center')
ax.set_title('2018 consumption by month'.upper(), pad=15, fontsize=12)

ax.figure.text(0.5, 1.05, 'Weekends are highlighted by using the x-axis units',
               ha='center', fontsize=14, weight='semibold');



You can find more examples of this solution in the answers I have posted here and here. References: this answer by Nipun Batra, this answer by BenB, matplotlib.dates
