Is it possible to draw a matplotlib boxplot given the percentile values instead of the original inputs?

Problem description

From what I can see, the boxplot() method expects a sequence of raw values (numbers) as input, from which it then computes the percentiles needed to draw the boxplot(s).

I would like to have a method by which I could pass in the percentiles and get the corresponding boxplot.

For example:

Assume that I have run several benchmarks and, for each benchmark, I have measured latencies (floating point values). Additionally, I have precomputed the percentiles for these values.

Hence, for each benchmark, I have the 25th, 50th and 75th percentiles along with the min and max.

Now given these data, I would like to draw the box plots for the benchmarks.

Solution

To draw the box plot using just the percentile values and the outliers (if any), I made a customized_box_plot function. It draws a basic box plot from a tiny set of sample data and then modifies the attributes of that plot so that it matches your percentile values.

The customized_box_plot function

def customized_box_plot(percentiles, axes, redraw = True, *args, **kwargs):
    """
    Generates a customized boxplot based on the given percentile values
    """
    
    # One placeholder box per entry in percentiles; the sample data is
    # overwritten below, so its values do not matter
    n_box = len(percentiles)
    box_plot = axes.boxplot([[-9, -4, 2, 4, 9],]*n_box, *args, **kwargs)
    
    min_y, max_y = float('inf'), -float('inf')
    
    for box_no, (q1_start, 
                 q2_start,
                 q3_start,
                 q4_start,
                 q4_end,
                 fliers_xy) in enumerate(percentiles):
        
        # Lower cap
        box_plot['caps'][2*box_no].set_ydata([q1_start, q1_start])
        # xdata is determined by the width of the box plot

        # Lower whiskers
        box_plot['whiskers'][2*box_no].set_ydata([q1_start, q2_start])

        # Higher cap
        box_plot['caps'][2*box_no + 1].set_ydata([q4_end, q4_end])

        # Higher whiskers
        box_plot['whiskers'][2*box_no + 1].set_ydata([q4_start, q4_end])

        # Box
        box_plot['boxes'][box_no].set_ydata([q2_start, 
                                             q2_start, 
                                             q4_start,
                                             q4_start,
                                             q2_start])
        
        # Median
        box_plot['medians'][box_no].set_ydata([q3_start, q3_start])

        # Outliers
        if fliers_xy is not None and len(fliers_xy[0]) != 0:
            # If outliers exist
            box_plot['fliers'][box_no].set(xdata = fliers_xy[0],
                                           ydata = fliers_xy[1])
            
            # Built-in min/max work for both lists and numpy arrays
            min_y = min(q1_start, min_y, min(fliers_xy[1]))
            max_y = max(q4_end, max_y, max(fliers_xy[1]))
            
        else:
            min_y = min(q1_start, min_y)
            max_y = max(q4_end, max_y)
                    
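    # After all boxes are placed, the y axis is rescaled to fit them
    # completely, with a margin of 10% of the data range at both ends
    # (multiplying a positive minimum by 1.1 would clip the lowest cap)
    margin = 0.1 * (max_y - min_y)
    axes.set_ylim([min_y - margin, max_y + margin])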

    # If redraw is set to true, the canvas is updated.
    if redraw:
        axes.figure.canvas.draw()
        
    return box_plot

USAGE

Using the inverse function (code at the very end), I extracted the percentile values from this example:

>>> percentiles
(-1.0597368367634488, 0.3977683984966961, 1.0298955252405229, 1.6693981537742526, 3.4951447843464449)
(-0.90494930553559483, 0.36916539612108634, 1.0303658700697103, 1.6874542731392828, 3.4951447843464449)
(0.13744105279440233, 1.3300645202649739, 2.6131540656339483, 4.8763411136047647, 9.5751914834437937)
(0.22786243898199182, 1.4120860286080519, 2.637650402506837, 4.9067126578493259, 9.4660357513550899)
(0.0064696168078617741, 0.30586770128093388, 0.70774153557312702, 1.5241965711101928, 3.3092932063051976)
(0.007009744579241136, 0.28627373934008982, 0.66039691869500572, 1.4772725266672091, 3.221716765477217)
(-2.2621660374110544, 5.1901313713883352, 7.7178532139979357, 11.277744848353247, 20.155971739152388)
(-2.2621660374110544, 5.1884411864079532, 7.3357079047721054, 10.792299385806913, 18.842012119715388)
(2.5417888074435702, 5.885996170695587, 7.7271286220368598, 8.9207423361593179, 10.846938621419374)
(2.5971767318505856, 5.753551925927133, 7.6569980004033464, 8.8161056254143233, 10.846938621419374)

Note that, to keep this short, I haven't shown the outlier vectors, which are the 6th element of each percentile tuple.

Also note that all the usual additional args / kwargs can be used, since they are simply passed on to the boxplot method inside the function:

>>> fig, ax = plt.subplots()
>>> b = customized_box_plot(percentiles, ax, redraw=True, notch=0, sym='+', vert=1, whis=1.5)
>>> plt.show()
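
The six-element tuples that customized_box_plot unpacks can be assembled from raw measurements roughly as follows. This is only a sketch: the helper name summarize_run and the sample latencies are made up for illustration, and it assumes there are no outliers, so the flier arrays are left empty and the whiskers run to the min and max.

```python
import numpy as np

def summarize_run(samples, position):
    """Hypothetical helper: build (min, q1, median, q3, max, fliers_xy)."""
    samples = np.asarray(samples, dtype=float)
    q1, median, q3 = np.percentile(samples, [25, 50, 75])
    # Assumption: no outliers, so both flier arrays are empty.
    fliers_y = np.array([])
    fliers_x = np.full(fliers_y.shape, float(position))
    return (float(samples.min()), float(q1), float(median), float(q3),
            float(samples.max()), (fliers_x, fliers_y))

# Made-up latency samples for two benchmarks.
latencies = [[1.0, 2.0, 3.0, 4.0, 5.0],
             [2.0, 4.0, 6.0, 8.0, 10.0]]

# Box plots are centred at x = 1, 2, 3, ... so positions start at 1.
percentiles = [summarize_run(run, i + 1) for i, run in enumerate(latencies)]
```

The resulting list has one tuple per benchmark and can be passed as the first argument of customized_box_plot.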

EXPLANATION

The boxplot method returns a dictionary mapping the components of the boxplot to the individual matplotlib.lines.Line2D instances that were created.

Quoting from the matplotlib.pyplot.boxplot documentation:

That dictionary has the following keys (assuming vertical boxplots):

boxes: the main body of the boxplot showing the quartiles and the median’s confidence intervals if enabled.

medians: horizontal lines at the median of each box.

whiskers: the vertical lines extending to the most extreme, non-outlier data points.

caps: the horizontal lines at the ends of the whiskers.

fliers: points representing data that extend beyond the whiskers (outliers).

means: points or lines representing the means.

For example, observe the boxplot of the tiny sample data set [-9, -4, 2, 4, 9]:

>>> b = ax.boxplot([[-9, -4, 2, 4, 9],])
>>> b
{'boxes': [<matplotlib.lines.Line2D at 0x7fe1f5b21350>],
'caps': [<matplotlib.lines.Line2D at 0x7fe1f54d4e50>,
<matplotlib.lines.Line2D at 0x7fe1f54d0e50>],
'fliers': [<matplotlib.lines.Line2D at 0x7fe1f5b317d0>],
'means': [],
'medians': [<matplotlib.lines.Line2D at 0x7fe1f63549d0>],
'whiskers': [<matplotlib.lines.Line2D at 0x7fe1f5b22e10>,
             <matplotlib.lines.Line2D at 0x7fe20c54a510>]} 

>>> plt.show()

The matplotlib.lines.Line2D objects have two pairs of methods that I use extensively in my function: set_xdata (or set_ydata) and get_xdata (or get_ydata).

Using these methods we can alter the position of the constituent lines of the base box plot so that they conform to your percentile values (which is what the customized_box_plot function does). After altering the positions of the constituent lines, you can redraw the canvas using figure.canvas.draw().
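
As a minimal illustration of those two methods, the sketch below moves only the median line of the tiny sample box plot and reads its coordinates back. The Agg backend and the new median value of 3.0 are choices made for this example, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bp = ax.boxplot([[-9, -4, 2, 4, 9]])

median_line = bp['medians'][0]

# The median of the sample data is 2, so both endpoints sit at y = 2.
original_y = [float(y) for y in median_line.get_ydata()]

# Move the median line; its x data (set by the box width) is left untouched.
median_line.set_ydata([3.0, 3.0])
moved_y = [float(y) for y in median_line.get_ydata()]

# Redraw so the change would show up in an interactive session.
fig.canvas.draw()
```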

To summarize, the percentiles map to the coordinates of the various Line2D objects as follows.

The Y coordinates:

  • The max (q4_end, the end of the 4th quartile) corresponds to the topmost cap Line2D object.
  • The min (q1_start, the start of the 1st quartile) corresponds to the lowermost cap Line2D object.
  • The median (q3_start) corresponds to the median Line2D object.
  • The 2 whiskers lie between the ends of the box and the extreme caps (q1_start to q2_start for the lower whisker; q4_start to q4_end for the upper whisker).
  • The box is actually an interesting n-shaped line closed off by a cap at the bottom. The extremes of the n-shaped line correspond to q2_start and q4_start.

The X coordinates:

  • The central x coordinates (for multiple box plots these are usually 1, 2, 3, ...).
  • The library automatically calculates the bounding x coordinates based on the width specified.

INVERSE FUNCTION TO RETRIEVE THE PERCENTILES FROM THE boxplot DICT:

def get_percentiles_from_box_plots(bp):
    percentiles = []
    for i in range(len(bp['boxes'])):
        percentiles.append((bp['caps'][2*i].get_ydata()[0],
                           bp['boxes'][i].get_ydata()[0],
                           bp['medians'][i].get_ydata()[0],
                           bp['boxes'][i].get_ydata()[2],
                           bp['caps'][2*i + 1].get_ydata()[0],
                           (bp['fliers'][i].get_xdata(),
                            bp['fliers'][i].get_ydata())))
    return percentiles
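
A quick way to check the mapping is to run the inverse function on the tiny sample box plot from earlier. The sketch below repeats the function so that it is self-contained. It relies on matplotlib's default whis=1.5, under which no point of [-9, -4, 2, 4, 9] counts as an outlier, and the Agg backend is an assumption made so it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

def get_percentiles_from_box_plots(bp):
    percentiles = []
    for i in range(len(bp['boxes'])):
        percentiles.append((bp['caps'][2*i].get_ydata()[0],      # min
                            bp['boxes'][i].get_ydata()[0],       # 25th
                            bp['medians'][i].get_ydata()[0],     # 50th
                            bp['boxes'][i].get_ydata()[2],       # 75th
                            bp['caps'][2*i + 1].get_ydata()[0],  # max
                            (bp['fliers'][i].get_xdata(),
                             bp['fliers'][i].get_ydata())))
    return percentiles

fig, ax = plt.subplots()
bp = ax.boxplot([[-9, -4, 2, 4, 9]])

recovered = get_percentiles_from_box_plots(bp)
five_numbers = tuple(float(v) for v in recovered[0][:5])
fliers_x, fliers_y = recovered[0][5]
```

Here five_numbers comes back as the five sample values themselves, because for this data set the min, quartiles and max coincide with the data points.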

NOTE: The reason I did not write a completely custom boxplot method is that the built-in box plot offers many features that cannot be fully reproduced.
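
As an aside, and as a different technique from the one described above, matplotlib also provides Axes.bxp, which draws box plots directly from precomputed statistics supplied as a list of dicts. The sketch below is a minimal example under assumptions: the numbers and the labels bench_a and bench_b are invented, and the 'fliers' key is always supplied (possibly empty) so that showfliers=True has something to read.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Invented summaries, one dict per benchmark. The keys whislo, q1, med, q3
# and whishi are the ones Axes.bxp requires.
stats = [
    {"label": "bench_a", "whislo": 0.1, "q1": 1.3, "med": 2.6,
     "q3": 4.9, "whishi": 9.6, "fliers": []},
    {"label": "bench_b", "whislo": 2.5, "q1": 5.9, "med": 7.7,
     "q3": 8.9, "whishi": 10.8, "fliers": [14.2]},
]

fig, ax = plt.subplots()
artists = ax.bxp(stats, showfliers=True)  # returns the same kind of dict as boxplot
ax.set_ylabel("latency")
fig.canvas.draw()

second_median_y = [float(y) for y in artists['medians'][1].get_ydata()]
second_fliers_y = [float(y) for y in artists['fliers'][1].get_ydata()]
```

Whether this covers every option you need from boxplot depends on your use case; it is shown here only as a possible shortcut when the five summary values are already known.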

Also excuse me if I may have unnecessarily explained something that may have been too obvious.
