Boxplots in matplotlib: Markers and outliers

Problem description

I have some questions about boxplots in matplotlib:

Question A. What do the markers that I highlighted below as Q1, Q2, and Q3 represent? I believe Q1 is the maximum and Q3 marks the outliers, but what is Q2?

[Image: boxplot with the markers in question labeled Q1, Q2, and Q3]

Question B. How does matplotlib identify outliers (i.e., how does it know that they are not the true max and min values)?

Solution

Here's a graphic, taken from a stats.stackexchange answer, that illustrates the components of the box. In it, k is the multiplier applied to the IQR to set the whisker reach; k = 1.5 if you don't supply the whis keyword in Pandas.

The boxplot function in Pandas is a wrapper for matplotlib.pyplot.boxplot. The matplotlib docs explain the components of the boxes in detail:

Question A:

The box extends from the lower to upper quartile values of the data, with a line at the median.

That is, a quarter of the input data values lie below the box, a quarter lie in each of the two parts of the box (on either side of the median line), and the remaining quarter lie above the box.
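As a rough check of the "a quarter in each part" reading, the sketch below counts how many values fall into each region. The sample data are hypothetical and chosen only for illustration, and np.percentile with its default linear interpolation is assumed to be a fair stand-in for how the box edges are computed.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 30], dtype=float)

# Box edges (lower and upper quartile) and the line inside the box (median).
q1, median, q3 = np.percentile(data, [25, 50, 75])

# Count the values in each of the four regions described above.
below_box = int(np.sum(data < q1))
lower_part_of_box = int(np.sum((data >= q1) & (data < median)))
upper_part_of_box = int(np.sum((data >= median) & (data <= q3)))
above_box = int(np.sum(data > q3))

print(q1, median, q3)
print(below_box, lower_part_of_box, upper_part_of_box, above_box)
```

With real data the four counts are only approximately equal, since ties and sample sizes that are not a multiple of four prevent an exact split.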

Question B:

whis : float, sequence, or string (default = 1.5)

As a float, determines the reach of the whiskers beyond the first and third quartiles. In other words, where IQR is the interquartile range (Q3 - Q1), the upper whisker will extend to the last datum less than Q3 + whis*IQR. Similarly, the lower whisker will extend to the first datum greater than Q1 - whis*IQR. Beyond the whiskers, data are considered outliers and are plotted as individual points.
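The rule quoted above can be reproduced by hand. The following is a minimal sketch, not matplotlib's own implementation: it applies the rule with NumPy to hypothetical sample data and then compares the result with matplotlib.cbook.boxplot_stats, the helper that returns the statistics a boxplot is drawn from.

```python
import numpy as np
from matplotlib import cbook

# Hypothetical sample data, chosen only for illustration.
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 30], dtype=float)
whis = 1.5  # the default reach

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Limits that the whiskers may not cross.
upper_limit = q3 + whis * iqr
lower_limit = q1 - whis * iqr

# Each whisker ends at the most extreme datum that is still inside its limit.
whisker_high = data[data < upper_limit].max()
whisker_low = data[data > lower_limit].min()

# Everything beyond the whiskers is drawn as an individual outlier point.
outliers = data[(data < whisker_low) | (data > whisker_high)]

# Cross-check against the statistics matplotlib computes for the same data.
stats = cbook.boxplot_stats(data, whis=whis)[0]
print(whisker_low, whisker_high, outliers)
print(stats["whislo"], stats["whishi"], stats["fliers"])
```

For this sample the limits are -4.5 and 17.5, so the whiskers stop at 1 and 11, and 30 is the only point plotted as an outlier. This is why a whisker cap is not necessarily the true minimum or maximum of the data.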

Matplotlib (and Pandas) also give you several options for changing this default definition of the whiskers:

Set this to an unreasonably high value to force the whiskers to show the min and max values. Alternatively, set this to an ascending sequence of percentiles (e.g., [5, 95]) to set the whiskers at specific percentiles of the data. Finally, whis can be the string 'range' to force the whiskers to the min and max of the data.
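To compare these options side by side, the sketch below draws the same hypothetical data with three whis settings and counts the outlier points in each panel. It passes the percentile pair (0, 100) rather than the string 'range', on the assumption that the percentile form is accepted across matplotlib versions while the string form has, as far as I know, been deprecated in newer releases. The Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption made for this sketch
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data, chosen only for illustration.
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 30], dtype=float)

settings = {
    "default (1.5)": 1.5,
    "percentiles (5, 95)": (5, 95),
    "min/max (0, 100)": (0, 100),
}

fig, axes = plt.subplots(1, 3, sharey=True)
flier_counts = {}
for ax, (title, whis) in zip(axes, settings.items()):
    artists = ax.boxplot(data, whis=whis)
    ax.set_title(title)
    # artists["fliers"] holds one Line2D per box; its y-data are the outliers.
    flier_counts[title] = len(artists["fliers"][0].get_ydata())

print(flier_counts)
plt.close(fig)
```

To view the panels, call fig.savefig with a file name of your choice before closing the figure. The dictionary returned by boxplot also contains the keys "boxes", "medians", "whiskers", and "caps", which is a convenient way to identify each marker in the plot.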
