Boxplot: Outlier Labels (Python)
Problem description
I'm making a time series boxplot with the seaborn package, but I can't put labels on my outliers.
My data is a DataFrame with 3 columns, [Month, Id, Value], which we can fake like this:
import numpy
import pandas
import seaborn
### Sample Data ###
Month = numpy.repeat(numpy.arange(1, 11), 10)
Id = numpy.arange(1, 101)
Value = numpy.random.randn(100)
### As a pandas DataFrame ###
Ts = pandas.DataFrame({'Value': Value, 'Month': Month, 'Id': Id})
### Time series boxplot ###
ax = seaborn.boxplot(x="Month", y="Value", data=Ts)
I have one boxplot for each month, and I'm trying to put the Id as a label on the three outliers in the plot.
Recommended answer
First of all, you need to detect which Ids in your DataFrame are outliers. You can use this:
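import pandas as pd
from matplotlib.cbook import boxplot_stats

# For each month, keep the rows whose Value is a flier in that month's boxplot
outlier_rows = []
for month in Ts['Month'].unique():
    values = Ts.loc[Ts['Month'] == month, 'Value']
    fliers = [y for stat in boxplot_stats(values) for y in stat['fliers']]
    outlier_rows.append(Ts[(Ts['Month'] == month) & Ts['Value'].isin(fliers)])
# pd.concat is used because DataFrame.append was removed in pandas 2.0
outliers_df = pd.concat(outlier_rows)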
This creates a DataFrame, similar to the original one, that contains only the outliers. Then you can annotate each Id on your plot with this:
for row in outliers_df.itertuples():
    # Month m is drawn at x = m - 1 (months 1..10 map to positions 0..9)
    ax.annotate(str(row.Id), xy=(row.Month - 1, row.Value), xytext=(2, 2), textcoords='offset points', fontsize=14)
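The `Month - 1` offset works here only because the months are the consecutive integers 1 to 10. A minimal sketch of a more general mapping, assuming the x column is numeric so that seaborn orders its categories by sorting the unique values and draws them at x = 0, 1, 2, ...; the helper name `category_positions` is hypothetical:

```python
import pandas as pd


def category_positions(values):
    """Map each category to the x-position seaborn is assumed to draw it at.

    Assumption: for a numeric x column with no explicit order=, seaborn
    sorts the unique values and places them at x = 0, 1, 2, ...
    """
    order = sorted(pd.unique(values))
    return {category: position for position, category in enumerate(order)}


# Months with a gap: 1, 2 and 5 would be drawn at x = 0, 1, 2 (not 0, 1, 4)
positions = category_positions(pd.Series([5, 1, 2, 2, 5]))
```

In the annotation loop, `xy=(positions[row.Month], row.Value)` would then replace `row.Month - 1`, with `positions = category_positions(Ts['Month'])`. If you pass an explicit `order=` to `seaborn.boxplot`, build the mapping from that list instead.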
Full code:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.cbook import boxplot_stats

sns.set_style('darkgrid')

# Fake data: 10 months x 10 Ids, one random value each
Month = np.repeat(np.arange(1, 11), 10)
Id = np.arange(1, 101)
Value = np.random.randn(100)
Ts = pd.DataFrame({'Value': Value, 'Month': Month, 'Id': Id})

fig, ax = plt.subplots()
sns.boxplot(ax=ax, x="Month", y="Value", data=Ts)

# For each month, keep the rows whose Value is a flier in that month's boxplot
outlier_rows = []
for month in Ts['Month'].unique():
    values = Ts.loc[Ts['Month'] == month, 'Value']
    fliers = [y for stat in boxplot_stats(values) for y in stat['fliers']]
    outlier_rows.append(Ts[(Ts['Month'] == month) & Ts['Value'].isin(fliers)])
outliers_df = pd.concat(outlier_rows)

# Label each outlier with its Id; month m is drawn at x = m - 1
for row in outliers_df.itertuples():
    ax.annotate(str(row.Id), xy=(row.Month - 1, row.Value),
                xytext=(2, 2), textcoords='offset points', fontsize=14)

plt.show()
Output: one boxplot per month, with each outlier labeled by its Id (image not preserved).
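As an alternative to calling `boxplot_stats` once per month, the same rule can be applied in one pass with a pandas groupby. This sketch assumes the default whisker setting (`whis=1.5`), under which matplotlib flags a value as a flier when it lies below Q1 - 1.5·IQR or above Q3 + 1.5·IQR of its group, and that pandas' default linear quantile interpolation matches the percentiles matplotlib computes. `flag_outliers` is a hypothetical helper, not part of pandas or matplotlib:

```python
import pandas as pd


def flag_outliers(df, group_col="Month", value_col="Value", whis=1.5):
    """Return the rows of df whose value falls outside the whisker fences
    [Q1 - whis*IQR, Q3 + whis*IQR] computed within its own group."""
    grouped = df.groupby(group_col)[value_col]
    # Broadcast each group's quartiles back onto its rows
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    mask = (df[value_col] < q1 - whis * iqr) | (df[value_col] > q3 + whis * iqr)
    return df[mask]


# Two small groups, each with one obvious outlier (Id 5 and Id 10)
df = pd.DataFrame({
    "Month": [1] * 5 + [2] * 5,
    "Id": list(range(1, 11)),
    "Value": [1, 2, 3, 4, 100, 10, 11, 12, 13, -50],
})
outliers = flag_outliers(df)
```

With the question's data, `outliers_df = flag_outliers(Ts)` could then feed the same annotation loop.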