Side-by-side boxplots with Pandas

Question

I need to plot a comparison of five variables stored in a pandas DataFrame. I used an example from here and it worked, but now I need to change the axes and titles, and I'm struggling to do so.

Here is my data:

df1.groupby('cls').head()
Out[171]: 
   sensitivity  specificity  accuracy       ppv       auc       cls
0     0.772091     0.824487  0.802966  0.799290  0.863700       sig
1     0.748931     0.817238  0.776366  0.785910  0.859041       sig
2     0.774016     0.805909  0.801975  0.789840  0.853132       sig
3     0.826670     0.730071  0.795715  0.784150  0.850024       sig
4     0.781112     0.803839  0.824709  0.791530  0.863411       sig
0     0.619048     0.748290  0.694969  0.686138  0.713899  baseline
1     0.642348     0.702076  0.646216  0.674683  0.712632  baseline
2     0.567344     0.765410  0.710650  0.665614  0.682502  baseline
3     0.644046     0.733645  0.754621  0.683485  0.734299  baseline
4     0.710077     0.653871  0.707933  0.684313  0.732997  baseline

Here is my code:

fig, axes = plt.subplots(ncols=5, figsize=(12, 5), sharey=True)
df1.query("cls in ['sig', 'baseline']").boxplot(by='cls', return_type='axes', ax=axes)

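The resulting picture is (image omitted):

How do I:

  • change the title ('Boxplot grouped by cls')
  • get rid of the annoying [cls] plotted along the horizontal axis
  • reorder the plotted categories so they appear as in df1? (first sensitivity, followed by specificity...)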

Answer

I would suggest using seaborn.

Here is an example that might help you:

Imports

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

Make data

data = {'sensitivity' : np.random.normal(loc = 0, size = 10),
        'specificity' : np.random.normal(loc = 0, size = 10),
        'accuracy' : np.random.normal(loc = 0, size = 10),
        'ppv' : np.random.normal(loc = 0, size = 10),
        'auc' : np.random.normal(loc = 0, size = 10),
        'cls' : ['sig', 'sig', 'sig', 'sig', 'sig', 'baseline', 'baseline', 'baseline', 'baseline', 'baseline']}

df = pd.DataFrame(data)
df

Seaborn has a nifty tool called factorplot (renamed catplot in seaborn 0.9) that creates a grid of subplots where the rows/cols are built from your data. To be able to do this, we first need to "melt" the df into a more usable long-form shape.

df_melt = df.melt(id_vars = 'cls',
                  value_vars = ['accuracy',
                                'auc',
                                'ppv',
                                'sensitivity',
                                'specificity'],
                  var_name = 'columns')
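
To make the reshaping concrete, here is a small sketch of what melt produces. The three-row frame below is made-up illustration data (not the question's data), the names `wide` and `long` are hypothetical, and only two metric columns are used to keep the result short.

```python
import pandas as pd

# Tiny illustrative frame (made-up values, not the question's data).
wide = pd.DataFrame({
    'sensitivity': [0.77, 0.75, 0.62],
    'specificity': [0.82, 0.81, 0.74],
    'cls': ['sig', 'sig', 'baseline'],
})

# Wide -> long: one row per (cls, metric) pair.
long = wide.melt(id_vars='cls',
                 value_vars=['sensitivity', 'specificity'],
                 var_name='columns')

print(long.shape)          # (6, 3)
print(list(long.columns))  # ['cls', 'columns', 'value']
```

Each original row contributes one long-form row per metric, which is why the plotting calls can refer to a single 'value' column and use 'columns' to split it.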

Now we can create the plot, using "columns" as the col variable. The call below uses catplot, the current name of factorplot.

a = sns.catplot(data = df_melt,  # catplot replaced factorplot in seaborn 0.9
                x = 'cls',
                y = 'value',
                kind = 'box', # type of plot
                col = 'columns',
                col_order = ['sensitivity', # custom order of boxplots
                             'specificity',
                             'accuracy',
                             'ppv',
                             'auc']).set_titles('{col_name}') # remove 'columns = ' part of title

plt.show()

You can also just use Seaborn's boxplot.

b = sns.boxplot(data = df_melt,
                hue = 'cls', # different colors for different 'cls'
                x = 'columns',
                y = 'value',
                order = ['sensitivity', # custom order of boxplots
                         'specificity',
                         'accuracy',
                         'ppv',
                         'auc'])

plt.title('Boxplot grouped by cls') # You can change the title here (sns.plt is gone from current seaborn)
plt.show()

This will give you the same plot, but all in one figure instead of subplots. It also allows you to change the title of the figure with one line. Unfortunately, I can't find a way to remove the 'columns' subtitle, but hopefully this will get you what you need.
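
One possible way to drop the 'columns' label: sns.boxplot returns a Matplotlib Axes, so the usual Axes methods apply. The sketch below uses random stand-in data in the same shape as df_melt (the names `rng`, `metrics`, `df_demo`, `df_demo_melt`, and `ax` are illustrative, not from the original answer) and selects the non-interactive Agg backend so that it runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
metrics = ['sensitivity', 'specificity', 'accuracy', 'ppv', 'auc']

# Stand-in data: 5 'sig' rows and 5 'baseline' rows.
df_demo = pd.DataFrame(rng.normal(size=(10, 5)), columns=metrics)
df_demo['cls'] = ['sig'] * 5 + ['baseline'] * 5
df_demo_melt = df_demo.melt(id_vars='cls', value_vars=metrics, var_name='columns')

ax = sns.boxplot(data=df_demo_melt, x='columns', y='value', hue='cls', order=metrics)
ax.set_title('Boxplot grouped by cls')
ax.set_xlabel('')  # removes the 'columns' label under the x axis
```

Setting the label to an empty string leaves the tick labels (the metric names) in place, so the plot stays readable.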

Edit

To view the plots sideways:

Factorplot (catplot): swap your x and y values, change col = 'columns' to row = 'columns', change col_order = [...] to row_order = [...], and change '{col_name}' to '{row_name}', like so:

a1 = sns.catplot(data = df_melt,  # catplot replaced factorplot in seaborn 0.9
                 x = 'value',
                 y = 'cls',
                 kind = 'box', # type of plot
                 row = 'columns',
                 row_order = ['sensitivity', # custom order of boxplots
                              'specificity',
                              'accuracy',
                              'ppv',
                              'auc']).set_titles('{row_name}') # remove 'columns = ' part of title

plt.show()

Boxplot: swap your x and y values, then add the parameter orient = 'h', like so:

b1 = sns.boxplot(data = df_melt,
                 hue = 'cls',
                 x = 'value',
                 y = 'columns',
                 order = ['sensitivity', # custom order of boxplots
                          'specificity',
                          'accuracy',
                          'ppv',
                          'auc'],
                 orient = 'h')

plt.title('Boxplot grouped by cls')
plt.show()
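
For completeness, the three points in the question can probably also be handled with pandas and Matplotlib alone. The following is a rough sketch under stated assumptions, not the approach from the answer above: `df1_demo` is random stand-in data shaped like the question's df1, and `rng` and `metrics` are illustrative names. It relies on the `column` argument of DataFrame.boxplot to fix the order, on Figure.suptitle to replace the automatic title, and on Axes.set_xlabel to clear the group label.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
metrics = ['sensitivity', 'specificity', 'accuracy', 'ppv', 'auc']

# Stand-in for the question's df1 (random values, same layout).
df1_demo = pd.DataFrame(rng.normal(size=(10, 5)), columns=metrics)
df1_demo['cls'] = ['sig'] * 5 + ['baseline'] * 5

fig, axes = plt.subplots(ncols=5, figsize=(12, 5), sharey=True)

# `column` selects which columns are drawn and in what order.
df1_demo.boxplot(column=metrics, by='cls', ax=axes)

fig.suptitle('Boxplot grouped by cls')  # replaces the automatic figure title
for ax in axes:
    ax.set_xlabel('')                   # drops the 'cls' label under each subplot
```

pandas may emit a warning that sharex and sharey are ignored when several axes are passed in; the sharing set up by plt.subplots still applies.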
