pandas box plot for multiple columns

Problem description

My data frame (a pandas structure) looks like the above.

Now I want to make a box plot for each feature on a separate canvas. The separation condition is the first column. I have a similar plot for histograms (code below), but I can't make a working version for the box plot.

import numpy
import matplotlib.pyplot as plt

# `data` (a DataFrame) and `features` are defined elsewhere in the asker's code
# `density` replaces the `normed` keyword, which matplotlib has removed
hist_params = {'density': True, 'bins': 60, 'alpha': 0.4}
# create the figure
fig = plt.figure(figsize=(16, 25))
for n, feature in enumerate(features):
    # add a subplot to the figure
    ax = fig.add_subplot(features.shape[1] // 5 + 1, 6, n + 1)
    # define the histogram range by cutting 1% of the data from both ends
    min_value, max_value = numpy.percentile(data[feature], [1, 99])
    # `.loc` replaces the `.ix` indexer, which pandas has removed
    ax.hist(data.loc[data.is_true_seed.values == 0, feature].values, range=(min_value, max_value),
            label='ghost', **hist_params)
    ax.hist(data.loc[data.is_true_seed.values == 1, feature].values, range=(min_value, max_value),
            label='true', **hist_params)
    ax.legend(loc='best')

    ax.set_title(feature)

The code above produces output like the following (only part of it is attached):

Recommended answer

DataFrame.boxplot() automates this rather well:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'is_true_seed': np.random.choice([True, False], 10),
                   'col1': np.random.normal(size=10),
                   'col2': np.random.normal(size=10),
                   'col3': np.random.normal(size=10)})

fig, ax = plt.subplots(figsize=(10, 10))
df.boxplot(['col1', 'col2', 'col3'], 'is_true_seed', ax)

The first argument tells pandas which columns to plot, the second which column to group by (what you call the separation condition), and the third which axes to draw on.
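
The same call can also be written with keyword arguments, which makes the role of each argument explicit. This is only a sketch: it assumes the example column names used above, switches to the non-interactive Agg backend so it runs without a display, and uses a seeded random generator purely for reproducibility. None of those choices come from the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
df = pd.DataFrame({'is_true_seed': np.tile([True, False], 5),
                   'col1': rng.normal(size=10),
                   'col2': rng.normal(size=10),
                   'col3': rng.normal(size=10)})

fig, ax = plt.subplots(figsize=(10, 10))
# column -> which columns to plot
# by     -> which column to group by (the separation condition)
# ax     -> which axes (and therefore which figure) to draw on
# With several columns, pandas clears the figure that owns `ax` and
# draws one subplot per column, each titled with the column name.
df.boxplot(column=['col1', 'col2', 'col3'], by='is_true_seed', ax=ax)

titles = [a.get_title() for a in fig.axes]
```

Because more than one subplot is needed, pandas may emit a UserWarning saying that the figure containing the passed axes is being cleared. The plots still end up on `fig`.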

Listing all columns but the one you want to group by can get tedious, but you can avoid it by omitting that first argument. You then have to explicitly name the other two:

df.boxplot(by='is_true_seed', ax=ax)
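
If no existing axes are needed, one possible variant is to let pandas create the figure itself and control the grid with the `layout` and `figsize` parameters of DataFrame.boxplot(). The sketch below assumes the same example columns as above. The 1x3 layout, the figure size, the Agg backend and the seeded random data are illustrative choices, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
df = pd.DataFrame({'is_true_seed': np.tile([True, False], 5),
                   'col1': rng.normal(size=10),
                   'col2': rng.normal(size=10),
                   'col3': rng.normal(size=10)})

# `column` is omitted, so every numeric column except the grouping
# column gets its own subplot; `layout` arranges them in one row.
axes = df.boxplot(by='is_true_seed', layout=(1, 3), figsize=(12, 4))

fig = np.ravel(axes)[0].get_figure()
fig.suptitle('')  # drop the automatic "Boxplot grouped by ..." super-title

plotted = [a.get_title() for a in np.ravel(axes)]
```

Each subplot then shows one feature, with one box per value of is_true_seed, which is close to the per-feature canvases asked for in the question.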
