使用groupby和pandas数据框中的多列从字符串数据创建条形图 [英] Create barplot from string data using groupby and multiple columns in pandas dataframe

查看:260
本文介绍了使用groupby和pandas数据框中的多列从字符串数据创建条形图的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想用python从多个是"或否"数据中绘制多个x类别的条形图.我已经开始编写一些代码,但是我认为我正在以缓慢的方式获得所需的解决方案.对于使用seaborn,Matplotlib或pandas但不使用Bokeh的解决方案,我会很好,因为我想使出版物质量的数字能够按比例缩放.

I'd like to make a bar plot in python with multiple x-categories from counts of data either "yes" or "no". I've started on some code but I believe the track I'm on in a slow way of getting to the solution I want. I'd be fine with a solution that uses either seaborn, Matplotlib, or pandas but not Bokeh because I'd like to make publication-quality figures that scale.

我最终想要的是:

  • 在x轴上具有独木舟",游轮",皮划艇"和船"类别的条形图
  • 按颜色"分组,因此绿色或红色
  • 显示是"响应的比例:因此,是"行数除以红色"和绿色"的计数,在这种情况下为4个红色和4个绿色,但是可以改变.

这是我正在使用的数据集:

Here's the dataset I'm working with:

import pandas as pd
data = [{'ship': 'Yes','canoe': 'Yes', 'cruise': 'Yes', 'kayak': 'No','color': 'Red'},{'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'Yes','canoe': 'No','color': 'Green'},{'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'No','canoe': 'No','color': 'Green'},{'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'No','canoe': 'No','color': 'Red'},{'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'Yes','canoe': 'No','color': 'Red'},{'ship': 'No', 'cruise': 'Yes', 'kayak': 'No','canoe': 'Yes','color': 'Green'},{'ship': 'No', 'cruise': 'No', 'kayak': 'No','canoe': 'No','color': 'Green'},{'ship': 'No', 'cruise': 'No', 'kayak': 'No','canoe': 'No','color': 'Red'}]
df = pd.DataFrame(data)

这就是我的开始:

print(df['color'].value_counts())

red = 4 # there must be a better way to code this rather than manually. Perhaps using len()?
green = 4

# get count per type
ca = df['canoe'].value_counts()
cr = df['cruise'].value_counts()
ka = df['kayak'].value_counts()
sh = df['ship'].value_counts()
print(ca, cr, ka, sh)

# group by color
cac = df.groupby(['canoe','color'])
crc = df.groupby(['cruise','color'])
kac = df.groupby(['kayak','color'])
shc = df.groupby(['ship','color'])

# make plots 
cac2 = cac['color'].value_counts().unstack()
cac2.plot(kind='bar', title = 'Canoe by color')

但是我真正想要的是所有x分类都在一个图上,只显示是"响应的结果,并取作是"的比例,而不仅仅是计数.帮助吗?

But really what I want is all of the x-categories to be on one plot, only showing the result for "Yes" responses, and taken as the proportion of "Yes" rather than just counts. Help?

推荐答案

让我们尝试.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from itertools import groupby

data = [{'ship': 'Yes','canoe': 'Yes', 'cruise': 'Yes', 'kayak': 'No','color': 'Red'},{'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'Yes','canoe': 'No','color': 'Green'},{'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'No','canoe': 'No','color': 'Green'},{'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'No','canoe': 'No','color': 'Red'},{'ship': 'Yes', 'cruise': 'Yes', 'kayak': 'Yes','canoe': 'No','color': 'Red'},{'ship': 'No', 'cruise': 'Yes', 'kayak': 'No','canoe': 'Yes','color': 'Green'},{'ship': 'No', 'cruise': 'No', 'kayak': 'No','canoe': 'No','color': 'Green'},{'ship': 'No', 'cruise': 'No', 'kayak': 'No','canoe': 'No','color': 'Red'}]
df = pd.DataFrame(data)
df1 = df.replace(["Yes","No"],[1,0]).groupby("color").mean().stack().rename('% Yes').to_frame()


def add_line(ax, xpos, ypos):
    line = plt.Line2D([xpos, xpos], [ypos + .1, ypos],
                      transform=ax.transAxes, color='gray')
    line.set_clip_on(False)
    ax.add_line(line)

def label_len(my_index,level):
    labels = my_index.get_level_values(level)
    return [(k, sum(1 for i in g)) for k,g in groupby(labels)]

def label_group_bar_table(ax, df):
    ypos = -.1
    scale = 1./df.index.size
    for level in range(df.index.nlevels)[::-1]:
        pos = 0
        for label, rpos in label_len(df.index,level):
            lxpos = (pos + .5 * rpos)*scale
            ax.text(lxpos, ypos, label, ha='center', transform=ax.transAxes)
            add_line(ax, pos*scale, ypos)
            pos += rpos
        add_line(ax, pos*scale , ypos)
        ypos -= .1


colorlist = ['green','red']
cp = sns.color_palette(colorlist)

ax = sns.barplot(x=df1.index, y='% Yes', hue = df1.index.get_level_values(0), data=df1, palette=cp)
#Below 2 lines remove default labels
ax.set_xticklabels('')
ax.set_xlabel('')
label_group_bar_table(ax, df1)

输出:

这篇关于使用groupby和pandas数据框中的多列从字符串数据创建条形图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆