Pandas Multicolumn Groupby Plotting

Question

Problem:
I have a pandas dataframe of data that I would like to group-by year-months and rule_name. Once grouped by I want to be able to get the counts of each of the rules during that period and the % of all the rules for that group. So far I am able to get each of the periods counts but not the percentage.

The goal is to have a plot similar to the ones at the bottom but on the right-y axis I would have percentage of the time period as well.

Goal Dataframes:
For rule_name A:

date       counts (rule_name)   %_rule_name 
Jan 16     1                   50
Feb 16     0                    0
Mar 16     2                   66

I would like to continue this for each rule_name (i.e. for B and C)

What I have so far:

import pandas as pd

d = {'date': ['1/1/2016', '2/1/2016', '3/5/2016', '2/5/2016', '1/15/2016', '3/3/2016', '3/4/2016'],
     'rule_name': ['A', 'B', 'C', 'C', 'B', 'A', 'A']}

df = pd.DataFrame(d)

Output:

# Parse the string dates into datetimes
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y', errors='coerce')

rule_names = df['rule_name'].unique().tolist()
for i in rule_names:
    print()
    print('dataframe for', i, ':')
    df_temp = df[df['rule_name'] == i]
    # Count rows per year-month (e.g. '2016-01') for this rule
    df_temp = df_temp.groupby(df_temp['date'].dt.strftime('%Y-%m')).count()
    df_temp.plot(kind='line', title='Rule Name: ' + str(i))
    print(df_temp)

Output:

I feel like there is a better way to do this but am unable to figure it out. I have been racking my brains on this problem for the last day'ish'. Should I be filtering? I tried a multi-index group-by but could not create a %_rule_name column. Thanks for input in advance.

Recommended answer

I was able to resolve this. The following code provides the necessary plots and data processing. I am putting it up in case this helps someone else. It feels kind of janky but it gets the trick done. Any suggestion to improve this would be appreciated.

Thanks, SO.

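import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns  # imported in the original answer; not referenced directly below


def drop_x(frame):
    # Assumed helper (not defined in the original answer): drop the '_x' merge columns in place
    frame.drop(columns=[c for c in frame.columns if c.endswith('_x')], inplace=True)


def rename_y(frame):
    # Assumed helper (not defined in the original answer): strip the '_y' merge suffix in place
    frame.rename(columns=lambda c: c[:-2] if c.endswith('_y') else c, inplace=True)


# Count of all rules per year-month (e.g. '2016-01')
df_all = df.groupby(df['date'].dt.strftime('%Y-%m')).count()
df_all['rule_name_all_count'] = df_all['rule_name']

rule_names = df['rule_name'].unique().tolist()
for i in rule_names:
    print()
    print('dataframe for', i, ':')
    df_temp = df[df['rule_name'] == i]
    df_temp = df_temp.groupby(df_temp['date'].dt.strftime('%Y-%m')).count()
    # Left merge keeps every period, even those where rule i never fired
    df_merge = pd.merge(df_all, df_temp, right_index=True, left_index=True, how='left')
    drop_x(df_merge)
    rename_y(df_merge)
    df_merge.drop('date', axis=1, inplace=True)
    # Fraction of the period's rule hits that belong to rule i
    df_merge['rule_name_%'] = df_merge['rule_name'].astype(float) / df_merge['rule_name_all_count'].astype(float)
    df_merge = df_merge.fillna(0)

    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax2 = ax.twinx()  # second y-axis sharing the same x-axis

    # Counts on the left axis, share of the period on the right axis
    df_merge['rule_name'].plot(ax=ax)
    df_merge['rule_name_%'].plot(ax=ax2)
    plt.show()
    print(df_temp)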
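
Since the answer asks for suggestions, here is one possible shorter route, offered as a sketch rather than a definitive fix. It is not from the original answer: it swaps the per-rule merge loop for pd.crosstab, which builds the period-by-rule count table in one step, and then divides each row by its total to get the percentages. The names period, counts, percent and rule_a are made up for illustration; the sample data is the one from the question.

```python
import pandas as pd

# Sample data from the question
d = {'date': ['1/1/2016', '2/1/2016', '3/5/2016', '2/5/2016',
              '1/15/2016', '3/3/2016', '3/4/2016'],
     'rule_name': ['A', 'B', 'C', 'C', 'B', 'A', 'A']}
df = pd.DataFrame(d)
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

# Year-month label for each row, e.g. '2016-01' ('period' is an illustrative name)
period = df['date'].dt.strftime('%Y-%m').rename('period')

# Rows: periods, columns: rule names, values: number of occurrences
counts = pd.crosstab(period, df['rule_name'])

# Share of each rule within its period, as a percentage of the row total
percent = counts.div(counts.sum(axis=1), axis=0) * 100

# Goal-style table for a single rule
rule_a = pd.DataFrame({'counts': counts['A'], '%_rule_name': percent['A']})
print(rule_a)
```

Because crosstab fills missing period and rule combinations with 0, no fillna step is needed. To reproduce the two-axis plot, counts['A'] could be drawn on one matplotlib axis and percent['A'] on its twinx() axis, in the same way as in the loop above.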
