Pandas - Create columns from column value, and fill with count

Problem description

I have a dataframe similar to below.

Index  Time      Weekday
0      21:10:00  Tuesday
1      21:15:00  Tuesday
2      21:20:00  Tuesday
3      21:20:00  Tuesday
4      21:25:00  Wednesday
5      21:25:00  Wednesday
6      21:30:00  Friday
7      21:35:00  Thursday
8      21:35:00  Wednesday
9      21:40:00  Wednesday
10     21:40:00  Wednesday
11     21:40:00  Monday

I want to turn the weekdays into columns and count how many times each time appears on each day. My goal is this:

Time      Monday  Tuesday  Wednesday  Thursday  Friday
21:10:00  0       1        0          0         0
21:15:00  0       1        0          0         0
21:20:00  0       2        0          0         0
21:25:00  0       0        2          0         0
21:30:00  0       0        0          0         1
21:35:00  0       0        1          1         0
21:40:00  1       0        2          0         0

The reason is that I want to create a heatmap in seaborn, and I read that my data has to be pivoted/shaped a certain way: https://stackoverflow.com/a/37790707/9384889

I know how to count how often each Time value appears, ignoring the weekday:

df['Time'].value_counts()

I have also been reading http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pivot.html but I cannot see how to combine these two ideas.

Solution

Use groupby with size and unstack to count and reshape, or use crosstab as an alternative.

To control the order of the days, either convert Weekday to an ordered Categorical or reindex the columns:

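import pandas as pd

cats = ['Monday','Tuesday','Wednesday','Thursday','Friday']

# Option 1: make Weekday an ordered Categorical, then count and reshape.
# observed=False keeps a column for every category, even one with no rows.
df['Weekday'] = pd.Categorical(df['Weekday'], categories=cats, ordered=True)
df = df.groupby(['Time', 'Weekday'], observed=False).size().unstack(fill_value=0)

# Option 2 (use instead of option 1, on the original df): count and reshape,
# then put the columns in the wanted order; a day with no rows is filled with 0.
df = df.groupby(['Time', 'Weekday']).size().unstack(fill_value=0).reindex(columns=cats, fill_value=0)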
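The groupby route can be traced step by step on the data from the question. This is only a minimal sketch: the names `sample`, `counts` and `wide` are illustrative and not part of the original answer, and Time is assumed to be stored as plain strings.

```python
import pandas as pd

# Sample data copied from the question (Time kept as strings, an assumption)
times = ['21:10:00', '21:15:00', '21:20:00', '21:20:00', '21:25:00', '21:25:00',
         '21:30:00', '21:35:00', '21:35:00', '21:40:00', '21:40:00', '21:40:00']
days = ['Tuesday', 'Tuesday', 'Tuesday', 'Tuesday', 'Wednesday', 'Wednesday',
        'Friday', 'Thursday', 'Wednesday', 'Wednesday', 'Wednesday', 'Monday']
sample = pd.DataFrame({'Time': times, 'Weekday': days})

cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']

# size() counts the rows of each (Time, Weekday) pair and returns a Series
# indexed by both keys
counts = sample.groupby(['Time', 'Weekday']).size()

# unstack() moves the Weekday level into the columns; pairs that never
# occur are filled with 0 instead of NaN
wide = counts.unstack(fill_value=0)

# Without reindex the weekday columns come out in alphabetical order;
# reindex puts them in calendar order
wide = wide.reindex(columns=cats, fill_value=0)

print(wide)
```

The intermediate `counts` Series is the piece that connects value_counts-style counting with pivoting: it already holds the numbers, and unstack only changes the layout.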

Alternatives:

df = pd.crosstab(df['Time'], pd.Categorical(df['Weekday'], categories=cats, ordered=True))

df = pd.crosstab(df['Time'], df['Weekday']).reindex(columns=cats)


print(df)

col_0     Monday  Tuesday  Wednesday  Thursday  Friday
Time                                                  
21:10:00       0        1          0         0       0
21:15:00       0        1          0         0       0
21:20:00       0        2          0         0       0
21:25:00       0        0          2         0       0
21:30:00       0        0          0         0       1
21:35:00       0        0          1         1       0
21:40:00       1        0          2         0       0
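For comparison, here is a self-contained sketch of the two crosstab variants run on the question's data. The names `sample`, `table_cat` and `table_reindexed` are illustrative, and Time is again assumed to be stored as strings.

```python
import pandas as pd

# Sample data copied from the question (Time kept as strings, an assumption)
times = ['21:10:00', '21:15:00', '21:20:00', '21:20:00', '21:25:00', '21:25:00',
         '21:30:00', '21:35:00', '21:35:00', '21:40:00', '21:40:00', '21:40:00']
days = ['Tuesday', 'Tuesday', 'Tuesday', 'Tuesday', 'Wednesday', 'Wednesday',
        'Friday', 'Thursday', 'Wednesday', 'Wednesday', 'Wednesday', 'Monday']
sample = pd.DataFrame({'Time': times, 'Weekday': days})

cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']

# Variant 1: pass an ordered Categorical so the column order is fixed up front.
# The Categorical has no name, which is why the column axis is labelled "col_0".
weekday = pd.Categorical(sample['Weekday'], categories=cats, ordered=True)
table_cat = pd.crosstab(sample['Time'], weekday)

# Variant 2: cross-tabulate the raw strings, then reorder the columns
table_reindexed = pd.crosstab(sample['Time'], sample['Weekday']).reindex(
    columns=cats, fill_value=0)

# Both variants should hold the same counts in the same layout
same = table_cat.to_numpy().tolist() == table_reindexed.to_numpy().tolist()
print(same)
print(table_reindexed)
```

Variant 2 labels the column axis "Weekday" rather than "col_0", which may read better as a heatmap axis title.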

Finally, plot the result with seaborn.heatmap:

import seaborn as sns

sns.heatmap(df, annot=True, fmt="g", cmap='viridis')
