Pandas: how to group by with counts across multiple row levels?

Problem description

I have the following DataFrame:

|----|----|
| A  | B  |
| a1 | b1 |
| a2 | b1 |
| a1 | b2 |
| a2 | b3 |

I want to count the values of B per A and get the following result:

|----|----|-------|
| A  | B  | Count |
| a1 | b1 |  1    |
|    | b2 |  1    |
|    | b3 |  NaN  |
| a2 | b1 |  1    |
|    | b2 |  NaN  |
|    | b3 |  1    |

I usually do this with df.groupby([B])[A].count(), but in this case, where the result looks like a kind of pivot table, it is confusing for me.

Thanks.

Update:

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 20422 entries, 180 to 96430
Data columns (total 2 columns):
B    20422 non-null object
A             20422 non-null object
dtypes: object(2)
memory usage: 478.6+ KB

Using df.groupby([B])[A].value_counts().unstack().stack(dropna=False).reset_index(name="Count"), I get:

|--|----|----|-------|
|  | A  | B  | Count |
|0 | a1 | b1 |  1    |
|1 | a1 | b2 |  1    |
|2 | a1 | b3 |  NaN  |
|3 | a2 | b1 |  1    |
|4 | a2 | b2 |  NaN  |
|5 | a2 | b3 |  1    |

Recommended answer

1) One way is to group on "A" and count the occurrences of each value in "B" using value_counts, then chain unstack and stack with dropna=False to get the desired DataFrame:

df.groupby('A')['B'].value_counts().unstack().stack(dropna=False).reset_index(name="Count")
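
As a runnable illustration, here is a minimal sketch on the sample data from the question. It is a variant rather than the exact chain above: instead of unstack/stack it reindexes the counts onto every (A, B) combination, which gives the same NaN-filled result. The reason for the swap is that the dropna argument of stack appears to be tied to the legacy stack implementation in newer pandas releases, so it is worth checking against your installed version. The names counts, full_index and result are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2"],
                   "B": ["b1", "b1", "b2", "b3"]})

# Count each (A, B) pair that actually occurs in the data.
counts = df.groupby("A")["B"].value_counts()

# Build every possible (A, B) combination and reindex onto it;
# pairs that never occur are filled with NaN.
full_index = pd.MultiIndex.from_product(
    [sorted(df["A"].unique()), sorted(df["B"].unique())],
    names=["A", "B"],
)
result = counts.reindex(full_index).reset_index(name="Count")
print(result)
```

The result has six rows, with NaN in Count for the (a1, b3) and (a2, b2) pairs.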

2) pd.crosstab is also a good alternative, provided the zero-count elements are replaced with np.nan after stacking:

pd.crosstab(df['A'], df['B']).stack().replace({0:np.nan}).reset_index(name="Count")
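
The crosstab one-liner can be tried end to end as follows. This is a small sketch using the four sample rows from the question; the intermediate names table and result are only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2"],
                   "B": ["b1", "b1", "b2", "b3"]})

# Wide table of counts: one row per A, one column per B, zeros where
# a combination never occurs.
table = pd.crosstab(df["A"], df["B"])

# Back to long form, turning the zero counts into NaN.
result = (
    table.stack()
         .replace({0: np.nan})
         .reset_index(name="Count")
)
print(result)
```

Because crosstab fills absent combinations with 0, this route does not need dropna=False at all. The trade-off is that a genuine zero and a missing pair are indistinguishable, which does not matter for counts.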

Both approaches yield the desired result, with NaN for the (A, B) combinations that do not occur.

edit1:

To display the grouping key "A" in a particular format (i.e., keep the first occurrence and replace the rest with an empty string):

df_g = pd.crosstab(df['A'], df['B']).stack().replace({0:np.nan}).reset_index(name="Count")
df_g.loc[df_g.duplicated('A'), "A"] = ""
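
A self-contained sketch of this blanking step on the question's sample data is shown below. The copy into shown is an addition for illustration, so that the original keys in df_g stay available for further computation.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2"],
                   "B": ["b1", "b1", "b2", "b3"]})

df_g = (
    pd.crosstab(df["A"], df["B"])
      .stack()
      .replace({0: np.nan})
      .reset_index(name="Count")
)

# Work on a copy: blanking the key is purely cosmetic and makes the
# column useless for later grouping or filtering.
shown = df_g.copy()

# duplicated("A") is True for every row after the first one of each key.
shown.loc[shown.duplicated("A"), "A"] = ""
print(shown)
```

Only the first row of each group keeps its "A" value, so this form is best reserved for final display or export.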

edit2:

If you want "A" shown once per group, as the outer level of a multi-indexed DataFrame:

df.groupby('A')['B'].value_counts().unstack().stack(dropna=False
                    ).reset_index(name="Count").set_index(['A', 'B'])
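
One possible shortcut, not part of the original answer: the stacked counts are already indexed by (A, B), so the reset_index/set_index round trip can be skipped by converting the Series to a frame directly. The sketch below assumes the crosstab route and uses mask to blank the zero counts; the names counts and indexed are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2"],
                   "B": ["b1", "b1", "b2", "b3"]})

# Series of counts indexed by the (A, B) MultiIndex.
counts = pd.crosstab(df["A"], df["B"]).stack()

# Replace zero counts with NaN and keep the MultiIndex as the row labels.
indexed = counts.mask(counts.eq(0)).to_frame("Count")
print(indexed)

# With the MultiIndex in place, lookups by key are direct.
print(indexed.loc[("a1", "b2"), "Count"])  # a single (A, B) pair
print(indexed.loc["a2"])                   # all B rows for one A
```

When printed, pandas shows each outer "A" label only once per group, which gives the merged-cell look without editing the data.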
