Pandas: how to groupby with count with multiple levels on rows?
Question
I have the following DataFrame:
|----|----|
| A | B |
| a1 | b1 |
| a2 | b1 |
| a1 | b2 |
| a2 | b3 |
I want to count the values of B per A and get the following result:
|----|----|-------|
| A | B | Count |
| a1 | b1 | 1 |
| | b2 | 1 |
| | b3 | NaN |
| a2 | b1 | 1 |
| | b2 | NaN |
| | b3 | 1 |
I usually do this with df.groupby([B])[A].count(), but in this case, where the result looks like a kind of pivot table, it's confusing for me.
Thanks.
Update:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 20422 entries, 180 to 96430
Data columns (total 2 columns):
B 20422 non-null object
A 20422 non-null object
dtypes: object(2)
memory usage: 478.6+ KB
I am using df.groupby([B])[A].value_counts().unstack().stack(dropna=False).reset_index(name="Count") and get:
|--|----|----|-------|
| | A | B | Count |
|0 | a1 | b1 | 1 |
|1 | a1 | b2 | 1 |
|2 | a1 | b3 | NaN |
|3 | a2 | b1 | 1 |
|4 | a2 | b2 | NaN |
|5 | a2 | b3 | 1 |
Recommended answer
1) One way would be to group on "A" and count each distinct element under "B" using value_counts. A combination of unstack and stack with dropna=False then gives the desired DataFrame:
df.groupby('A')['B'].value_counts().unstack().stack(dropna=False).reset_index(name="Count")
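The unstack/stack round trip above is a way of filling in the (A, B) pairs that never occur. As a rough sketch of the same idea made explicit, and not the method given in the answer, the full grid can be built with MultiIndex.from_product and applied with reindex. The sample frame is the one from the question; the names counts, full_index and result are illustrative only.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2"],
                   "B": ["b1", "b1", "b2", "b3"]})

# Count the (A, B) pairs that actually occur.
counts = df.groupby(["A", "B"]).size()

# Build every possible (A, B) combination, then reindex so that
# missing pairs appear with NaN instead of being absent.
full_index = pd.MultiIndex.from_product(
    [sorted(df["A"].unique()), sorted(df["B"].unique())],
    names=["A", "B"],
)
result = counts.reindex(full_index).reset_index(name="Count")
print(result)
```

This variant does not rely on the dropna argument of stack, which may matter if the behaviour of stack differs in the pandas version in use.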
2) pd.crosstab also provides a good alternative if we replace the zero-count elements with np.nan after stacking:
pd.crosstab(df['A'], df['B']).stack().replace({0:np.nan}).reset_index(name="Count")
Both approaches yield the desired result: one row per (A, B) pair, with NaN in Count where the pair does not occur.
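To see what the crosstab route does at each step, here is a small sketch run on the sample frame from the question. The intermediate names table, stacked and result are only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2"],
                   "B": ["b1", "b1", "b2", "b3"]})

# Step 1: a wide table of counts, with 0 for pairs that never occur.
table = pd.crosstab(df["A"], df["B"])
print(table)

# Step 2: back to long form; every (A, B) pair is present because
# crosstab produced zeros rather than missing values.
stacked = table.stack()

# Step 3: turn the zero counts into NaN and flatten the index.
result = stacked.replace({0: np.nan}).reset_index(name="Count")
print(result)
```

Because the zeros are replaced by NaN, the Count column ends up as floats rather than integers.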
Edit 1:
To have the grouped key "A" displayed in a certain format (i.e. keep the first occurrence and replace the rest with an empty string):
df_g = pd.crosstab(df['A'], df['B']).stack().replace({0:np.nan}).reset_index(name="Count")
df_g.loc[df_g.duplicated('A'), "A"] = ""
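Blanking the repeated keys changes the data, not just how it looks, so rows can no longer be filtered by "A" afterwards. One possible way around that, shown here as a sketch with the hypothetical name display_df, is to apply the blanking to a copy and keep df_g intact.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2"],
                   "B": ["b1", "b1", "b2", "b3"]})

df_g = (pd.crosstab(df["A"], df["B"])
          .stack()
          .replace({0: np.nan})
          .reset_index(name="Count"))

# Work on a copy so df_g keeps its real keys for later filtering.
display_df = df_g.copy()
# duplicated("A") is True for every row after the first of each key.
display_df.loc[display_df.duplicated("A"), "A"] = ""
print(display_df)
```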
Edit 2:
If you want "A" shown as a single merged cell, as part of a multi-indexed DataFrame:
df.groupby('A')['B'].value_counts().unstack().stack(dropna=False
).reset_index(name="Count").set_index(['A', 'B'])
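Since stacking a crosstab already yields a Series indexed by (A, B), a multi-indexed frame can also be reached without the reset_index/set_index round trip. The following is a sketch of that variant, not the answer's code; indexed and a1_rows are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"A": ["a1", "a2", "a1", "a2"],
                   "B": ["b1", "b1", "b2", "b3"]})

# stack() leaves (A, B) as a two-level index; to_frame names the column.
indexed = (pd.crosstab(df["A"], df["B"])
             .stack()
             .replace({0: np.nan})
             .to_frame("Count"))
print(indexed)

# With "A" as the outer index level, one group can be selected directly.
a1_rows = indexed.loc["a1"]
print(a1_rows)
```

When printed, pandas shows each outer key once per group, which gives the layout asked for in the question.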