Enumerate each row for each group in a DataFrame
Question
In pandas, how can I add a new column which enumerates rows based on a given grouping?
For instance, assume the following DataFrame:
import pandas as pd
import numpy as np
a_list = ['A', 'B', 'C', 'A', 'A', 'C', 'B', 'B', 'A', 'C']
df = pd.DataFrame({'col_a': a_list, 'col_b': range(10)})
df
  col_a  col_b
0     A      0
1     B      1
2     C      2
3     A      3
4     A      4
5     C      5
6     B      6
7     B      7
8     A      8
9     C      9
I'd like to add a col_c that gives me the Nth row of the "group", based on a grouping of col_a and a sorting of col_b.
Desired output:
  col_a  col_b  col_c
0     A      0      1
3     A      3      2
4     A      4      3
8     A      8      4
1     B      1      1
6     B      6      2
7     B      7      3
2     C      2      1
5     C      5      2
9     C      9      3
I'm struggling to get to col_c. You can get the proper grouping and sorting with .sort_index(by=['col_a', 'col_b']) (in current pandas versions, .sort_values(['col_a', 'col_b'])); it's now a matter of getting to that new column and labeling each row.
Recommended answer
Use cumcount, which exists for exactly this case:
g = df.groupby('col_a')
df['col_c'] = g.cumcount()
As the documentation says:
Number each item in each group from 0 to the length of that group - 1.
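As a minimal sketch of how cumcount could produce the output asked for in the question: the snippet below rebuilds the example frame, sorts it by col_a and col_b, and adds 1 to the 0-based count. It assumes a pandas version that provides sort_values; the sort step and the + 1 are additions for illustration, not part of the one-line answer above.

```python
import pandas as pd

a_list = ['A', 'B', 'C', 'A', 'A', 'C', 'B', 'B', 'A', 'C']
df = pd.DataFrame({'col_a': a_list, 'col_b': range(10)})

# Sort so rows are grouped by col_a and ordered by col_b within each group.
df = df.sort_values(['col_a', 'col_b'])

# cumcount() numbers rows from 0 within each group, in their current order,
# so add 1 to start the numbering at 1.
df['col_c'] = df.groupby('col_a').cumcount() + 1

print(df)
```

Because cumcount follows the current row order, sorting before the call is what ties the numbering to col_b.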
Original answer (before cumcount was defined).
You could create a helper function to do this:
def add_col_c(x):
    # Number the rows of one group from 0 to len(group) - 1.
    x['col_c'] = np.arange(len(x))
    return x
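First sort by column col_a (the original answer used df.sort, which current pandas has replaced with df.sort_values). Sorting on col_a alone does not guarantee the order of rows within a group:
In [11]: df.sort_values('col_a', inplace=True)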
Then apply this function across each group:
In [12]: g = df.groupby('col_a', as_index=False)
In [13]: g.apply(add_col_c)
Out[13]:
  col_a  col_b  col_c
3     A      3      0
8     A      8      1
0     A      0      2
4     A      4      3
6     B      6      0
1     B      1      1
7     B      7      2
9     C      9      0
2     C      2      1
5     C      5      2
To get 1, 2, ... instead, you could use np.arange(1, len(x) + 1).