Enumerate each row for each group in a DataFrame

Views: 118  Published: 2021/6/13 20:08:29  Tags: python, pandas


Question

In pandas, how can I add a new column which enumerates rows based on a given grouping?

For instance, assume the following DataFrame:

import pandas as pd
import numpy as np

a_list = ['A', 'B', 'C', 'A', 'A', 'C', 'B', 'B', 'A', 'C']
df = pd.DataFrame({'col_a': a_list, 'col_b': range(10)})
df
  col_a  col_b
0     A      0
1     B      1
2     C      2
3     A      3
4     A      4
5     C      5
6     B      6
7     B      7
8     A      8
9     C      9

I'd like to add a col_c that gives me the Nth row of the "group" based on a grouping of col_a and sorting of col_b.

Desired output:
  col_a  col_b  col_c
0     A      0      1
3     A      3      2
4     A      4      3
8     A      8      4
1     B      1      1
6     B      6      2
7     B      7      3
2     C      2      1
5     C      5      2
9     C      9      3

I'm struggling to get to col_c. You can get the proper grouping and sorting with .sort_values(by=['col_a', 'col_b']) (.sort_index(by=...) in older pandas versions); it's now a matter of getting to that new column and labeling each row.

Recommended answer

There is cumcount, for exactly this case:
df['col_c'] = df.groupby('col_a').cumcount()

As it says in the docs:

  Number each item in each group from 0 to the length of that group - 1.
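
To connect this back to the desired output in the question, here is a minimal sketch. It assumes the rows should first be sorted by col_a and then col_b, and that the numbering should start at 1 rather than 0; the name df_sorted is introduced here only for illustration.

```python
import pandas as pd

a_list = ['A', 'B', 'C', 'A', 'A', 'C', 'B', 'B', 'A', 'C']
df = pd.DataFrame({'col_a': a_list, 'col_b': range(10)})

# Sort so rows are ordered by group, then by col_b within each group.
# (df_sorted is an illustrative name, not part of the original answer.)
df_sorted = df.sort_values(by=['col_a', 'col_b'])

# cumcount() numbers rows 0..n-1 within each group, in their current order;
# adding 1 gives the 1-based numbering asked for in the question.
df_sorted['col_c'] = df_sorted.groupby('col_a').cumcount() + 1

print(df_sorted)
```

Because cumcount follows the current row order, the sort has to happen before the groupby for col_c to reflect the ordering of col_b.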
Original answer (before cumcount was defined).
You could create a helper function to do this:
def add_col_c(x):
    x['col_c'] = np.arange(len(x))
    return x

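First sort by column col_a (df.sort, used when this answer was written, has since been removed from pandas; sort_values is its replacement):
In [11]: df.sort_values('col_a', inplace=True)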

Then apply this function across each group:
In [12]: g = df.groupby('col_a', as_index=False)

In [13]: g.apply(add_col_c)
Out[13]:
  col_a  col_b  col_c
3     A      3      0
8     A      8      1
0     A      0      2
4     A      4      3
6     B      6      0
1     B      1      1
7     B      7      2
9     C      9      0
2     C      2      1
5     C      5      2

In order to get 1,2,... you could use np.arange(1, len(x) + 1).
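
As a rough sketch of the same helper idea with the 1-based np.arange, the version below swaps groupby.apply for groupby.transform on a single column so that the helper does not modify the group it receives. The helper name number_rows and the variable df_sorted are hypothetical, and sorting by both columns first is an assumption taken from the desired output in the question.

```python
import numpy as np
import pandas as pd

a_list = ['A', 'B', 'C', 'A', 'A', 'C', 'B', 'B', 'A', 'C']
df = pd.DataFrame({'col_a': a_list, 'col_b': range(10)})

# Order rows by group and by col_b within each group.
df_sorted = df.sort_values(by=['col_a', 'col_b'])


def number_rows(s):
    # Hypothetical helper: 1-based position of each row within its group.
    return np.arange(1, len(s) + 1)


# transform calls number_rows once per group and aligns the results
# back to the original row index.
df_sorted['col_c'] = df_sorted.groupby('col_a')['col_b'].transform(number_rows)

print(df_sorted)
```

For this particular task cumcount is the more direct tool; the helper form is mainly useful when the per-group numbering needs custom logic.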
