Pandas - groupby all columns and mark in original dataframe

Views: 66 | Published: 2021/6/13 20:14:39 | Tags: python, pandas


Question

I have a DataFrame with a unique 'Id' column and columns 'A', 'B', 'C', etc.

Some rows have identical values in all of 'A', 'B', 'C'. I'd like to give each such set of rows a group number (a running index starting from 1).

For example:

import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 2], "B": [3, 4, 4, 4], "C": [5, 5, 5, 5]})
print(df)

   A  B  C
0  1  3  5
1  1  4  5
2  1  4  5
3  2  4  5

should become:

   A  B  C  grp
0  1  3  5    1
1  1  4  5    2
2  1  4  5    2
3  2  4  5    3

I know I can groupby ['A', 'B', 'C'] and get the keys, but then I have to iterate over the keys and the DataFrame in an unoptimized manner. I haven't managed to do it in a vectorized way.
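For contrast, the loop-based approach described in the question might look roughly like the sketch below. This is an assumed reconstruction, not the asker's actual code: it iterates over the groups in Python and writes one label per group.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 2], "B": [3, 4, 4, 4], "C": [5, 5, 5, 5]})
cols = ["A", "B", "C"]

# One label per distinct (A, B, C) key; iterating a groupby yields
# (key, sub-frame) pairs in sorted key order because sort=True by default.
labels = pd.Series(0, index=df.index)
for label, (_, group) in enumerate(df.groupby(cols), start=1):
    # Python-level loop: one assignment per group, slow when there are many groups
    labels.loc[group.index] = label

df["grp"] = labels
print(df["grp"].tolist())  # [1, 2, 2, 3]
```

It produces the desired labels, but the per-group Python loop is the cost the vectorized answers avoid.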

Recommended answer

Use GroupBy.ngroup, which numbers the groups from 0, so add 1 for a 1-based index:

df['grp'] = df.groupby(['A', 'B', 'C']).ngroup() + 1
print(df)

   A  B  C  grp
0  1  3  5    1
1  1  4  5    2
2  1  4  5    2
3  2  4  5    3
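The question mentions a unique 'Id' column and asks to group by all the other columns. A minimal sketch, assuming the key columns are simply every column except 'Id' (the 'Id' values below are made up for illustration):

```python
import pandas as pd

# Toy frame with a unique 'Id' column; the Id values are made up for illustration.
df = pd.DataFrame({
    "Id": [10, 11, 12, 13],
    "A": [1, 1, 1, 2],
    "B": [3, 4, 4, 4],
    "C": [5, 5, 5, 5],
})

# Assumption: the key columns are every column except 'Id'.
# Including the unique 'Id' would put every row in its own group.
# A plain list is passed so groupby reads the entries as column names.
key_cols = df.columns.drop("Id").tolist()

df["grp"] = df.groupby(key_cols).ngroup() + 1
print(df["grp"].tolist())  # [1, 2, 2, 3]
```

If the code is re-run on a frame that already has a 'grp' column, that column should be dropped from `key_cols` as well.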

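If the data is already sorted by the key columns, pd.factorize on the row tuples gives the same numbering (factorize labels groups in order of first appearance, while ngroup uses sorted key order by default):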

# Use only the key columns: a unique 'Id' column would make every row its own group
df['grp'] = pd.factorize(df[['A', 'B', 'C']].apply(tuple, axis=1))[0] + 1
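The caveat about sorted data comes from how the two methods number groups: ngroup with the default sort=True follows sorted key order, while pd.factorize follows order of first appearance. A small sketch on deliberately unsorted toy data (values made up for illustration), also showing that groupby(..., sort=False).ngroup() matches factorize:

```python
import pandas as pd

# Same key columns as the question, but rows deliberately not sorted.
df = pd.DataFrame({"A": [2, 1, 1, 2], "B": [4, 3, 3, 4], "C": [5, 5, 5, 5]})
cols = ["A", "B", "C"]

# ngroup() with the default sort=True numbers groups in sorted key order.
by_sorted_key = df.groupby(cols).ngroup() + 1

# factorize numbers groups in order of first appearance;
# ngroup() on a groupby with sort=False does the same.
by_appearance = pd.factorize(df[cols].apply(tuple, axis=1))[0] + 1
by_appearance_ng = df.groupby(cols, sort=False).ngroup() + 1

print(by_sorted_key.tolist())     # [2, 1, 1, 2]
print(by_appearance.tolist())     # [1, 2, 2, 1]
print(by_appearance_ng.tolist())  # [1, 2, 2, 1]
```

Both numberings identify the same groups; they only differ in which group gets label 1, so either is fine when only group membership matters.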
