How to categorize a range of values in Pandas DataFrame

Views: 159  Published: 2020/5/24 3:17:13  python pandas


Problem description

Suppose I have the following DataFrame:

And I want to categorize those values into ranges, like A: [1,10], B: [11,20], C...

   Area
0  B
1  D
2  C
3  A
4  C

How can I do this with Pandas? I tried the following code:

bins = pd.IntervalIndex.from_tuples([(0, 11), (11, 20), (20, 50), (50, 100), (100, 500), (500, np.max(df["area"]) + 1)], closed='left')
catDf = pd.cut(df["area"], bins = bins)

But the "cut" command just puts the range values in the DataFrame, and I want the category names instead of the ranges.

EDIT: I tried passing labels to cut, but nothing changed. EDIT2: To clarify, if "area" has the value 10.21, it falls in the range [10,20], so it must be labeled "B" or whatever other label is assigned to that range.

Recommended answer

For me, it works to take cat.codes and use them to index the list a after converting it to a NumPy array:

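# One label per interval in bins, in the same order
a = list('ABCDEF')
# cat.codes gives the position of each row's interval; use it to index the label array
df['new'] = np.array(a)[pd.cut(df["Area"], bins=bins).cat.codes]
print(df)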
     Area new
0   14.68   B
1   40.54   C
2   10.82   A
3    2.31   A
4   22.30   C
5  600.00   F
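
The snippet above assumes that `df` and `bins` already exist. A self-contained sketch of the same idea follows, assuming the sample values shown in the printed output and the left-closed bins from the question; the names `labels` and `codes` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Sample values taken from the printed output above
df = pd.DataFrame({"Area": [14.68, 40.54, 10.82, 2.31, 22.30, 600.00]})

# Left-closed intervals from the question: [0, 11), [11, 20), ...
bins = pd.IntervalIndex.from_tuples(
    [(0, 11), (11, 20), (20, 50), (50, 100), (100, 500),
     (500, df["Area"].max() + 1)],
    closed="left",
)

labels = np.array(list("ABCDEF"))  # one label per interval, in order

# cat.codes holds the position of the interval each row fell into
codes = pd.cut(df["Area"], bins=bins).cat.codes.to_numpy()
df["new"] = labels[codes]
print(df)
```

One caveat with this indexing trick: a value that falls outside every interval gets the code -1, which NumPy reads as "last element", so it would silently receive the last label instead of a missing value.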

# The same labels as a standalone Series aligned with df's index
catDf = pd.Series(np.array(a)[pd.cut(df["Area"], bins=bins).cat.codes], index=df.index)
print(catDf)
0    B
1    C
2    A
3    A
4    C
5    F
dtype: object
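
As for why passing labels to `cut` changed nothing in the question: according to the pandas documentation, the `labels` argument is ignored when `bins` is an `IntervalIndex`. Below is a sketch of an alternative that is not part of the original answer. It passes plain bin edges with `right=False` so that `labels` takes effect. The sample values are taken from the output above, and the `edges` name and `label` column are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Area": [14.68, 40.54, 10.82, 2.31, 22.30, 600.00]})

# Bin edges equivalent to the question's left-closed intervals
edges = [0, 11, 20, 50, 100, 500, df["Area"].max() + 1]

# right=False makes each bin [left, right); labels needs len(edges) - 1 entries
df["label"] = pd.cut(df["Area"], bins=edges, right=False, labels=list("ABCDEF"))
print(df)
```

With this form the new column is categorical, and values that fall outside all bins come back as NaN.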
