How to categorize a range of values in Pandas DataFrame
Question
Suppose I have the following DataFrame:
Area
0 14.68
1 40.54
2 10.82
3 2.31
4 22.3
I want to categorize these values into ranges, like A: [1,10], B: [11,20], C...
Area
0 B
1 D
2 C
3 A
4 C
How can I do this with Pandas? I tried the following code:
bins = pd.IntervalIndex.from_tuples([(0, 11), (11, 20), (20, 50), (50, 100), (100, 500), (500, np.max(df["area"]) + 1)], closed='left')
catDf = pd.cut(df["area"], bins = bins)
But "cut" just puts the interval values in the DataFrame, and I want the category names instead of the ranges.
EDIT: I tried passing labels to cut, but nothing changed. EDIT 2: To clarify, if "area" has the value 10.21, it falls in the range [10,20], so it must be labeled "B" (or whatever label is assigned to that range).
Recommended answer
For me, it works to take cat.codes and use them to index into the list a after converting it to a NumPy array:
a = list('ABCDEF')
df['new'] = np.array(a)[pd.cut(df["Area"], bins = bins).cat.codes]
print (df)
Area new
0 14.68 B
1 40.54 C
2 10.82 A
3 2.31 A
4 22.30 C
5 600.00 F
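One caveat with indexing by cat.codes: a value that falls outside every bin (or is NaN) gets the code -1, and indexing a NumPy array with -1 silently returns its last element. The sketch below shows the codes and one way to avoid the problem, by renaming the categories instead of indexing. The sample values, the fixed upper edge of 601, and the variable names are assumptions chosen for illustration.

```python
import pandas as pd

# Illustrative data: 1000.0 lies outside every bin defined below
area = pd.Series([14.68, 2.31, 600.0, 1000.0], name="Area")

bins = pd.IntervalIndex.from_tuples(
    [(0, 11), (11, 20), (20, 50), (50, 100), (100, 500), (500, 601)],
    closed="left",
)

binned = pd.cut(area, bins=bins)

# Integer position of each value's bin; -1 means "no bin matched"
print(binned.cat.codes.tolist())

# Rename the six interval categories to letters; unmatched values stay NaN
named = binned.cat.rename_categories(list("ABCDEF"))
print(named)
```

With this approach the out-of-range value stays missing rather than being labeled "F".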
catDf = pd.Series(np.array(a)[pd.cut(df["Area"], bins = bins).cat.codes], index=df.index)
print (catDf)
0 B
1 C
2 A
3 A
4 C
5 F
dtype: object
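As an alternative, the pandas documentation notes that the labels argument of pd.cut is ignored when bins is an IntervalIndex, which would explain why passing labels changed nothing in the question. A minimal sketch, assuming the same six left-closed ranges, passes plain bin edges together with right=False so that labels is applied. The DataFrame below is rebuilt from the values shown in the output above; the names edges and labels are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Area": [14.68, 40.54, 10.82, 2.31, 22.3, 600.0]})

# Edges for [0, 11), [11, 20), [20, 50), [50, 100), [100, 500), [500, max + 1)
edges = [0, 11, 20, 50, 100, 500, df["Area"].max() + 1]
labels = list("ABCDEF")  # one label per bin: len(edges) - 1

# right=False makes every bin closed on the left, like closed='left' above
df["new"] = pd.cut(df["Area"], bins=edges, labels=labels, right=False)
print(df)
```

Here the new column has a categorical dtype rather than object; calling astype(str) on it gives plain strings if those are preferred.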