Replace unique values of pandas data-frame
Question
Hi, I'm new to Python and pandas.
I have extracted the unique values of one of the columns using pandas. The unique values of the column are strings:
['Others, Senior Management-Finance, Senior Management-Sales'
 'Consulting, Strategic planning, Senior Management-Finance'
 'Client Servicing, Quality Control - Product/ Process, Strategic planning'
 'Administration/ Facilities, Business Analytics, Client Servicing'
 'Sales & Marketing, Sales/ Business Development/ Account Management, Sales Support']
I want to replace each string value with a unique integer value.
For simplicity, here is a dummy input and output.
Input:
Col1
A
A
B
B
B
C
C
The unique values of the df look like this:
[ 'A' 'B' 'C' ]
After replacing, the column should look like this:
Col1
1
1
2
2
2
3
3
Please suggest how I can do this, using a loop or any other way, because I have more than 300 unique values.
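Since the question explicitly allows a loop, here is a minimal sketch of the loop-based idea: build a lookup table from each unique string to an integer, then apply it to the whole column with Series.map. The dummy column and the name `mapping` are illustrative assumptions, not part of the original post.

```python
import pandas as pd

# Dummy data from the question
df = pd.DataFrame({"Col1": ["A", "A", "B", "B", "B", "C", "C"]})

# Build a lookup table with a loop: each unique string gets the next integer,
# starting at 1, in order of first appearance in the column.
mapping = {}
for value in df["Col1"].unique():
    mapping[value] = len(mapping) + 1

# Replace every string with its integer via the lookup table
df["Col1"] = df["Col1"].map(mapping)
print(df["Col1"].tolist())  # [1, 1, 2, 2, 2, 3, 3]
```

The loop runs once per unique value (about 300 here), not once per row, and the per-row replacement is done by `map`, so it scales reasonably with the number of rows.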
Recommended answer
Use … to adjust the values.
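A minimal sketch of a single-call approach, assuming pd.factorize (the function this answer uses further down): it returns 0-based integer codes in order of first appearance, so adding 1 gives the numbering shown in the question. The dummy DataFrame is an assumption taken from the question's example.

```python
import pandas as pd

df = pd.DataFrame({"Col1": ["A", "A", "B", "B", "B", "C", "C"]})

# pd.factorize returns (codes, uniques); codes start at 0 and follow the
# order in which each value first appears in the column.
codes, uniques = pd.factorize(df["Col1"])

# Shift by 1 to get the 1-based numbering the question asks for
df["Col1"] = codes + 1
print(df["Col1"].tolist())  # [1, 1, 2, 2, 2, 3, 3]
```

`uniques` keeps the original strings, so code `k` (after the shift) corresponds to `uniques[k - 1]` if you need to map back.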
Another solution uses numpy.unique, but it is slower on huge DataFrames:
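import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['A', 'A', 'B', 'B', 'B', 'C', 'C']})
# return_inverse=True gives, for each row, the position of its value in the sorted unique array
_, idx = np.unique(df['Col1'], return_inverse=True)
df['Col1'] = idx + 1  # shift to 1-based codes
print(df)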
Col1
0 1
1 1
2 2
3 2
4 2
5 3
6 3
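One difference worth knowing: np.unique sorts the unique values, so its codes follow sorted order, while pd.factorize (with its default sort=False) numbers values by first appearance. For the A/B/C example the two agree; the hypothetical column below is chosen so that they do not.

```python
import numpy as np
import pandas as pd

# Hypothetical column whose first-seen order is not alphabetical
s = pd.Series(["C", "A", "C", "B"])

# np.unique sorts the unique values, so codes follow sorted order
_, idx = np.unique(s, return_inverse=True)
print((idx + 1).tolist())  # [3, 1, 3, 2]

# pd.factorize (sort=False by default) numbers values by first appearance
codes, _ = pd.factorize(s)
print((codes + 1).tolist())  # [1, 2, 1, 3]
```

Either numbering is a valid one-to-one replacement; pick whichever ordering you need.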
Finally, you can convert the values to categorical, mainly because it uses less memory:
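# factorize assigns 0-based integer codes; astype("category") then stores the column as categorical
df['Col1'] = pd.factorize(df.Col1)[0]
df['Col1'] = df['Col1'].astype("category")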
print (df)
Col1
0 0
1 0
2 1
3 1
4 1
5 2
6 2
print (df.dtypes)
Col1 category
dtype: object
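A related sketch, assuming you convert the original string column to category directly: pandas then already holds one small integer code per row, exposed through `.cat.codes`. The categories are sorted here, so the codes follow the same order as np.unique. The column name `Col1_code` is a hypothetical choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Col1": ["A", "A", "B", "B", "B", "C", "C"]})

# Converting to category stores each distinct string once plus a small
# integer code per row; the inferred categories are sorted: ['A', 'B', 'C'].
cat = df["Col1"].astype("category")

# .cat.codes exposes those per-row integer codes (0-based); shift to 1-based
df["Col1_code"] = cat.cat.codes + 1
print(df["Col1_code"].tolist())  # [1, 1, 2, 2, 2, 3, 3]
print(list(cat.cat.categories))  # ['A', 'B', 'C']
```

This keeps the original strings available as categories while giving you the integer codes at no extra conversion cost.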