Replace unique values of a pandas DataFrame

Question

Hi, I'm new to Python and pandas.

I have extracted the unique values of one of the columns using pandas. The unique values are strings:

['Others, Senior Management-Finance, Senior Management-Sales'
 'Consulting, Strategic planning, Senior Management-Finance'
 'Client Servicing, Quality Control - Product/ Process, Strategic planning'
 'Administration/ Facilities, Business Analytics, Client Servicing'
 'Sales & Marketing, Sales/ Business Development/ Account Management, Sales Support']

I want to replace each string value with a unique integer value.

For simplicity, here is a dummy input and the expected output.

Input:

Col1
  A
  A
  B
  B
  B
  C
  C

The unique values of the df look like this:

[ 'A' 'B' 'C' ]

After replacing, the column should look like this:

Col1
  1
  1
  2
  2
  2
  3
  3

Please suggest how I can do this, with a loop or in any other way, because I have more than 300 unique values.

Recommended answer

Use pandas.factorize, which maps each unique value to an integer code.
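A minimal sketch of the factorize approach on the dummy data from the question. The DataFrame construction here is an assumption that mirrors the question's input. `pd.factorize` returns a tuple of (codes, uniques), and the codes are 0-based in order of first appearance, so 1 is added to match the desired output.

```python
import pandas as pd

# Dummy input mirroring the question (assumed construction)
df = pd.DataFrame({'Col1': ['A', 'A', 'B', 'B', 'B', 'C', 'C']})

# factorize returns (codes, uniques); codes are 0-based, in order of first appearance
codes, uniques = pd.factorize(df['Col1'])
df['Col1'] = codes + 1  # shift so the first unique value maps to 1
print(df)
```

Keeping `uniques` around makes it possible to map the integers back to the original strings later, for example with `uniques[codes]`.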

Another solution uses numpy.unique, but it is slower on huge DataFrames:

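import numpy as np

# return_inverse gives, for each row, the position of its value in the sorted unique array
_, idx = np.unique(df['Col1'], return_inverse=True)
df['Col1'] = idx + 1  # shift so the codes start at 1
print(df)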
   Col1
0     1
1     1
2     2
3     2
4     2
5     3
6     3
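One difference between the two approaches is worth keeping in mind: `np.unique` sorts the unique values before numbering them, while `pd.factorize` numbers them in order of first appearance. On the sorted dummy data the results coincide, but on unsorted data they do not. A small sketch, using a hypothetical unsorted column that is not from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical unsorted column, not from the question
s = pd.Series(['C', 'A', 'C', 'B'])

# np.unique sorts the unique values first, so 'A' gets the lowest code
_, sorted_codes = np.unique(s.to_numpy(), return_inverse=True)

# pd.factorize numbers values in order of first appearance, so 'C' gets the lowest code
appearance_codes, _ = pd.factorize(s)

print(sorted_codes + 1)      # [3 1 3 2]
print(appearance_codes + 1)  # [1 2 1 3]
```

If sorted numbering is wanted from factorize as well, `pd.factorize(s, sort=True)` should give the same codes as the `np.unique` version.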

Finally, you can convert the values to categorical, mainly to reduce memory usage:

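# factorize returns (codes, uniques); [0] keeps only the 0-based integer codes
df['Col1'] = pd.factorize(df.Col1)[0]
# store the codes as a categorical column to reduce memory usage
df['Col1'] = df['Col1'].astype("category")
print(df)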
  Col1
0    0
1    0
2    1
3    1
4    1
5    2
6    2

print (df.dtypes)
Col1    category
dtype: object
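To get a rough idea of the memory effect, the sketch below compares a string column with its categorical version. The data is hypothetical (300 distinct labels, echoing the "more than 300 unique values" in the question), and it converts the strings to categorical directly rather than factorizing first, which is a variation on the answer above and not the answer's exact method. Note that `.cat.codes` numbers the categories in sorted order, not in order of appearance.

```python
import pandas as pd

# Hypothetical data: 300 distinct labels, each repeated 100 times (not from the question)
labels = [f'label_{i}' for i in range(300)]
s = pd.Series(labels * 100)

# Convert the strings directly to categorical; each row then stores a small integer code
cat = s.astype('category')
codes = cat.cat.codes + 1  # integer codes, shifted to start at 1

# deep=True also counts the memory held by the string values themselves
print(s.memory_usage(deep=True), cat.memory_usage(deep=True))
```

The exact byte counts depend on the pandas version and platform, so none are quoted here; the categorical column is expected to be considerably smaller when values repeat this often.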
