Converting string objects to int/float using pandas


Question

import pandas as pd

path1 = "/home/supertramp/Desktop/100&life_180_data.csv"
mydf = pd.read_csv(path1)

# Map each category label to a numeric weight
numcigar = {"Never": 0, "1-5 Cigarettes/day": 1, "10-20 Cigarettes/day": 4}

print(mydf['Cigarettes'])

# Look up each label in the dict and cast the result to float
mydf['CigarNum'] = mydf['Cigarettes'].apply(numcigar.get).astype(float)
print(mydf['CigarNum'])

mydf.to_csv('/home/supertramp/Desktop/powerRangers.csv')

The CSV file "100&life_180_data.csv" contains columns such as Age, BMI, Cigarettes, and Alcohol:

No                int64
Age               int64
BMI             float64
Alcohol          object
Cigarettes       object
dtype: object

The Cigarettes column contains "Never", "1-5 Cigarettes/day", and "10-20 Cigarettes/day". I want to assign a numeric weight to each of these values (Never, 1-5 Cigarettes/day, ...).

The expected output is a new appended column, CigarNum, that contains only the numbers 0, 1, 2. CigarNum is as expected up to about row 8 and then shows NaN through the last row. This is the Cigarettes column:

0                     Never
1                     Never
2        1-5 Cigarettes/day
3                     Never
4                     Never
5                     Never
6                     Never
7                     Never
8                     Never
9                     Never
10                    Never
11                    Never
12     10-20 Cigarettes/day
13       1-5 Cigarettes/day
14                    Never
...
167                    Never
168                    Never
169     10-20 Cigarettes/day
170                    Never
171                    Never
172                    Never
173                    Never
174                    Never
175                    Never
176                    Never
177                    Never
178                    Never
179                    Never
180                    Never
181                    Never
Name: Cigarettes, Length: 182, dtype: object

The output I get is below; it shouldn't show NaN after the first few rows.

0      0
1      0
2      1
3      0
4      0
5      0
6      0
7      0
8      0
9      0
10   NaN
11   NaN
12   NaN
13   NaN
14     0
...
167   NaN
168   NaN
169   NaN
170   NaN
171   NaN
172   NaN
173   NaN
174   NaN
175   NaN
176   NaN
177   NaN
178   NaN
179   NaN
180   NaN
181   NaN
Name: CigarNum, Length: 182, dtype: float64

Recommended answer

OK, the first problem is that you have embedded spaces in the values, so the dictionary lookup fails for those rows and they come back as NaN.
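One way to confirm the whitespace diagnosis is to print the distinct values with repr() and check which ones are not exact dictionary keys. This is a minimal sketch on made-up data: the `cigs` Series and its trailing spaces are assumptions standing in for the real CSV column.

```python
import pandas as pd

# Hypothetical sample standing in for mydf['Cigarettes']; the stray
# trailing spaces are an assumption about what the real CSV contains.
cigs = pd.Series(["Never", "Never ", "1-5 Cigarettes/day", "10-20 Cigarettes/day  "])

numcigar = {"Never": 0, "1-5 Cigarettes/day": 1, "10-20 Cigarettes/day": 4}

# repr() makes leading/trailing whitespace visible in each distinct value.
for value in cigs.unique():
    print(repr(value))

# Values that are not exact dictionary keys are the ones that turn into NaN.
unmatched = cigs[~cigs.isin(list(numcigar))]
print(unmatched.tolist())
```

Any value listed in `unmatched` would explain a NaN in the CigarNum column.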

Fix this using the vectorised str methods:

mydf['Cigarettes'] = mydf['Cigarettes'].str.replace(' ', '')

Now creating your new column should just work:

mydf['CigarNum'] = mydf['Cigarettes'].apply(numcigar.get).astype(float)
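A caveat on the whitespace fix, sketched on made-up data: str.replace(' ', '') removes every space, including the one inside "1-5 Cigarettes/day", so the dictionary keys would need their spaces removed as well. If the stray whitespace is only at the ends of the values (an assumption, since the raw file is not shown), str.strip() leaves the keys usable as written. Series.map is used here for the lookup; it is a swapped-in alternative to apply(numcigar.get), not what the answer uses.

```python
import pandas as pd

numcigar = {"Never": 0, "1-5 Cigarettes/day": 1, "10-20 Cigarettes/day": 4}

# Hypothetical values with leading/trailing whitespace only.
raw = pd.Series(["Never ", " 1-5 Cigarettes/day", "10-20 Cigarettes/day"])

# Removing every space also removes the inner one, so two of the
# three values no longer match any dictionary key.
no_spaces = raw.str.replace(" ", "", regex=False)
lookup_no_spaces = no_spaces.map(numcigar)
print(lookup_no_spaces.tolist())

# Trimming only the ends keeps the original keys valid.
stripped = raw.str.strip()
lookup_stripped = stripped.map(numcigar)
print(lookup_stripped.tolist())
```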

Update

Thanks to @Jeff, as always, for pointing out superior ways to do things.

You can call replace instead of calling apply:

mydf['CigarNum'] = mydf['Cigarettes'].replace(numcigar)
# now convert the types
mydf['CigarNum'] = mydf['CigarNum'].convert_objects(convert_numeric=True)

You can also use the factorize method.
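As a rough illustration of factorize, using a small made-up Series rather than the real column: pd.factorize assigns integer codes in order of first appearance, so the codes are 0, 1, 2 rather than the hand-picked weights in numcigar.

```python
import pandas as pd

# Hypothetical sample of the Cigarettes column.
cigs = pd.Series(["Never", "1-5 Cigarettes/day", "Never", "10-20 Cigarettes/day"])

# factorize returns integer codes plus the distinct values they refer to.
codes, uniques = pd.factorize(cigs)
print(codes.tolist())   # [0, 1, 0, 2]
print(list(uniques))    # ['Never', '1-5 Cigarettes/day', '10-20 Cigarettes/day']
```

This fits when any consistent numbering will do; when specific weights matter, the dictionary approach gives explicit control.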

Thinking about it, why not just set the dict values to floats in the first place and avoid the type conversion altogether?

So:

numcigar = {"Never":0.0 ,"1-5 Cigarettes/day" :1.0,"10-20 Cigarettes/day":4.0}
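A small sketch of that idea, assuming every value in the (made-up) column is an exact key of the dictionary: with float values in the dict, the lookup already yields a float64 column, so no astype or conversion step is needed.

```python
import pandas as pd

numcigar = {"Never": 0.0, "1-5 Cigarettes/day": 1.0, "10-20 Cigarettes/day": 4.0}

# Hypothetical, already-clean sample of the Cigarettes column.
cigs = pd.Series(["Never", "1-5 Cigarettes/day", "10-20 Cigarettes/day", "Never"])

# Same lookup as in the answer, without the trailing .astype(float).
cigar_num = cigs.apply(numcigar.get)
print(cigar_num.dtype)
print(cigar_num.tolist())
```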

Version 0.17.0 or later

convert_objects has been deprecated since 0.17.0; it has been replaced with to_numeric:

mydf['CigarNum'] = pd.to_numeric(mydf['CigarNum'], errors='coerce')

Here errors='coerce' returns NaN wherever a value cannot be converted to a number; without it, an exception is raised.
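The difference between the two behaviours can be seen on a small made-up Series (the values below are illustrative, not taken from the CSV):

```python
import pandas as pd

# Hypothetical object column: three numeric strings and one label.
mixed = pd.Series(["0", "1", "4", "Never "], dtype=object)

# With errors='coerce', the unparseable entry becomes NaN.
converted = pd.to_numeric(mixed, errors="coerce")
print(converted.tolist())
print(converted.dtype)

# With the default errors='raise', the same input raises ValueError.
raised = False
try:
    pd.to_numeric(mixed)
except ValueError as exc:
    raised = True
    print("raised:", exc)
```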
