Pandas convert string to int
Problem description
I have a large dataframe with ID numbers:
ID.head()
Out[64]:
0 4806105017087
1 4806105017087
2 4806105017087
3 4901295030089
4 4901295030089
These are all strings at the moment.
I want to convert them to int without using loops; for this I use ID.astype(int).
The problem is that some of my rows contain dirty data that cannot be converted to int, for example:
ID[154382]
Out[58]: 'CN414149'
How can I (without using loops) remove these kinds of entries so that I can use astype with peace of mind?
Recommended answer
You need to add the parameter errors='coerce' to the function to_numeric:
ID = pd.to_numeric(ID, errors='coerce')
If ID is a column:
df.ID = pd.to_numeric(df.ID, errors='coerce')
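To see which values are dirty before deciding what to do with them, one option is to compare the original column with the coerced result. The sketch below assumes the column contains no genuinely missing values (those would also show up as NaN), and the names converted, dirty and raised are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['4806105017087', '4901295030089', 'CN414149']})

# With the default errors='raise', the first unparseable string aborts the call
raised = False
try:
    pd.to_numeric(df.ID)
except ValueError as exc:
    raised = True
    print('errors="raise" failed:', exc)

# errors='coerce' turns every unparseable string into NaN instead of raising
converted = pd.to_numeric(df.ID, errors='coerce')

# The rows whose original string could not be parsed (the dirty data)
dirty = df[converted.isna()]
print(dirty)
```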
But non-numeric values are converted to NaN, so all values become float.
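If you would rather remove the unparseable rows, as the question asks, than keep them as NaN, one approach is to drop the NaN entries before casting; once no NaN remains, the float values can be cast to int64. A minimal sketch on sample data (the names ids and df_clean are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': ['4806105017087', '4806105017087', 'CN414149']})

# Coerce, drop the entries that failed to parse, then cast the remaining floats
ids = pd.to_numeric(df.ID, errors='coerce').dropna().astype(np.int64)

# Keep only the matching rows of the frame and store the integer IDs
df_clean = df.loc[ids.index].assign(ID=ids)
print(df_clean)
```

The original index is kept, so the surviving rows can still be matched back to the source data; call reset_index(drop=True) if you want a fresh 0..n-1 index.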
To get int, you need to replace NaN with some value, e.g. 0, and then cast to int:
df.ID = pd.to_numeric(df.ID, errors='coerce').fillna(0).astype(np.int64)
Example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'ID':['4806105017087','4806105017087','CN414149']})
print (df)
ID
0 4806105017087
1 4806105017087
2 CN414149
print (pd.to_numeric(df.ID, errors='coerce'))
0 4.806105e+12
1 4.806105e+12
2 NaN
Name: ID, dtype: float64
df.ID = pd.to_numeric(df.ID, errors='coerce').fillna(0).astype(np.int64)
print (df)
ID
0 4806105017087
1 4806105017087
2 0
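One caveat with this route: when errors='coerce' produces NaN, to_numeric returns float64, and float64 represents integers exactly only up to 2**53 (about 9.0e15). The 13-digit IDs in the question are well below that, but longer IDs could be rounded silently. A sketch that skips the float step, assuming valid IDs consist of digits only, masks the digit strings and casts the strings directly; the value big is a made-up test ID.

```python
import numpy as np
import pandas as pd

# float64 has a 53-bit significand, so not every integer above 2**53 fits
big = 2**53 + 1                      # 9007199254740993
print(int(np.float64(big)))          # 9007199254740992

s = pd.Series(['4806105017087', 'CN414149', str(big)])

# Replace non-digit strings with '0', then cast the strings themselves to int64
is_digits = s.str.isdigit()
direct = s.where(is_digits, '0').astype(np.int64)
print(direct)
```

Note that str.isdigit also accepts some non-ASCII digit characters (for example superscripts) that int() rejects; with pandas 1.1+, s.str.fullmatch(r'[0-9]+') is a stricter mask.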
If you use pandas 0.25+, you can use the nullable integer data type (integer_na):
df.ID = pd.to_numeric(df.ID, errors='coerce').astype('Int64')
print (df)
ID
0 4806105017087
1 4806105017087
2 NaN
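In pandas 1.0 and later, the missing entry of an Int64 column prints as <NA> rather than NaN. The sketch below, assuming a pandas version with the nullable Int64 dtype, checks the column for missing IDs and, after dropping them, converts it back to a plain NumPy int64 column (the name clean is illustrative).

```python
import pandas as pd

df = pd.DataFrame({'ID': ['4806105017087', '4806105017087', 'CN414149']})
df['ID'] = pd.to_numeric(df.ID, errors='coerce').astype('Int64')

print(df.ID.dtype)              # Int64
print(df.ID.isna().tolist())    # [False, False, True]

# With no missing values left, the column can become a NumPy int64 column
clean = df.ID.dropna().astype('int64')
print(clean.tolist())           # [4806105017087, 4806105017087]
```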