Exception Handling in Pandas .apply() function
Problem description
If I have a DataFrame:
from pandas import DataFrame
myDF = DataFrame(data=[[11, 11], [22, '2A'], [33, 33]], columns=['A', 'B'])
This gives the following DataFrame (I'm just starting out on Stack Overflow and don't have enough reputation to post an image of it):
| A | B |
0 | 11 | 11 |
1 | 22 | 2A |
2 | 33 | 33 |
If I want to convert column B to int values and drop the values that can't be converted, I have to do:
def convertToInt(cell):
    # Return the cell as an int, or None when it cannot be converted.
    try:
        return int(cell)
    except (ValueError, TypeError):
        return None
myDF['B'] = myDF['B'].apply(convertToInt)
If I only do:
myDF['B'].apply(int)
the error is, obviously:
C:\WinPython-32bit-2.7.5.3\python-2.7.5\lib\site-packages\pandas\lib.pyd in pandas.lib.map_infer (pandas\lib.c:42840)()
ValueError: invalid literal for int() with base 10: '2A'
Is there a way to add exception handling to myDF['B'].apply()?
Thanks in advance!
Recommended answer
It is much better/faster to do:
In [1]: myDF = DataFrame(data=[[11,11],[22,'2A'],[33,33]], columns = ['A','B'])
In [2]: myDF.convert_objects(convert_numeric=True)
Out[2]:
    A    B
0  11   11
1  22  NaN
2  33   33
[3 rows x 2 columns]
In [3]: myDF.convert_objects(convert_numeric=True).dtypes
Out[3]:
A int64
B float64
dtype: object
This is a vectorized way of doing exactly this. The coerce flag (here, convert_numeric=True) says to mark anything that cannot be converted to numeric as NaN.
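Newer pandas releases no longer ship convert_objects, so the session above may not run as written on a current install. A minimal sketch of the same coercion, assuming a pandas version that provides pd.to_numeric with errors='coerce' (the name converted is just an illustrative variable, not part of the original answer):

```python
import pandas as pd

myDF = pd.DataFrame(data=[[11, 11], [22, '2A'], [33, 33]], columns=['A', 'B'])

# Apply pd.to_numeric to every column; errors='coerce' replaces
# values that cannot be parsed as numbers with NaN instead of raising.
converted = myDF.apply(pd.to_numeric, errors='coerce')

print(converted)
print(converted.dtypes)
```

Column B ends up as a float column because NaN cannot be stored in a plain integer column.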
You can of course do this to a single column if you'd like.
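For the single-column case in the question (convert B to int and drop what cannot be converted), one possible sketch is shown here. It assumes pd.to_numeric is available in the installed pandas; the name cleaned is hypothetical and only used for illustration.

```python
import pandas as pd

myDF = pd.DataFrame(data=[[11, 11], [22, '2A'], [33, 33]], columns=['A', 'B'])

# Coerce only column B; the unparseable '2A' becomes NaN.
myDF['B'] = pd.to_numeric(myDF['B'], errors='coerce')

# Drop the rows where B could not be converted, then cast B back to int.
cleaned = myDF.dropna(subset=['B']).astype({'B': int})

print(cleaned)
```

This avoids a Python-level try/except per cell, at the cost of dropping the whole row rather than leaving a None in place.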