How to replace all non-numeric entries with NaN in a pandas dataframe?
Question
I have various csv files and I import them as DataFrames. The problem is that many files use different symbols for missing values. Some use nan, others NaN, ND, None, missing etc., or just leave the entry empty. Is there a way to replace all these values with np.nan? In other words, any non-numeric value in the dataframe becomes np.nan. Thank you for the help.
Recommended answer
I found what I think is a relatively elegant but also robust method:
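def isnumber(x):
    # True only if x can be converted to a float
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        return False
# Index df with the boolean mask; cells where the mask is False become NaN.
# (On pandas >= 2.1, df.map is the preferred name for the deprecated df.applymap.)
df[df.applymap(isnumber)]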
In case it's not clear: you define a function that returns True only if its input can be converted to a float. You then index df with the resulting boolean dataframe, which automatically assigns NaN to every cell where the mask is False.
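To make the masking step concrete, here is a minimal sketch on a made-up frame. The column names and values are illustrative assumptions, and `df.apply(lambda col: col.map(isnumber))` is used here as an element-wise stand-in for `applymap` that should behave the same across pandas versions.

```python
import pandas as pd


def isnumber(x):
    """Return True if x can be converted to a float."""
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        return False


# Hypothetical data: numbers mixed with assorted "missing" markers.
df = pd.DataFrame(
    {"a": [1, "ND", "3.5"], "b": ["missing", 2.0, ""]},
    dtype=object,
)

# Element-wise boolean mask (same idea as df.applymap(isnumber)).
mask = df.apply(lambda col: col.map(isnumber))

# Cells where the mask is False become NaN; the others are left untouched,
# so the string "3.5" is still a string at this point.
filtered = df[mask]
print(filtered)
```

Note that the surviving values keep their original types, which is why a separate conversion step may still be needed afterwards.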
Another solution I tried was to define isnumber as
import numbers
def isnumber(x):
    # Type check only: True for int, float, etc., never for strings
    return isinstance(x, numbers.Number)
but what I liked less about that approach is that you can accidentally have a number stored as a string, so you would mistakenly filter those out. This is also a sneaky error, seeing that the dataframe displays the string "99" the same as the number 99.
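The difference between the two checks can be seen without pandas at all. The sketch below uses two hypothetical helper names, `isnumber_by_type` and `isnumber_by_conversion`, purely to keep the two variants apart; the sample values are made up.

```python
import numbers


def isnumber_by_type(x):
    # Type check only: strings never count, even if they look numeric.
    return isinstance(x, numbers.Number)


def isnumber_by_conversion(x):
    # Conversion check: anything float() accepts counts as a number.
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        return False


values = [99, "99", 1.5, "ND", None]
by_type = [isnumber_by_type(v) for v in values]
by_conversion = [isnumber_by_conversion(v) for v in values]

print(by_type)        # [True, False, True, False, False]
print(by_conversion)  # [True, True, True, False, False]
```

The only disagreement is on the string "99", which is exactly the case that would be silently dropped by the type-based check.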
In your case you probably still need to run df = df.applymap(float) after filtering, because float accepts every capitalization of 'nan', so those cells pass the filter, but until you explicitly convert them they are still strings in the dataframe.
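As a rough sketch of that final conversion step, the example below filters a made-up frame and then casts it to float with `astype(float)`, which should have the same effect here as applying `float` element-wise. For comparison it also shows `pd.to_numeric` with `errors="coerce"`, a commonly used alternative that is not the method described in this answer. The data and variable names are assumptions for illustration.

```python
import pandas as pd


def isnumber(x):
    """Return True if x can be converted to a float."""
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        return False


# Hypothetical data; note the 'nan' strings in different capitalizations.
df = pd.DataFrame(
    {"x": ["nan", "2.5", "ND"], "y": [4, "NaN", "missing"]},
    dtype=object,
)

mask = df.apply(lambda col: col.map(isnumber))

# After masking, "nan" and "NaN" are still strings; the cast turns them
# (and every other surviving cell) into real floats.
cleaned = df[mask].astype(float)

# Alternative (not from the answer): coerce each column in one step.
coerced = df.apply(pd.to_numeric, errors="coerce")

print(cleaned)
print(cleaned.dtypes)
```

On this small example both routes give the same float64 frame; which one is preferable probably depends on how much control is needed over what counts as a number.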