How to replace all non-numeric entries with NaN in a pandas dataframe?


Question


I have various CSV files that I import as DataFrames. The problem is that many files use different symbols for missing values: some use nan, others NaN, ND, None, missing, etc., or just leave the entry empty. Is there a way to replace all of these values with np.nan? In other words, any non-numeric value in the dataframe should become np.nan. Thank you for the help.

Recommended answer


I found what I think is a relatively elegant but also robust method:

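def isnumber(x):
    # True only if x can be converted to a float
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        return False

# Cells where the mask is False become NaN.
# (In pandas >= 2.1, DataFrame.map is the preferred name for applymap.)
df[df.applymap(isnumber)]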


In case it's not clear: You define a function that returns True only if whatever input you have can be converted to a float. You then filter df with that boolean dataframe, which automatically assigns NaN to the cells you didn't filter for.
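As a minimal sketch of the masking step described above, the following runs the filter on a small made-up dataframe. The sample values and the names `mask` and `filtered` are assumptions for illustration only, and the mask is built column by column with `Series.map` so that it does not depend on whether your pandas version prefers `applymap` or `DataFrame.map`.

```python
import pandas as pd


def isnumber(x):
    """Return True if x can be converted to a float."""
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        return False


# Hypothetical sample data mixing numbers, numeric strings and missing-value markers.
df = pd.DataFrame({"a": [1, "ND", "99"], "b": ["missing", 2.5, None]})

# Element-wise boolean mask, built one column at a time.
mask = df.apply(lambda col: col.map(isnumber))

# Indexing with a boolean dataframe keeps the cells where the mask is True
# and puts NaN everywhere else.
filtered = df[mask]
print(filtered)
```

Note that the string "99" survives the filter but is still a string at this point.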


Another solution I tried was to define isnumber as

import numbers

def isnumber(x):
    # True only for values that are already numeric objects (int, float, ...)
    return isinstance(x, numbers.Number)


but what I liked less about that approach is that a number can accidentally be stored as a string, in which case it would be mistakenly filtered out. This is also a sneaky error, since the dataframe displays the string "99" the same way as the number 99.
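To make the difference between the two checks concrete, here is a small sketch that applies both to the same values. The function names `isnumber_by_type` and `isnumber_by_conversion` and the sample list are hypothetical, chosen only for this comparison.

```python
import numbers


def isnumber_by_type(x):
    # True only if x is already a numeric object.
    return isinstance(x, numbers.Number)


def isnumber_by_conversion(x):
    # True if x can be converted to a float, including numeric strings.
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        return False


values = [99, "99", 2.5, "ND", None]
by_type = [isnumber_by_type(v) for v in values]
by_conversion = [isnumber_by_conversion(v) for v in values]

print(by_type)        # [True, False, True, False, False]
print(by_conversion)  # [True, True, True, False, False]
```

The two checks disagree only on the string "99", which is exactly the case the type-based version would silently turn into NaN.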


In your case you will probably still need to run df = df.applymap(float) after filtering. The reason is that float accepts every capitalization of 'nan', so those strings pass the filter, but until you explicitly convert them they are still treated as strings in the dataframe.
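The sketch below shows why the explicit conversion matters, using made-up data in which "nan" and "NaN" appear as strings. The names `filtered`, `converted` and `coerced` are assumptions for illustration. The last step uses `pd.to_numeric` with `errors="coerce"`, which is a different technique from the one in the answer, shown only as a possible cross-check.

```python
import pandas as pd


def isnumber(x):
    """Return True if x can be converted to a float."""
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        return False


# Hypothetical sample data: "nan" and "NaN" are strings here, not real NaN values.
df = pd.DataFrame({"a": [1, "nan", "99"], "b": ["NaN", 2.5, "ND"]})

mask = df.apply(lambda col: col.map(isnumber))
filtered = df[mask]

# "nan", "NaN" and "99" passed the filter but are still strings,
# so convert every remaining cell to a real float.
converted = filtered.apply(lambda col: col.map(float))
print(converted.dtypes)

# Alternative (not the answer's method): coerce each column directly,
# turning anything that cannot be parsed as a number into NaN.
coerced = df.apply(pd.to_numeric, errors="coerce")
print(coerced)
```

After the conversion both columns hold floats, and the string markers have become real NaN values that `isna()` detects.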
