Pandas: How to Replace ? with NaN - handling non-standard missing values

Question

I am new to pandas and I am trying to load a CSV file into a DataFrame. My data has missing values represented as ?, and I am trying to replace them with the standard missing value, NaN.

Kindly help me with this. I have tried reading through the pandas docs, but I am not able to follow them.

import pandas as pd

def readData(filename):
    DataLabels = ["age", "workclass", "fnlwgt", "education", "education-num", "marital-status",
                  "occupation", "relationship", "race", "sex", "capital-gain",
                  "capital-loss", "hours-per-week", "native-country", "class"]

    # ==== trying to replace ? with NaN using na_values
    rawfile = pd.read_csv(filename, header=None, names=DataLabels, na_values=["?"])
    age = rawfile["age"]
    print(age)
    print(rawfile[25:40])

    # ======== trying to replace ?
    rawfile.replace("?", "NaN")
    print(rawfile[25:40])

Recommended answer

You can replace the value in just that one column using replace:

df['workclass'].replace('?', np.nan)

or for the whole df:

df.replace('?', np.nan)
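
One detail worth spelling out: replace returns a new object and leaves the original untouched, so the result has to be assigned to a name to have any effect. A minimal sketch, using a made-up two-row frame (the data and the name cleaned are illustrative assumptions, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real file.
df = pd.DataFrame({"age": [54, 39], "workclass": ["?", "Private"]})

# replace() does not modify df; keep the returned frame.
cleaned = df.replace("?", np.nan)

print(df["workclass"].tolist())              # original still contains "?"
print(cleaned["workclass"].isna().tolist())  # first entry is now missing
```

Note also that the replacement value is np.nan, the actual missing-value marker, not the string "NaN".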

Update

OK, I figured out your problem: by default, if you don't pass a separator, read_csv uses a comma ',' as the separator.

Your data, and in particular this example of a problematic line:

54, ?, 180211, Some-college, 10, Married-civ-spouse, ?, Husband, Asian-Pac-Islander, Male, 0, 0, 60, South, >50K

has in fact a comma followed by a space as the separator, so when you passed na_values=['?'] it didn't match: every value has a leading space character, which you can't see in the printed output.
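
The hidden leading space can be made visible with repr. The sketch below assumes a shortened four-column version of the sample line plus one made-up second row, read from an in-memory string instead of the real file:

```python
import io
import pandas as pd

# Shortened stand-in for the real file; note the space after each comma.
sample = "54, ?, 180211, Some-college\n39, State-gov, 77516, Bachelors\n"
cols = ["age", "workclass", "fnlwgt", "education"]

df = pd.read_csv(io.StringIO(sample), header=None, names=cols, na_values=["?"])

# repr() shows the stored string including its leading space.
print(repr(df.loc[0, "workclass"]))
# Count of values that were actually recognised as missing.
print(df["workclass"].isna().sum())
```

Because the stored value is a space followed by ?, it is not equal to the na_values entry and is kept as ordinary text.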

If you change the line to this:

rawfile = pd.read_csv(filename, header=None, names=DataLabels, sep=r',\s', engine='python', na_values=["?"])

then you should find that it all works:

27      54               NaN  180211  Some-college             10 
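
As a quick check of this fix, the sketch below reads the same kind of comma-plus-space data two ways: with a regular-expression separator, and with read_csv's skipinitialspace=True option. The second variant is an alternative not mentioned in the answer, and the shortened sample data is an assumption for illustration.

```python
import io
import pandas as pd

# Shortened stand-in for the real file; note the space after each comma.
sample = "54, ?, 180211, Some-college\n39, State-gov, 77516, Bachelors\n"
cols = ["age", "workclass", "fnlwgt", "education"]

# Variant 1: treat "comma + whitespace" as the separator (regex needs the python engine).
by_regex = pd.read_csv(io.StringIO(sample), header=None, names=cols,
                       sep=r",\s", engine="python", na_values=["?"])

# Variant 2 (alternative): keep the comma separator, skip spaces after it.
by_skip = pd.read_csv(io.StringIO(sample), header=None, names=cols,
                      skipinitialspace=True, na_values=["?"])

print(by_regex["workclass"].isna().tolist())
print(by_skip["workclass"].isna().tolist())
```

In both variants the leading space is gone before the value is compared with na_values, so ? is read as NaN and the other strings come back without padding.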
