Parsing "NA" entries as NaN values when reading in a pandas dataframe

Problem description

I am new to pandas. I loaded a CSV using pandas.read_csv. At first I did not specify dtype, but that was far too slow; since the file is very large, I now specify the data types as well. However, some numeric columns occasionally contain "NA". I have used na_values = ['NA']; will that affect my data frame? I still want to keep those rows. My question is: if I specify the data types and add na_values = ['NA'], will the NA entries be tossed away? If so, how can I keep a similar processing time without losing them? Thank you very much!

Recommended answer

From the pd.read_csv documentation:

na_values : scalar, str, list-like, or dict, default None

Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ... ‘NA’, ...
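As a rough illustration of the dict form described in the documentation, the sketch below passes per-column NA markers. The sample data, the column names, and the markers "?" and "unknown" are made up for this example and are not from the question.

```python
import io

import pandas as pd

# Hypothetical sample data: "?" marks a missing name and "unknown" a missing age.
csv_text = "name,age\nalice,30\nbob,unknown\n?,25\n"

# A dict maps each column to the extra strings that should count as NaN
# in that column only; the default markers such as "NA" still apply.
df = pd.read_csv(
    io.StringIO(csv_text),
    na_values={"name": ["?"], "age": ["unknown"]},
)

print(df)
print(df.isna().sum())  # number of missing values per column
```

Every row is still present after parsing; only the matching cells are replaced by NaN.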

Emphasis on 'NA' in the quoted documentation is mine. These values are not tossed away; rather, they are converted to NaN. Because 'NA' is already in the default list, pandas recognises it automatically, without you having to state it explicitly.
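To check this against the original question, here is a minimal sketch that combines an explicit dtype with "NA" entries in a numeric column. The data and column names are invented for illustration. One caveat, stated as my understanding and not as part of the answer above: a plain integer dtype such as int64 cannot hold NaN, so a column containing "NA" generally needs a float dtype or the nullable "Int64" extension dtype.

```python
import io

import pandas as pd

# Hypothetical sample data: the "score" column is numeric but contains "NA".
csv_text = "id,score\n1,3.5\n2,NA\n3,4.0\n"

# "NA" is one of the default NA markers, so na_values=["NA"] is not needed.
# float64 can represent NaN, so the explicit dtype is kept.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"id": "int64", "score": "float64"},
)
print(len(df))                    # rows are kept, not dropped
print(df["score"].isna().sum())   # missing entries become NaN

# For integer-valued columns with gaps, the nullable "Int64" dtype
# (capital I) stores the missing entry as pd.NA.
csv_int = "id,count\n1,10\n2,NA\n3,30\n"
df_int = pd.read_csv(
    io.StringIO(csv_int),
    dtype={"id": "int64", "count": "Int64"},
)
print(df_int.dtypes)
```

If the rows with missing values are ever unwanted, they have to be removed explicitly afterwards, for example with df.dropna(); read_csv itself does not discard them.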
