Is this correct behavior for read_csv and a data value of NA?

Views: 122  Posted: 2020/5/24 3:21:40  Tags: python pandas

Problem description

(I have opened an issue at GitHub.)

The following behavior doesn't seem correct to me. If the default for read_csv is na_values=False, then no values, including 'NA', should be interpreted as NaN, but that does not appear to be the case.

This behavior was noticed in this post (see the comments on the answer by @JianxunLi), where 'NA' actually means 'North America'. I have been unable to find a way to read this in without it being changed to NaN, and there definitely should be some way to do this.

Here is the example CSV.

%more foo.txt
x,y
"NA",NA
"foo",foo

I'm including 'NA' both inside quotes and outside to see whether that matters, but as you can see below, it doesn't seem to.

pd.read_csv('foo.txt')
Out[56]: 
     x    y
0  NaN  NaN
1  foo  foo

pd.read_csv('foo.txt',na_values=False)
Out[57]: 
     x    y
0  NaN  NaN
1  foo  foo

pd.read_csv('foo.txt',na_values='foo')
Out[58]: 
    x   y
0 NaN NaN
1 NaN NaN

It appears that data values of 'NaN' are treated the same as 'NA'.
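For anyone who wants to reproduce the default behavior without creating foo.txt, here is a minimal sketch. It assumes a reasonably recent pandas; `io.StringIO` stands in for the file, and the names `CSV_TEXT` and `df_default` are only illustrative.

```python
import io

import pandas as pd

# Same contents as foo.txt above, held in memory instead of on disk.
CSV_TEXT = 'x,y\n"NA",NA\n"foo",foo\n'

# Parse with the read_csv defaults, as in the first call above.
df_default = pd.read_csv(io.StringIO(CSV_TEXT))

print(df_default)
# Boolean mask showing which cells were parsed as missing.
print(df_default.isna())
```

The `isna()` mask makes it easier to see which cells were converted to NaN than reading the printed frame alone.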

Edit to add: I think I understand this better based on @Marius's answer, although the default behavior still doesn't seem right to me. (Marius's answer itself does seem to be a correct explanation of what is happening.)

na_values=False    =>   NA and NaN are treated as NaN
na_values='foo'    =>   NA, NaN, and foo are treated as NaN

I guess I can understand this being default behavior in a number column but it doesn't seem like this should be the default for a string column. I also would have really struggled to figure this out from the documentation without seeing Marius's answer.
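One option that appears to address this is the `keep_default_na` parameter of `read_csv`. The sketch below is not taken from Marius's answer; it assumes a pandas version that supports `keep_default_na`, reads the sample from an in-memory string, and uses illustrative names (`CSV_TEXT`, `df_plain`, `df_custom`).

```python
import io

import pandas as pd

# Same contents as foo.txt above, held in memory instead of on disk.
CSV_TEXT = 'x,y\n"NA",NA\n"foo",foo\n'

# keep_default_na=False switches off the built-in list of NA strings,
# so 'NA' should come through as ordinary text.
df_plain = pd.read_csv(io.StringIO(CSV_TEXT), keep_default_na=False)

# With the defaults switched off, na_values alone decides what is missing:
# here only 'foo' is treated as NaN, while 'NA' stays a string.
df_custom = pd.read_csv(
    io.StringIO(CSV_TEXT),
    keep_default_na=False,
    na_values=["foo"],
)

print(df_plain)
print(df_custom)
```

Under this reading, `na_values` adds to the default NA strings rather than replacing them, which would explain the `na_values='foo'` result summarized above.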

Edit to add (2):

Also, for comparison, I read this into Stata and Excel, and in both cases they treat 'NA' as plain text, not as NaN/missing. Is there any other package or library that has the same default behavior as pandas here?
