Pandas: ValueError: cannot convert float NaN to integer

Views: 55 Published: 2021/12/9 14:26:46 Tags: python pandas csv

Question

I get ValueError: cannot convert float NaN to integer for the following:

import pandas

df = pandas.read_csv('zoom11.csv')
df[['x']] = df[['x']].astype(int)

"x" is obviously a column in the csv file, but I cannot spot any float NaN in the file, and I don't understand what the message means.
When I read the column as a string, it has values like -1, 0, 1, ... 2000, which all look like perfectly good integers to me.
When I read the column as float, it loads. It then shows values such as -1.0, 0.0, etc., and there are still no NaNs.
I tried error_bad_lines=False and the dtype parameter in read_csv, to no avail. Loading just aborts with the same exception.
The file is not small (10+ million rows), so I cannot inspect it manually. When I extract a small part from the top of the file there is no error, but it happens with the full file. So it is something in the file, but I cannot detect what.
Logically the csv should not have missing values, but even if there is some garbage I would be fine with skipping those rows, or at least identifying them. However, I do not see a way to scan through the file and report conversion errors.

Update: Using the hints in the comments and answers, I cleaned my data with this:

# x contained NaN
df = df[~df['x'].isnull()]

# y contained some other garbage, so the null check was not enough
df = df[df['y'].str.isnumeric()]

# final conversion now worked
df[['x']] = df[['x']].astype(int)
df[['y']] = df[['y']].astype(int)
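
One caveat about the str.isnumeric() filter above, shown as a minimal sketch: it returns False for strings with a leading minus sign, so negative values would be dropped along with the garbage. The sample values and the regular expression below are assumptions for illustration, not part of the original data.

```python
import pandas as pd

# Hypothetical y column that was read as text.
y = pd.Series(['12', '-3', 'abc', '7'])

# str.isnumeric() rejects '-3' because of the minus sign.
print(y.str.isnumeric().tolist())

# Sketch of a check that accepts an optional leading minus sign.
# na=False treats missing values as non-matching.
is_int = y.str.fullmatch(r'-?\d+', na=False)
print(y[is_int].astype(int).tolist())
```

If the column can never be negative, the original filter is sufficient.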

Recommended answer

To identify NaN values, use boolean indexing:

print(df[df['x'].isnull()])
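
A minimal sketch of why the error can appear even though no literal NaN is visible in the file: read_csv parses an empty field as NaN, which turns the whole column into float. The inline CSV text is made up for illustration and stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical sample data: the third data row has an empty x field.
csv_text = "x,y\n-1,10\n0,20\n,30\n2000,40\n"
df = pd.read_csv(io.StringIO(csv_text))

# The empty field was parsed as NaN, so the column is float, not int.
print(df['x'].dtype)

# Boolean indexing shows the offending row(s).
bad_rows = df[df['x'].isnull()]
print(bad_rows)

# Converting directly fails because NaN has no integer representation.
try:
    df['x'].astype(int)
    failed = False
except ValueError as exc:
    failed = True
    print(exc)
```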

Then, to remove all non-numeric values, use to_numeric with the parameter errors='coerce', which replaces non-numeric values with NaN:

df['x'] = pd.to_numeric(df['x'], errors='coerce')
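
To report which raw values fail to convert, rather than only replacing them, one possible approach is to read the column as text first and compare it against the coerced result. This is a sketch under assumptions: the sample data is invented, and the names raw, converted and bad are illustrative.

```python
import io

import pandas as pd

# Hypothetical sample data with an empty field and a non-numeric value in x.
csv_text = "x,y\n-1,a\n0,b\n,c\nabc,d\n2000,e\n"

# Read x as text so the original values remain available for inspection.
raw = pd.read_csv(io.StringIO(csv_text), dtype={'x': str})

# Coerce to numbers; anything that cannot be parsed becomes NaN.
converted = pd.to_numeric(raw['x'], errors='coerce')

# Rows where conversion produced NaN were missing or non-numeric in the file.
bad = raw[converted.isnull()]
print(bad)
```

The index of bad gives the row positions in the loaded frame, which makes it easier to look up the corresponding lines in a large file.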

To remove all rows with NaN in column x, use dropna:

df = df.dropna(subset=['x'])

Finally, convert the values to ints:

df['x'] = df['x'].astype(int)
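
The steps above can be put together as follows. This is a sketch on invented sample data, not the original file. The second option is not part of the answer: it assumes a pandas version that supports the nullable 'Int64' dtype, which keeps the rows and stores the missing values as <NA> instead of dropping them.

```python
import io

import pandas as pd

# Hypothetical sample data with an empty field and a non-numeric value in x.
csv_text = "x,y\n-1,a\n0,b\n,c\nabc,d\n2000,e\n"
df = pd.read_csv(io.StringIO(csv_text), dtype={'x': str})

# Replace anything non-numeric with NaN.
df['x'] = pd.to_numeric(df['x'], errors='coerce')

# Option 1 (as in the answer): drop the bad rows, then convert to int.
clean = df.dropna(subset=['x']).copy()
clean['x'] = clean['x'].astype(int)
print(clean)

# Option 2 (an alternative, assuming nullable integers are available):
# keep every row and represent missing values as <NA>.
nullable = df['x'].astype('Int64')
print(nullable)
```

Which option fits depends on whether the rows with bad x values are still needed downstream.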
