Pandas: ValueError: cannot convert float NaN to integer

Published: 2020/5/23 21:40:34 | Tags: python, pandas, csv

Problem description

I get ValueError: cannot convert float NaN to integer for the following:

import pandas

df = pandas.read_csv('zoom11.csv')
# The next line raises the ValueError
df[['x']] = df[['x']].astype(int)

The "x" is obviously a column in the CSV file, but I cannot spot any float NaN in the file, and I don't understand what the error means.
When I read the column as a string, it has values like -1, 0, 1, ... 2000, which all look like perfectly good integers to me.
When I read the column as float, it loads. The values then show as -1.0, 0.0, etc., and I still don't see any NaNs.
I tried error_bad_lines=False and the dtype parameter in read_csv, to no avail. Loading just aborts with the same exception.
The file is not small (10+ M rows), so I cannot inspect it manually. When I extract a small part from the head of the file there is no error, but it happens with the full file. So it is something in the file, but I cannot detect what.
Logically the CSV should not have missing values, but even if there is some garbage I would be OK with skipping those rows, or at least identifying them. However, I do not see a way to scan through the file and report conversion errors.

Update: Using the hints in comments/answers I got my data clean with this:

# x contained NaN
df = df[~df['x'].isnull()]

# Y contained some other garbage, so null check was not enough
df = df[df['y'].str.isnumeric()]

# final conversion now worked
df[['x']] = df[['x']].astype(int)
df[['y']] = df[['y']].astype(int)
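
A caveat on the str.isnumeric() filter above, shown as a minimal sketch with made-up values (not taken from the original file): str.isnumeric() only checks that every character is numeric, so a signed string such as '-1' fails the test. Had the same filter been applied to x, which the question says contains -1, those rows would have been dropped. Comparing it with pd.to_numeric(..., errors='coerce') shows the difference:

```python
import pandas as pd

# Made-up sample values for illustration only.
s = pd.Series(["-1", "0", "12", "abc"])

# Character-class test: the minus sign makes "-1" count as non-numeric.
isnumeric_mask = s.str.isnumeric()

# Parsing test: anything that cannot be parsed as a number becomes NaN.
parsed_mask = pd.to_numeric(s, errors="coerce").notna()

print(pd.DataFrame({"value": s, "isnumeric": isnumeric_mask, "parses": parsed_mask}))
```

For columns that can hold negative numbers, the parsing-based mask is likely the safer filter.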

Recommended answer

To identify NaN values, use boolean indexing:

print(df[df['x'].isnull()])

Then, to get rid of all non-numeric values, use to_numeric with the parameter errors='coerce', which replaces non-numeric values with NaN:

df['x'] = pd.to_numeric(df['x'], errors='coerce')
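
The question also asks how to identify the offending rows rather than only drop them. One possible approach, sketched here under assumptions: read the column as text, coerce a copy, and show the original text wherever coercion produced NaN. The inline CSV is a hypothetical stand-in for zoom11.csv, and the names raw, converted and bad_rows are illustrative.

```python
import io

import pandas as pd

# Hypothetical stand-in for zoom11.csv: one empty field and one junk value in x.
csv_text = "x,y\n-1,5\n0,7\n,3\n12,8\nbad,9\n"

# Read everything as text so the original field contents stay visible.
raw = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Coerce a copy; unparseable or missing entries become NaN.
converted = pd.to_numeric(raw["x"], errors="coerce")

# Rows whose x could not be converted, shown with their original text.
bad_rows = raw[converted.isna()]
print(bad_rows)
```

The index of bad_rows gives the row positions in the data, not counting the header line, which can help locate the entries in a large file.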

To remove all rows with NaN in column x, use dropna:

df = df.dropna(subset=['x'])

Finally, convert the values to ints:

df['x'] = df['x'].astype(int)
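
Put together, the steps above could look like the following minimal sketch. The helper name coerce_column_to_int and the sample frame are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd


def coerce_column_to_int(df, column):
    """Return a copy of df where rows with a non-numeric or missing
    value in `column` are dropped and the column is cast to int."""
    out = df.copy()
    # Step 1: non-numeric entries become NaN.
    out[column] = pd.to_numeric(out[column], errors="coerce")
    # Step 2: drop the rows where the column is NaN.
    out = out.dropna(subset=[column])
    # Step 3: with no NaN left, the cast to int succeeds.
    out[column] = out[column].astype(int)
    return out


# Made-up sample data: one missing value and one junk string in x.
df = pd.DataFrame({"x": ["-1", "0", None, "2000", "junk"], "y": [1, 2, 3, 4, 5]})
clean = coerce_column_to_int(df, "x")
print(clean)
```

Note that astype(int) truncates any fractional part, so this is only appropriate when the column is expected to hold whole numbers, as in the question.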
