Pandas: ValueError: cannot convert float NaN to integer
Question
I get ValueError: cannot convert float NaN to integer for the following:
import pandas as pd

df = pd.read_csv('zoom11.csv')
df[['x']] = df[['x']].astype(int)
- "x" is clearly a column in the CSV file, but I cannot spot any float NaN in the file, and I don't understand what the error means.
- When I read the column as a string, it has values like -1, 0, 1, ... 2000, which all look like perfectly good integers to me.
- When I read the column as float, it loads. It then shows values such as -1.0, 0.0, etc., and I still see no NaNs.
- I tried error_bad_lines=False and the dtype parameter in read_csv, to no avail. Loading just aborts with the same exception.
- The file is not small (10+ M rows), so I cannot inspect it manually. When I extract a small part from the head of the file there is no error, but it happens with the full file. So something in the file causes it, but I cannot detect what.
- Logically the CSV should not have missing values, but even if there is some garbage, I would be fine with skipping those rows, or at least identifying them. I just don't see a way to scan through the file and report conversion errors.
Update: Using the hints in the comments/answers, I cleaned my data with this:
# x contained NaN
df = df[~df['x'].isnull()]
# Y contained some other garbage, so null check was not enough
df = df[df['y'].str.isnumeric()]
# final conversion now worked
df[['x']] = df[['x']].astype(int)
df[['y']] = df[['y']].astype(int)
Recommended answer
To identify the NaN values, use boolean indexing:
print(df[df['x'].isnull()])
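As a minimal sketch of why the error appears even though the column "looks" like integers: an empty field is parsed as NaN, and a single NaN is enough to make the whole column float64. The CSV text below is a hypothetical stand-in for the real file, not the asker's data. In a file with 10+ M rows, a handful of NaN rows are presumably easy to miss in the truncated printout.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the real CSV; the third data row has an empty x.
csv_text = "x,y\n-1,10\n0,20\n,30\n2000,40\n"
df = pd.read_csv(io.StringIO(csv_text))

# The empty field is parsed as NaN, which forces the column to a float dtype.
print(df['x'].dtype)

# Casting to int fails because NaN has no integer representation.
try:
    df['x'].astype(int)
    conversion_failed = False
except ValueError as exc:
    conversion_failed = True
    print(f"astype(int) failed: {exc}")

# The boolean mask pinpoints the offending rows by index label.
bad_rows = df[df['x'].isnull()]
print(bad_rows)
```

The index labels of bad_rows correspond to positions in the DataFrame, so they can be used to look up the matching lines in the source file.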
Then, to remove all non-numeric values, use to_numeric with the parameter errors='coerce', which replaces non-numeric values with NaN:
df['x'] = pd.to_numeric(df['x'], errors='coerce')
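The following sketch shows what errors='coerce' does on a small, made-up Series of strings, and one possible way to report which raw values failed to parse. The sample values and the variable names raw, converted and failed are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical raw values as they might appear when the column is read as strings.
raw = pd.Series(['-1', '0', 'abc', '2000', '12x'])

# Unparseable entries become NaN instead of raising an exception.
converted = pd.to_numeric(raw, errors='coerce')
print(converted)

# Raw values that were present but could not be parsed as numbers.
failed = raw[converted.isna() & raw.notna()]
print(failed)

# For comparison: str.isnumeric() only accepts digit-like characters,
# so it also rejects valid negative numbers such as '-1'.
isnumeric_mask = raw.str.isnumeric()
print(isnumeric_mask)
```

One reason to prefer to_numeric over a str.isnumeric() filter is visible in the last mask: a column that legitimately contains -1 would lose those rows with str.isnumeric(), whereas to_numeric keeps them.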
To remove all rows with NaN in column x, use dropna:
df = df.dropna(subset=['x'])
Finally, convert the values to int:
df['x'] = df['x'].astype(int)
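Putting the steps together, a rough end-to-end sketch might look like the following. The helper clean_int_column and the sample CSV text are hypothetical and only illustrate the coerce, drop, cast sequence described above; they are not part of pandas or of the original data.

```python
import io

import pandas as pd


def clean_int_column(df, column):
    """Coerce `column` to numbers, drop rows that fail, then cast to int.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = df.copy()
    out[column] = pd.to_numeric(out[column], errors='coerce')
    out = out.dropna(subset=[column])
    out[column] = out[column].astype(int)
    return out


# Hypothetical sample data: one missing x and one garbage y.
csv_text = "x,y\n-1,10\n,20\n5,oops\n2000,40\n"
df = pd.read_csv(io.StringIO(csv_text))

for col in ['x', 'y']:
    df = clean_int_column(df, col)

print(df)
print(df.dtypes)
```

Note that astype(int) truncates any fractional part, so this sequence is only appropriate when the surviving values are expected to be whole numbers, as in the question.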