Pandas: ValueError: cannot convert float NaN to integer
Question
I get ValueError: cannot convert float NaN to integer for the following:
df = pandas.read_csv('zoom11.csv')
df[['x']] = df[['x']].astype(int)
- "x" is obviously a column in the CSV file, but I cannot spot any float NaN in the file, and I don't understand what the error means.
- When I read the column as a string, it has values like -1, 0, 1, ... 2000, which all look like perfectly good integers to me.
- When I read the column as float, it loads fine. It then shows values such as -1.0, 0.0, etc., and there are still no NaNs.
- I tried error_bad_lines=False and the dtype parameter in read_csv, to no avail. Loading is just cancelled with the same exception.
- The file is not small (10+ M rows), so I cannot inspect it manually. When I extract a small part from the head of the file, there is no error, but it happens with the full file. So it is something in the file, but I cannot detect what.
- Logically the CSV should not have missing values, but even if there is some garbage I would be OK with skipping those rows, or at least identifying them. However, I do not see a way to scan through the file and report conversion errors.
Update: Using the hints in comments/answers I got my data clean with this:
# x contained NaN
df = df[~df['x'].isnull()]
# Y contained some other garbage, so null check was not enough
df = df[df['y'].str.isnumeric()]
# final conversion now worked
df[['x']] = df[['x']].astype(int)
df[['y']] = df[['y']].astype(int)
Recommended answer
To identify NaN values, use boolean indexing:
print(df[df['x'].isnull()])
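As a rough illustration of where such NaN values can come from, here is a minimal sketch. The CSV text is a made-up stand-in for the real file (an assumption, not the asker's data): a single empty field is enough to turn the whole column into float64, and the integer conversion then fails.

```python
import io
import pandas as pd

# Hypothetical CSV standing in for zoom11.csv: the third data row has an empty x field.
csv_text = "x,y\n-1,10\n0,20\n,30\n2000,40\n"
df = pd.read_csv(io.StringIO(csv_text))

# The empty field is read as NaN, so the column is stored as float64.
print(df["x"].dtype)

# Boolean indexing shows the row(s) where x is missing.
print(df[df["x"].isnull()])

# Converting a column that contains NaN to int raises a ValueError (or a subclass of it).
try:
    df["x"].astype(int)
    raised = False
except ValueError as exc:
    raised = True
    print("conversion failed:", exc)
```

On a file with 10+ M rows the same isnull() filter should still return only the offending rows, so nothing has to be inspected by hand.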
Then, to get rid of all non-numeric values, use to_numeric with the parameter errors='coerce'; it replaces non-numeric values with NaNs:
df['x'] = pd.to_numeric(df['x'], errors='coerce')
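If the goal is to see which raw values are the problem before overwriting the column, one possible approach is to keep the coerced result in a separate variable and use it as a mask. This is only a sketch; the sample values below are invented for illustration.

```python
import pandas as pd

# Hypothetical sample standing in for a column that was read as strings.
df = pd.DataFrame({"x": ["-1", "0", "abc", None, "2000", "1.5x"]})

# Coerce into a separate Series so the original values stay available.
converted = pd.to_numeric(df["x"], errors="coerce")

# Rows whose x could not be parsed as a number, including values that were already missing.
bad_rows = df[converted.isna()]
print(bad_rows)
```

The index of bad_rows gives the positions of the offending records, which can then be looked up in the source file.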
To remove all rows with NaNs in column x, use dropna:
df = df.dropna(subset=['x'])
Finally, convert the values to ints:
df['x'] = df['x'].astype(int)
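If dropping rows is not desirable, a possible alternative (not part of the original answer) is pandas' nullable integer dtype "Int64", which can hold missing values alongside integers. The sketch below assumes a pandas version that supports this dtype (0.24 or later) and that the non-missing values are whole numbers; the sample data is made up.

```python
import pandas as pd

# Hypothetical sample: integer-like strings with one missing value.
s = pd.Series(["-1", "0", None, "2000"])

# Coerce to numbers first (NaN for anything unparseable), then cast to the
# nullable integer dtype, which stores missing entries as <NA>.
nullable = pd.to_numeric(s, errors="coerce").astype("Int64")
print(nullable)
```

Note that the cast to "Int64" is expected to fail if the column contains non-integer floats such as 1.5, so this fits only data that is integer-valued apart from the gaps.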