Pandas: ValueError: cannot convert float NaN to integer


Problem description

I get ValueError: cannot convert float NaN to integer for the following:

df = pandas.read_csv('zoom11.csv')
df[['x']] = df[['x']].astype(int)

  • "x" is a column in the csv file, but I cannot spot any float NaN in the file, and I don't understand what the error means.
  • When I read the column as a string, it has values like -1, 0, 1, ... 2000, which all look like valid integers to me.
  • When I read the column as float, it loads. The values then show as -1.0, 0.0, etc., and I still do not see any NaNs.
  • I tried error_bad_lines=False and the dtype parameter in read_csv, to no avail. Loading just aborts with the same exception.
  • The file is not small (10+ M rows), so I cannot inspect it manually. When I extract a small part from the head of the file there is no error, but it happens with the full file. So it is something in the file, but I cannot detect what.
  • Logically the csv should not have missing values, but even if there is some garbage I would be fine with skipping those rows, or at least identifying them. I do not see a way to scan through the file and report conversion errors.
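One way a NaN can appear even though the text "NaN" never occurs in the file is an empty field: read_csv parses it as a missing value, which forces the column to float. The sketch below reproduces that with a made-up three-row CSV (the data and the `csv_text` name are assumptions for illustration, not the contents of zoom11.csv). The exact wording of the exception can differ between pandas versions.

```python
import io

import pandas as pd

# Hypothetical three-row CSV; the second data row has an empty "x" field.
csv_text = "x,y\n-1,10\n,20\n2000,30\n"

df = pd.read_csv(io.StringIO(csv_text))

# The empty field is parsed as NaN, so the whole column becomes float64.
print(df["x"].dtype)
print(df["x"].isnull().sum())

try:
    df["x"].astype(int)
    error_message = None
except ValueError as exc:
    # The message text depends on the pandas version.
    error_message = str(exc)

print(error_message)
```

In a file with millions of rows a few such empty fields are easy to miss, and displaying the column only shows the first and last rows, so the NaNs stay hidden.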
Update: Using the hints in the comments/answers, I cleaned my data with this:

      # x contained NaN
      df = df[~df['x'].isnull()]
      
      # Y contained some other garbage, so null check was not enough
      df = df[df['y'].str.isnumeric()]
      
      # final conversion now worked
      df[['x']] = df[['x']].astype(int)
      df[['y']] = df[['y']].astype(int)
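A caveat on the str.isnumeric() filter: it tests characters only, so a string with a minus sign such as "-1" is reported as non-numeric and its row is dropped. Whether that matters depends on whether y can be negative, which the question does not say. The sketch below compares it with a to_numeric-based mask on made-up sample values.

```python
import pandas as pd

# Hypothetical sample values, kept as strings as in the update above.
y = pd.Series(["-1", "0", "12", "abc"])

# str.isnumeric() checks characters only, so the minus sign makes "-1" fail.
mask_isnumeric = y.str.isnumeric()

# to_numeric with errors='coerce' parses "-1" and turns "abc" into NaN.
mask_to_numeric = pd.to_numeric(y, errors="coerce").notna()

print(mask_isnumeric.tolist())   # [False, True, True, False]
print(mask_to_numeric.tolist())  # [True, True, True, False]
```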
      

Recommended answer

To identify the NaN values, use boolean indexing:

      print(df[df['x'].isnull()])
      

Then, to deal with non-numeric values, use to_numeric with the parameter errors='coerce', which replaces non-numeric values with NaN:

      df['x'] = pd.to_numeric(df['x'], errors='coerce')
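The same coercion can also be used to report which rows fail to convert, which is what the question asks for. This is a minimal sketch on made-up data: the column is read as strings (dtype=str), coerced separately, and the rows where coercion produced NaN are selected. The names `raw`, `converted`, `bad`, and `garbage` are illustrative only.

```python
import io

import pandas as pd

# Hypothetical data: row 2 holds garbage, row 3 has an empty "x" field.
csv_text = "x,y\n-1,a\n0,b\nabc,c\n,d\n2000,e\n"

# Read everything as strings so the original text stays visible.
raw = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Values that cannot be parsed as numbers become NaN.
converted = pd.to_numeric(raw["x"], errors="coerce")

# All rows that would break astype(int): missing or non-numeric.
bad = raw[converted.isna()]

# Only the rows that had some text which failed to parse.
garbage = raw[converted.isna() & raw["x"].notna()]

print(bad)
print(garbage)
```

Because `raw` keeps the original strings, the printed rows show the offending text together with its row index.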
      

To remove all rows with NaN in column x, use dropna:

      df = df.dropna(subset=['x'])
      

Finally, convert the values to ints:

      df['x'] = df['x'].astype(int)
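Put together, the steps could be wrapped in a small helper. This is a sketch, not part of the original answer: the function name `column_to_int` and the sample frame are assumptions. Note that astype(int) truncates any fractional values that survive the coercion.

```python
import pandas as pd


def column_to_int(df, column):
    """Coerce `column` to numbers, drop rows that fail, and cast to int."""
    out = df.copy()
    # Non-numeric values become NaN instead of raising.
    out[column] = pd.to_numeric(out[column], errors="coerce")
    # Drop rows where the column is NaN, then cast the rest to int.
    return out.dropna(subset=[column]).astype({column: int})


# Hypothetical sample frame with one missing and one garbage value.
df = pd.DataFrame({"x": ["-1", "0", None, "abc", "2000"]})
clean = column_to_int(df, "x")

print(clean["x"].tolist())  # [-1, 0, 2000]
```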
      
