Pandas: ValueError: cannot convert float NaN to integer

Problem description

I get ValueError: cannot convert float NaN to integer from the following code:

import pandas

df = pandas.read_csv('zoom11.csv')
df[['x']] = df[['x']].astype(int)

  • "x" is a column in the CSV file, but I cannot spot any float NaN in the file, and I don't understand what the message means.
  • When I read the column as a string, it has values like -1, 0, 1, ... 2000, which all look like valid int values to me.
  • When I read the column as float, it loads. It then shows values such as -1.0, 0.0, etc., and I still don't see any NaNs.
  • I tried error_bad_lines=False and the dtype parameter of read_csv, to no avail. Loading is just cancelled with the same exception.
  • The file is not small (10+ M rows), so I cannot inspect it manually. When I extract a small part from the head of the file, there is no error, but it happens with the full file. So it is something in the file, but I cannot detect what.
  • Logically, the CSV should not have missing values, but even if there is some garbage, I would be fine with skipping those rows or at least identifying them. However, I do not see a way to scan through the file and report conversion errors.

Update: Using the hints in the comments and answers, I got my data clean with this:

      # x contained NaN, so drop those rows first
      df = df[~df['x'].isnull()]

      # y contained some other garbage, so the null check was not enough
      df = df[df['y'].str.isnumeric()]

      # the final conversion now works
      df[['x']] = df[['x']].astype(int)
      df[['y']] = df[['y']].astype(int)
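
One caveat about the str.isnumeric() filter used above: it returns False for signed strings such as "-1", so rows with negative values would be dropped along with the garbage. The following is a minimal sketch of an alternative mask built with pd.to_numeric. The sample values and the name raw_y are made up for illustration and are not from the original file.

```python
import pandas as pd

# Hypothetical sample of a column that was read as strings
raw_y = pd.Series(["-1", "0", "17", "abc", "2000"])

# str.isnumeric() rejects the minus sign, so "-1" is treated as garbage
isnumeric_mask = raw_y.str.isnumeric()

# to_numeric(..., errors="coerce") turns unparseable values into NaN,
# so notna() keeps every value that parses as a number, including "-1"
numeric_mask = pd.to_numeric(raw_y, errors="coerce").notna()

print(isnumeric_mask.tolist())  # [False, True, True, False, True]
print(numeric_mask.tolist())    # [True, True, True, False, True]
```

Filtering with numeric_mask instead of isnumeric_mask would keep the "-1" row while still discarding "abc".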
      

Recommended answer

To identify the NaN values, use boolean indexing:

      print(df[df['x'].isnull()])
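
A NaN row only shows that a value is missing, not what the file actually contained. To see the raw text, it may help to read the column as strings and list the values that fail to parse. This is a sketch under assumptions: the in-memory CSV stands in for zoom11.csv, and its contents are invented for illustration.

```python
import io
import pandas as pd

# Stand-in for the real file; the third data row has an empty x field
csv_text = "x,y\n-1,10\n0,20\n,30\n2000,oops\n"

# Read everything as strings so no numeric conversion happens yet;
# empty fields are still loaded as NaN
raw = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Values that cannot be parsed as numbers become NaN
parsed_x = pd.to_numeric(raw["x"], errors="coerce")

# Rows where x is missing or not numeric, shown with their original text
bad_rows = raw[parsed_x.isna()]
print(bad_rows)
```

With the default integer index, and assuming one header line and no multi-line quoted fields, the index of a bad row plus 2 gives its line number in the file.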
      

Then, to get rid of all non-numeric values, use to_numeric with the parameter errors='coerce', which replaces non-numeric values with NaN (pd is the conventional alias from import pandas as pd):

      df['x'] = pd.to_numeric(df['x'], errors='coerce')
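
To illustrate what errors='coerce' does, and why the NaN it produces still blocks an integer cast until the rows are removed, here is a small sketch with made-up values:

```python
import pandas as pd

# Hypothetical column containing one non-numeric entry
s = pd.Series(["-1", "0", "n/a", "2000"])

# "n/a" cannot be parsed, so it becomes NaN
coerced = pd.to_numeric(s, errors="coerce")
print(coerced.dtype)  # float64, because NaN is a float value

# An integer column cannot hold NaN, so the cast raises ValueError
cast_failed = False
try:
    coerced.astype(int)
except ValueError as exc:
    cast_failed = True
    print("astype(int) failed:", exc)
```

The exact wording of the error message depends on the pandas version.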
      

To remove all rows with NaN in column x, use dropna:

      df = df.dropna(subset=['x'])
      

Finally, convert the values to int:

      df['x'] = df['x'].astype(int)
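
The steps above can be combined into one small helper. This is only a sketch: the function name clean_int_column and the sample CSV are hypothetical, and the code assumes that dropping the bad rows is acceptable.

```python
import io
import pandas as pd


def clean_int_column(df, column):
    """Return a copy of df with `column` converted to int, dropping bad rows."""
    out = df.copy()
    # Non-numeric values become NaN
    out[column] = pd.to_numeric(out[column], errors="coerce")
    # Drop rows where the column is NaN (missing or unparseable)
    out = out.dropna(subset=[column])
    # No NaN is left, so the integer cast succeeds
    out[column] = out[column].astype(int)
    return out


# Stand-in for the real file: one empty field and one garbage value in x
csv_text = "x,y\n-1,10\n0,20\n,30\nabc,40\n2000,50\n"
df = pd.read_csv(io.StringIO(csv_text))

cleaned = clean_int_column(df, "x")
print(cleaned["x"].tolist())  # [-1, 0, 2000]
print(len(df) - len(cleaned), "rows dropped")  # 2 rows dropped
```

Note that astype(int) truncates any fractional part, so a value such as 1.5 would silently become 1. If that matters, such values need a separate check before the cast.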
      
