How to gracefully fall back to a `NaN` value while reading integers from a CSV with Pandas?

Views: 2281 Published: 2017/2/24 22:48:31 Tags: python csv pandas data-processing


Question

When using read_csv with Pandas, if I ask for a given column to be converted to a specific type, a single malformed value interrupts the whole operation, with no indication of which value caused the failure.

For example, running something like:

import pandas as pd
import numpy as np


df = pd.read_csv('my.csv', dtype={ 'my_column': np.int64 })

leads to a stack trace ending with the error:

ValueError: cannot safely convert passed user dtype of <i8 for object dtyped data in column ...

If the error message included the row number or the offending value, I could add that value to the list of known NaN values. As it stands, there is nothing I can do.

Is there a way to tell the parser to ignore such failures and return np.nan in that case?

Post Scriptum: Funnily enough, after parsing without any type suggestion (no dtype argument), df['my_column'].value_counts() seems to infer the dtype correctly and to handle np.nan automatically, even though the actual dtype of the series is a generic object, which fails on almost every plotting and statistical operation.

Recommended answer

Thanks to the comments I realised that there is no NaN for integers (see http://pandas.pydata.org/pandas-docs/stable/gotchas.html#support-for-integer-na), which was very surprising to me. I therefore switched to converting to float:

import pandas as pd
import numpy as np


df = pd.read_csv('my.csv', dtype={ 'my_column': np.float64 })
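The reason float is needed can be seen without any CSV at all. The following is a minimal sketch (the series names are made up for illustration) showing that a column of whole numbers is promoted to float64 as soon as it contains a missing value, because np.nan is itself a float:

```python
import numpy as np
import pandas as pd

# A column of whole numbers with no gaps is stored as int64.
complete = pd.Series([1, 2, 3])
print(complete.dtype)

# A single missing value forces the whole column to float64,
# because np.nan is a floating-point value.
with_gap = pd.Series([1, 2, np.nan])
print(with_gap.dtype)
```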

This gave me an understandable error message that included the value that failed to convert, so I could add that value to na_values:

df = pd.read_csv('my.csv', dtype={ 'my_column': np.float64 }, na_values=['n/a'])
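To try this without the original file, here is a small self-contained sketch that reads from an in-memory string. The CSV content and the token "unknown" are assumptions made up for the example; substitute whatever value the error message reports for your data.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content; "unknown" stands in for the malformed value.
csv_text = "id,my_column\n1,10\n2,unknown\n3,30\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"my_column": np.float64},
    na_values=["unknown"],  # treat this token as missing instead of failing
)

print(df["my_column"].dtype)
print(df["my_column"].isna().tolist())
```

Without the na_values argument, the same call would raise an error on the "unknown" entry.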

This way I could finally import the CSV in a form that works with visualisation and statistical functions:

>>> df['session_planned_os'].dtype
dtype('float64')
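If an integer column is really required, pandas versions from 0.24 onwards (released after this answer was written) provide a nullable integer extension dtype named "Int64" with a capital I. The sketch below assumes such a version is installed and uses made-up data; it is an alternative to the float approach above, not part of the original answer.

```python
import numpy as np
import pandas as pd

# A float column with a missing value, as produced by the float64 approach.
floats = pd.Series([10.0, np.nan, 30.0])

# Convert to the nullable integer dtype; the missing entry stays missing.
nullable_ints = floats.astype("Int64")

print(nullable_ints.dtype)
print(nullable_ints.isna().tolist())
```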

Once you have identified the right na_values, you can remove the dtype argument from read_csv. Type inference will then work correctly:

df = pd.read_csv('my.csv', na_values=['n/a'])
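When the error message does not reveal the bad entries, one possible way to find them is to read the column as text and convert it afterwards with pd.to_numeric and errors="coerce", which turns anything unparseable into NaN. This is a different technique from the one in the answer above, and the CSV content here is invented for illustration:

```python
import io

import pandas as pd

# Hypothetical CSV content; "unknown" stands in for the malformed value.
csv_text = "id,my_column\n1,10\n2,unknown\n3,30\n"

# Read the column as plain strings so nothing fails during parsing.
raw = pd.read_csv(io.StringIO(csv_text), dtype={"my_column": str})

# errors="coerce" replaces every value that is not a number with NaN.
numeric = pd.to_numeric(raw["my_column"], errors="coerce")

# Entries that were present in the file but did not parse as numbers:
# these are the candidates for na_values.
offending = raw.loc[numeric.isna() & raw["my_column"].notna(), "my_column"]

print(offending.unique())
```

The coerced series can be used directly as the fallback-to-NaN column, and the offending values can be reviewed before deciding whether they belong in na_values.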
