Pandas read_csv, reading a boolean with missing values specified as an int

Problem description

I am trying to import a csv into a pandas dataframe. I have boolean variables denoted with 1's and 0's, where missing values are identified with a -9. When I try to specify the dtype as boolean, I get a host of different errors, depending on what I try.

Sample data: test.csv

var1, var2
0,   0
0,   1
1,   3
-9,  0
0,   2
1,   7

I try to specify the dtype as I import:

import pandas as pd

dtype_dict = {'var1':'bool','var2':'int'}
nan_dict = {'var1':[-9]}
foo = pd.read_csv('test.csv',dtype=dtype_dict, na_values=nan_dict)

I get the following error:

ValueError: cannot safely convert passed user dtype of |b1 for int64 dtyped data in column 0
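One plausible reason pandas refuses this cast: NumPy's `bool` dtype has no way to represent a missing value, so a column containing NaN cannot become a plain `bool` column without losing information. The sketch below (assuming pandas 1.0 or later, where the nullable `"boolean"` extension dtype exists) compares a naive NumPy cast with the nullable dtype. The values are hypothetical: they stand for `var1` after `-9` has already been read as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical values: var1 once -9 has been turned into NaN (so it is float64).
raw = np.array([0.0, 0.0, 1.0, np.nan, 0.0, 1.0])

# Casting to NumPy bool treats NaN as non-zero, i.e. True,
# so the missing value silently disappears.
as_numpy_bool = raw.astype(bool)
print(as_numpy_bool)  # [False False  True  True False  True]

# pandas' nullable "boolean" dtype keeps the gap as <NA> instead.
as_nullable = pd.Series(raw).astype("boolean")
print(as_nullable)
```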

I have also tried specifying the true and false values:

foo = pd.read_csv('test.csv',dtype=dtype_dict,na_values=nan_dict,
                 true_values=[1],false_values=[0])

but then I get a different error:

Exception: Must be all encoded bytes

The source code for the error says something about catching the occasional None, but Nones or nulls are exactly what I want.
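Since nulls are the goal, one alternative to a custom parser is sketched below, assuming pandas 1.0 or later: let `na_values` turn `-9` into NaN while reading, then cast `var1` to pandas' nullable `"boolean"` dtype. `skipinitialspace=True` is an assumption added here because the sample file has spaces after its commas; without it the second column header is read as `' var2'`.

```python
from io import StringIO

import pandas as pd

# Sample data from the question, including the spaces after each comma.
CSV = """var1, var2
0,   0
0,   1
1,   3
-9,  0
0,   2
1,   7"""

df = pd.read_csv(
    StringIO(CSV),
    skipinitialspace=True,       # assumption: drop the spaces after the commas
    na_values={"var1": ["-9"]},  # -9 means "missing", but only in var1
)

# var1 is read as float64 (0.0 / 1.0 / NaN); cast it to the nullable boolean dtype.
df["var1"] = df["var1"].astype("boolean")
print(df.dtypes)
```

The cast only succeeds when every non-missing value is 0 or 1; any other code left in `var1` makes it raise an error, which doubles as a sanity check on the data.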

Recommended answer

You can pass the converters argument for the var1 column:

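from io import StringIO
import numpy as np
import pandas as pd

data = """var1, var2
0,   0
0,   1
1,   3
-9,  0
0,   2
1,   7"""

# The converter receives each var1 field as a string:
# '-9' marks a missing value (NaN), anything else is parsed as 0/1 and turned into a bool.
df = pd.read_csv(StringIO(data),
                 converters={'var1': lambda x: bool(int(x)) if x != '-9' else np.nan})
print(df)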
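The lambda above assumes every var1 field is exactly `'0'`, `'1'` or `'-9'`; any other integer code would silently become `True`. A slightly stricter variant is sketched below as a hypothetical helper `parse_flag`: it strips stray whitespace, returns `pd.NA` (pandas 1.0+) for missing values and rejects unexpected codes. The column still comes back with `object` dtype, so `.astype("boolean")` can be applied afterwards if a typed column is needed.

```python
from io import StringIO

import pandas as pd

# Hypothetical helper: map var1's coded strings to True / False / missing,
# failing loudly on any code other than 0, 1 or -9.
def parse_flag(text):
    code = text.strip()
    if code == "1":
        return True
    if code == "0":
        return False
    if code == "-9":
        return pd.NA
    raise ValueError(f"unexpected code in var1: {text!r}")

CSV = """var1, var2
0,   0
0,   1
1,   3
-9,  0
0,   2
1,   7"""

df = pd.read_csv(
    StringIO(CSV),
    skipinitialspace=True,            # assumption: drop the spaces after the commas
    converters={"var1": parse_flag},  # apply the helper to every var1 field
)
print(df["var1"].tolist())
```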
