Pandas read_csv dtype: specify all columns but one

Question

I have a CSV file. I want to read most of its values as strings, but if a column with a given title exists, I want to read that column as bool.

Because the CSV file has a lot of columns, I don't want to specify the data type for each column individually, like this:

data = read_csv('sample.csv', dtype={'A': str, 'B': str, ..., 'X': bool})

Is it possible to set the string type on every column but one, and read that one optional column as bool at the same time?

My current solution is the following (but it is very inefficient and slow):

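from pandas import read_csv

data = read_csv('sample.csv', sep=';', dtype=str)  # reads all columns as strings
if 'X' in data.columns:
    # row-by-row conversion: 'True'/'False' become booleans, anything else None
    l = lambda row: True if row['X'] == 'True' else False if row['X'] == 'False' else None
    data['X'] = data.apply(l, axis=1)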

Update: sample CSV:

A;B;C;X
a1;b1;c1;True
a2;b2;c2;False
a3;b3;c3;True

Or the same file can come without the 'X' column (because the column is optional):

A;B;C
a1;b1;c1
a2;b2;c2
a3;b3;c3

Recommended answer

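You can first select the columns whose names contain X with boolean indexing, and then use replace to turn the strings into booleans. If no column matches, cols is empty and the assignment changes nothing, so the optional column needs no special handling: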

cols = df.columns[df.columns.str.contains('X')]
df[cols] = df[cols].replace({'True': True, 'False': False})

Or, if you need to select only the column named exactly X:

cols = df.columns[df.columns == 'X']
df[cols] = df[cols].replace({'True': True, 'False': False})
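
As a variant of the two snippets above, the conversion can also be written with Series.map on the single column. This is only a minimal sketch, assuming the column holds just the strings 'True' and 'False'; the small DataFrame below is made up for illustration and is not the asker's file.

```python
import pandas as pd

# Illustrative data only: every value was read as a string
df = pd.DataFrame({'A': ['a1', 'a2', 'a3'],
                   'X': ['True', 'False', 'True']})

if 'X' in df.columns:  # the column is optional, so check first
    # map works on the whole column at once instead of row by row;
    # any string that is not a key of the dict would become NaN
    df['X'] = df['X'].map({'True': True, 'False': False})

print(df.dtypes)
```

Compared with apply(..., axis=1), this touches only the one column, which is usually much cheaper on wide frames.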

Sample, using the str.contains filter:

import pandas as pd

df = pd.DataFrame({'A':['a1','a2','a3'],
                   'B':['b1','b2','b3'],
                   'C':['c1','c2','c3'],
                   'X':['True','False','True']})

print (df)
    A   B   C      X
0  a1  b1  c1   True
1  a2  b2  c2  False
2  a3  b3  c3   True

print (df.dtypes)
A    object
B    object
C    object
X    object
dtype: object

cols = df.columns[df.columns.str.contains('X')]
print (cols)

Index(['X'], dtype='object')

df[cols] = df[cols].replace({'True': True, 'False': False})

print (df.dtypes)
A    object
B    object
C    object
X      bool
dtype: object
print (df)

    A   B   C      X
0  a1  b1  c1   True
1  a2  b2  c2  False
2  a3  b3  c3   True
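
The answer above converts the column after loading. Another option, closer to the original question, is to build the dtype mapping from the header before the real read. The following is a rough sketch under stated assumptions: read_str_except is a hypothetical helper name, the separator is ';' as in the sample CSV, and the bool column contains no empty cells.

```python
import os
import tempfile

import pandas as pd


def read_str_except(path, bool_col='X', sep=';'):
    """Hypothetical helper: read all columns as str, but bool_col as bool."""
    # nrows=0 parses only the header line, so the first pass is cheap
    header = pd.read_csv(path, sep=sep, nrows=0).columns
    # One dtype per column that actually exists in this file
    dtypes = {col: (bool if col == bool_col else str) for col in header}
    return pd.read_csv(path, sep=sep, dtype=dtypes)


# Demo with the two sample files from the question, written to a temp folder
with_x = "A;B;C;X\na1;b1;c1;True\na2;b2;c2;False\na3;b3;c3;True\n"
without_x = "A;B;C\na1;b1;c1\na2;b2;c2\na3;b3;c3\n"

with tempfile.TemporaryDirectory() as tmp:
    frames = []
    for name, text in [('with_x.csv', with_x), ('without_x.csv', without_x)]:
        path = os.path.join(tmp, name)
        with open(path, 'w') as f:
            f.write(text)
        frames.append(read_str_except(path))

df_x, df_no_x = frames
print(df_x.dtypes)
print(df_no_x.dtypes)
```

The header is read twice, but the second read parses the data directly into the target types, so no conversion pass is needed afterwards. If the bool column may contain empty cells, plain bool cannot hold missing values; the nullable 'boolean' dtype might be a better fit there, though that is not tested here.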
