Set data type for a specific column when using read_csv from pandas

Views: 463 | Published: 2020/5/24 1:03:11 | Tags: python, pandas

Question

I have a large CSV file (~10 GB) with around 4000 columns. I know that most of the data I expect is int8, so I set:

pandas.read_csv('file.dat', sep=',', engine='c', header=None, 
                na_filter=False, dtype=np.int8, low_memory=False)

The problem is that the final column (the 4000th position) is int32. Is there a way to tell read_csv to use int8 by default and int32 for the 4000th column?

Thanks.

Recommended answer

If you are certain of the number of columns, you can build the dtype dictionary like this:

dtype = dict(zip(range(4000),['int8' for _ in range(3999)] + ['int32']))

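This works because read_csv accepts a dict that maps column positions to types, as this small example shows: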

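from io import StringIO

import pandas as pd

data = '''\
1,2,3
4,5,6'''

# pd.compat.StringIO was removed from pandas; io.StringIO is the standard replacement.
fileobj = StringIO(data)
# With header=None the columns are labeled 0, 1, 2, so the dict keys are integers.
df = pd.read_csv(fileobj, dtype={0: 'int8', 1: 'int8', 2: 'int32'}, header=None)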

print(df.dtypes)

0     int8
1     int8
2    int32
dtype: object
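
When the column count is not known up front, the same dictionary can be built after peeking at the first row. The following is only a sketch under stated assumptions: `build_dtype_map` is a hypothetical helper name (not part of pandas), the file is assumed to have no header row as in the question, and an in-memory string stands in for the real file.

```python
from io import StringIO

import pandas as pd


def build_dtype_map(n_cols, default="int8", overrides=None):
    """Hypothetical helper: map every column position to `default`,
    except the positions listed in `overrides`."""
    dtype_map = {i: default for i in range(n_cols)}
    dtype_map.update(overrides or {})
    return dtype_map


# Stand-in for the real file; the last column holds values too large for int8.
data = "1,2,300000\n4,5,600000\n"

# Read a single row just to learn how many columns there are.
n_cols = pd.read_csv(StringIO(data), header=None, nrows=1).shape[1]

# int8 everywhere, int32 for the last column only.
dtype_map = build_dtype_map(n_cols, overrides={n_cols - 1: "int32"})
df = pd.read_csv(StringIO(data), header=None, dtype=dtype_map)
print(df.dtypes)
```

For a file on disk, the two `StringIO(data)` arguments would be replaced by the file path; reading with `nrows=1` keeps the peek cheap even for a very large file.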

From the documentation:

dtype : Type name or dict of column -> type, default None

Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}. Use str or object to preserve and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.
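
As a small illustration of the "use str to preserve" note in the documentation excerpt, the sketch below compares default type inference with an explicit per-column dtype. The column names `code` and `count` and the sample values are made up for this example.

```python
from io import StringIO

import pandas as pd

# Made-up sample data: the "code" column has leading zeros worth keeping.
data = "code,count\n007,1\n042,2\n"

# Default inference parses "007" as an integer, so the leading zeros are lost.
inferred = pd.read_csv(StringIO(data))

# dtype=str on one column keeps its text exactly as written,
# while the other column is still read as a fixed-width integer.
preserved = pd.read_csv(StringIO(data), dtype={"code": str, "count": "int32"})

print(inferred["code"].tolist())
print(preserved["code"].tolist())
print(preserved.dtypes)
```

Because the file here has a header row, the dict keys are column names rather than the integer positions used with header=None.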
