Specify correct dtypes to pandas.read_csv for datetimes and booleans


Question

I am loading a csv file into a pandas DataFrame. For each column, how do I specify what type of data it contains using the dtype argument?

  • I can do it with numeric data (code at bottom)...
  • But how do I specify time data...
  • and categorical data such as factors or booleans? I have tried np.bool_ and pd.tslib.Timestamp without luck.

Code:

import pandas as pd
import numpy as np
df = pd.read_csv(<file-name>, dtype={'A': np.int64, 'B': np.float64})

Recommended answer

read_csv has many options that handle all the cases you mentioned. You might be tempted to try dtype={'A': datetime.datetime}, but dates are normally handled through parse_dates rather than dtype, and often you won't need dtype at all because pandas can infer the types.

For dates, you need to specify the parse_dates options:

parse_dates : boolean, list of ints or names, list of lists, or dict
keep_date_col : boolean, default False
date_parser : function
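
As a minimal sketch of the date options above, the snippet below reads a small in-memory CSV and asks pandas to parse one column as dates. The column names (when, value) and the sample rows are made up for illustration. Only parse_dates is used here, since, as far as I know, date_parser and keep_date_col are deprecated in recent pandas releases.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real file on disk.
csv_text = "when,value\n2014-01-01,1\n2014-01-02,2\n"

# parse_dates takes a list of column names (or positions) to parse as dates.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["when"])

# The "when" column now has a datetime64 dtype instead of object (string).
print(df.dtypes)
```

If the column is not recognized as a date automatically, converting it after loading with pd.to_datetime is a common fallback.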

In general, to convert boolean values you will need to specify:

true_values  : list  Values to consider as True
false_values : list  Values to consider as False

These options transform any value in the lists to the boolean True/False. For more general conversions you will most likely need:

converters : dict. Optional dict of functions for converting values in certain columns. Keys can either be integers or column labels
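
The boolean and converter options above can be sketched together as follows. The column names (id, active, price), the "yes"/"no" markers, and the parse_price helper are all hypothetical, chosen only to show the shape of the call.

```python
import io

import pandas as pd

# Hypothetical sample data: "active" uses yes/no, "price" has a currency sign.
csv_text = "id,active,price\n1,yes,$1.50\n2,no,$20.00\n"


def parse_price(text):
    """Hypothetical converter: strip a leading '$' and return a float."""
    return float(text.lstrip("$"))


df = pd.read_csv(
    io.StringIO(csv_text),
    true_values=["yes"],               # strings to read as True
    false_values=["no"],               # strings to read as False
    converters={"price": parse_price}, # per-column conversion function
)

print(df)
print(df.dtypes)
```

A converter receives each cell as a string, so it suits columns that need custom cleaning before they can be treated as numbers.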

Though dense, check here for the full list: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html
