Specify correct dtypes using pandas.read_csv


Question

I would like to load a CSV file into a pandas DataFrame. How do I specify, for each column, what type of data it contains?

I guess this is easily done by using the dtype argument?

Here is an example specifying numeric data.

import pandas as pd
import numpy as np
df = pd.read_csv(<file-name>, dtype={'A': np.int64, 'B': np.float64})

But how do I specify time data and categorical data such as factors or booleans? I have tried np.bool_ and pd.tslib.Timestamp without luck.

Solution

read_csv has many options that handle all the cases you mentioned. You might be tempted to try dtype={'A': datetime.datetime}, but date columns are handled by parse_dates rather than by dtype. Often you won't need dtype at all, because pandas can infer the types. If you have dates, specify the parse_dates option (and, where needed, the related options below):

parse_dates : boolean, list of ints or names, list of lists, or dict
keep_date_col : boolean, default False
date_parser : function
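As a minimal sketch of the date handling described above, the following reads a small in-memory CSV and asks pandas to parse one column as dates. The CSV content and the column names `when` and `value` are made up for illustration. Recent pandas releases deprecate date_parser, so the sketch relies on parse_dates alone.

```python
import io

import pandas as pd

# Hypothetical CSV content; the column names are assumptions for this example.
csv_text = """when,value
2015-01-01,1.5
2015-01-02,2.5
"""

# parse_dates accepts a list of column names (or positions) to parse as dates.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["when"])

# Inspect the resulting column types.
print(df.dtypes)
```

With ISO-formatted dates like these, no custom parser should be needed.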

In general, to convert boolean values you will need to specify:

true_values : list. Values to consider as True
false_values : list. Values to consider as False
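A small sketch of how these two options might be used, assuming a hypothetical column `active` that stores yes/no strings:

```python
import io

import pandas as pd

# Hypothetical CSV content; "yes"/"no" are the markers assumed for this example.
csv_text = """name,active
a,yes
b,no
c,yes
"""

# Columns whose values all appear in true_values/false_values are read as booleans.
df = pd.read_csv(
    io.StringIO(csv_text),
    true_values=["yes"],
    false_values=["no"],
)

print(df.dtypes)
```

The `name` column is left as text, since its values are not in either list.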

true_values and false_values convert any value in the given lists to the boolean True/False. For more general conversions you will most likely need:

converters : dict, optional. Dict of functions for converting values in certain columns. Keys can either be integers or column labels.
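To illustrate converters, here is a sketch that turns percentage strings into floats. The helper `parse_percent` and the column names are hypothetical and not part of pandas.

```python
import io

import pandas as pd


def parse_percent(text):
    """Hypothetical helper: turn a string like '12%' into the float 0.12."""
    return float(text.strip().rstrip("%")) / 100.0


# Hypothetical CSV content for illustration.
csv_text = """item,share
x,12%
y,50%
"""

# Each raw string in the "share" column is passed through parse_percent.
df = pd.read_csv(io.StringIO(csv_text), converters={"share": parse_percent})

print(df)
```

A converter receives the raw cell text, so it can implement whatever parsing a column needs.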

Though dense, check here for the full list: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html
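The question also asks about categorical data, which the options above do not address directly. In reasonably recent pandas versions, the dtype argument accepts the string 'category', so a sketch along these lines may be enough. The column names and values are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content; "grade" plays the role of a factor column.
csv_text = """id,grade
1,low
2,high
3,low
"""

# Request an integer column and a categorical column through dtype.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"id": "int64", "grade": "category"},
)

print(df.dtypes)
```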
