Specify correct dtypes using pandas.read_csv
Problem description
I would like to load a CSV file into a pandas DataFrame. How do I specify, for each column, what type of data it contains?
I guess this is easily done with the dtype argument?
Here is an example specifying numeric data:
import pandas as pd
import numpy as np
df = pd.read_csv("<file-name>", dtype={'A': np.int64, 'B': np.float64})  # "<file-name>" stands for the path to the CSV file
But how do I specify time data and categorical data such as factors or booleans? I have tried np.bool_ and pd.tslib.Timestamp without luck.
Solution
read_csv has many options that handle the cases you mention. You might want to try dtype={'A': datetime.datetime}, but often you won't need dtype at all, because pandas can infer the types. Dates are the exception: they are not parsed by default, so you need to specify parse_dates and possibly the related options:
parse_dates : boolean, list of ints or names, list of lists, or dict
keep_date_col : boolean, default False
date_parser : function
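As a minimal sketch of the date handling described above, the example below reads a small in-memory CSV and asks read_csv to parse one column as dates. The CSV content and the column name "when" are made up for illustration. It uses only parse_dates; as far as I know, recent pandas releases deprecate date_parser in favor of date_format, so check the documentation for your version before relying on it.

```python
import io

import pandas as pd

# Hypothetical CSV content; the column names are invented for this example.
csv_text = """A,B,when
1,0.5,2013-01-01
2,1.5,2013-01-02
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"A": "int64", "B": "float64"},  # numeric columns, as in the question
    parse_dates=["when"],                  # parse this column as datetimes
)

print(df.dtypes)
```

The numeric columns still go through dtype, while the date column is listed in parse_dates instead.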
In general, to convert boolean values you will need to specify:
true_values : list Values to consider as True
false_values : list Values to consider as False
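A short sketch of these two options, assuming a hypothetical file whose "active" column stores booleans as the strings "yes" and "no" (both the data and the column names are invented here):

```python
import io

import pandas as pd

# Hypothetical CSV content with booleans written as yes/no.
csv_text = """id,active
1,yes
2,no
3,yes
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    true_values=["yes"],   # strings to read as True
    false_values=["no"],   # strings to read as False
)

print(df["active"].tolist())
print(df.dtypes)
```

Because every value in the "active" column appears in one of the two lists, the column should come back with a boolean dtype rather than as strings.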
Any value in those lists is converted to the boolean True or False. For more general conversions you will most likely need:
converters : dict, optional. Dict of functions for converting values in certain columns. Keys can be either integers or column labels
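To illustrate converters, here is a small sketch in which a column holds percentages as text. The helper parse_percent and the sample data are hypothetical, not part of pandas; the only pandas feature used is the converters argument itself.

```python
import io

import pandas as pd


def parse_percent(text):
    """Turn a string such as '25%' into the fraction 0.25 (hypothetical helper)."""
    return float(text.strip().rstrip("%")) / 100.0


# Hypothetical CSV content; the column names are invented for this example.
csv_text = """name,share
a,25%
b,50%
"""

# Each raw string in the "share" column is passed through parse_percent.
df = pd.read_csv(io.StringIO(csv_text), converters={"share": parse_percent})

print(df)
```

A converter receives each cell as a string, so it can handle formats that dtype alone cannot express.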
The documentation is dense, but it has the full list of options: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html
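The question also asks about categorical data, which the options above do not address directly. As a sketch, assuming a pandas version in which read_csv accepts the string "category" as a dtype, a column can be read as categorical like this (the data and column names are invented):

```python
import io

import pandas as pd

# Hypothetical CSV content; "color" plays the role of a factor column.
csv_text = """A,color
1,red
2,blue
3,red
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"A": "int64", "color": "category"},  # read "color" as categorical
)

print(df.dtypes)
print(df["color"].cat.categories)
```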