What's the difference between dtype and converters in pandas.read_csv?
Question
The pandas function read_csv() reads a .csv file. According to its documentation, we know:
dtype : Type name or dict of column -> type, default None. Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32} (Unsupported with engine='python')
and
converters : dict, default None. Dict of functions for converting values in certain columns. Keys can either be integers or column labels.
When using this function, I can call either
pandas.read_csv('file', dtype=object)
or
pandas.read_csv('file', converters=object)
Obviously, the name converters suggests that the data type will be converted, but what does dtype do in that case?
Recommended answer
The semantic difference is that dtype allows you to specify how to treat the values, for example, as either a numeric or a string type.
Converters allow you to parse your input data and convert it to a desired dtype using a conversion function, e.g. parsing a string value to datetime or to some other desired dtype.
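As a minimal sketch of that distinction, the snippet below reads the same column once with dtype and once with a converter. The column names code and amount, the sample data, and the multiply-by-ten converter are made up purely for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data, not taken from the question.
csv_text = "code,amount\n001,1.50\n002,2.25\n"

# dtype: tell the parser which type the column should be stored as.
# The raw text "001" is kept as-is because we asked for strings.
by_dtype = pd.read_csv(io.StringIO(csv_text), dtype={"code": str})

# converters: run a function on each raw string value of the column.
# Here the function parses the text as an int and multiplies it by 10.
by_converter = pd.read_csv(
    io.StringIO(csv_text),
    converters={"code": lambda raw: int(raw) * 10},
)

print(by_dtype["code"].tolist())      # ['001', '002']
print(by_converter["code"].tolist())  # [10, 20]
```

The pandas documentation also notes that if converters are specified for a column, they are applied instead of dtype conversion, so it is usually simplest to pick one of the two per column.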
Here we see that pandas tries to sniff the types:
In [2]:
import io
import pandas as pd

t="""int,float,date,str
001,3.31,2015/01/01,005"""
df = pd.read_csv(io.StringIO(t))
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1 entries, 0 to 0
Data columns (total 4 columns):
int 1 non-null int64
float 1 non-null float64
date 1 non-null object
str 1 non-null int64
dtypes: float64(1), int64(2), object(1)
memory usage: 40.0+ bytes
You can see from the above that 001 and 005 are treated as int64 (so their leading zeros are lost), but the date string is left as object dtype, i.e. a plain str.
If we say everything is object, then essentially everything is str:
In [3]:
pd.read_csv(io.StringIO(t), dtype=object).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1 entries, 0 to 0
Data columns (total 4 columns):
int 1 non-null object
float 1 non-null object
date 1 non-null object
str 1 non-null object
dtypes: object(4)
memory usage: 40.0+ bytes
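To check the "essentially everything is str" claim directly, here is a small sketch that inspects the Python type of each value in the first row. It reuses the sample data from the answer. The variable names df_obj and value_types are only illustrative.

```python
import io
import pandas as pd

t = """int,float,date,str
001,3.31,2015/01/01,005"""

# With dtype=object no type sniffing happens: every field keeps its raw text.
df_obj = pd.read_csv(io.StringIO(t), dtype=object)

# Map each column name to the Python type name of its first value.
value_types = {col: type(val).__name__ for col, val in df_obj.iloc[0].items()}
print(value_types)
print(df_obj.loc[0, "int"])  # the leading zeros survive
```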
Here we force the int column to str with dtype and use parse_dates so that the default date parser parses the date column:
In [6]:
pd.read_csv(io.StringIO(t), dtype={'int':'object'}, parse_dates=['date']).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1 entries, 0 to 0
Data columns (total 4 columns):
int 1 non-null object
float 1 non-null float64
date 1 non-null datetime64[ns]
str 1 non-null int64
dtypes: datetime64[ns](1), float64(1), int64(1), object(1)
memory usage: 40.0+ bytes
Similarly, we could have passed the to_datetime function as a converter to convert the dates:
In [5]:
pd.read_csv(io.StringIO(t), converters={'date':pd.to_datetime}).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1 entries, 0 to 0
Data columns (total 4 columns):
int 1 non-null int64
float 1 non-null float64
date 1 non-null datetime64[ns]
str 1 non-null int64
dtypes: datetime64[ns](1), float64(1), int64(2)
memory usage: 40.0 bytes
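A converter does not have to be a pandas function. Any callable that takes the raw string of a cell will do. The sketch below uses a hypothetical helper, parse_slash_date, built on the standard library's datetime.strptime, and passes the built-in str as a converter to keep the leading zeros of the str column. The resulting column dtype for the dates may differ between pandas versions, so the example only looks at the values.

```python
import io
from datetime import datetime

import pandas as pd

t = """int,float,date,str
001,3.31,2015/01/01,005"""


def parse_slash_date(raw):
    """Hypothetical helper: parse a 'YYYY/MM/DD' string into a datetime."""
    return datetime.strptime(raw, "%Y/%m/%d")


# Each converter is called once per cell with the raw text of that cell.
df_conv = pd.read_csv(
    io.StringIO(t),
    converters={"date": parse_slash_date, "str": str},
)

print(df_conv.loc[0, "date"])
print(df_conv.loc[0, "str"])
```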