What's the difference between dtype and converters in pandas.read_csv?

Question

The pandas function read_csv() reads a .csv file. Its documentation is here.

According to the documentation:

dtype : Type name or dict of column -> type, default None. Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32} (Unsupported with engine='python')
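
The mapping from the documentation can be tried on a small in-memory CSV. This is a minimal sketch, assuming a hypothetical file whose columns happen to be named `a` and `b` (the names come from the docs example, not from any real file):

```python
import io

import numpy as np
import pandas as pd

# Hypothetical two-column CSV; 'a' and 'b' mirror the documentation example.
csv_text = "a,b\n1.5,2\n3.0,4\n"

# dtype maps column label -> dtype; each column is parsed straight into that dtype.
df = pd.read_csv(io.StringIO(csv_text), dtype={"a": np.float64, "b": np.int32})
print(df.dtypes)
```

Without the dtype argument, pandas would infer int64 for `b`; the dict narrows it to int32 at parse time.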

converters : dict, default None. Dict of functions for converting values in certain columns. Keys can either be integers or column labels.
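
To illustrate the "integers or column labels" part, here is a minimal sketch using a hypothetical CSV with a padded `code` column and a percentage `rate` column (both invented for this example). One converter is keyed by position, the other by label:

```python
import io

import pandas as pd

# Hypothetical CSV: codes padded with spaces, rates written as percentages.
csv_text = "code,rate\n  A1 ,12%\n  B2 ,7.5%\n"

converters = {
    0: str.strip,                                  # keyed by position: column 0 ('code')
    "rate": lambda s: float(s.rstrip("%")) / 100,  # keyed by label: column 'rate'
}
df = pd.read_csv(io.StringIO(csv_text), converters=converters)
print(df)
```

Each function receives the raw cell text as a Python str, which is why plain string methods such as `str.strip` work as converters.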

When using this function, I can call either pandas.read_csv('file',dtype=object) or pandas.read_csv('file',converters=object). As its name suggests, converters converts the data type, but what does dtype do?

Answer

The semantic difference is that dtype lets you specify how the values should be treated, for example as a numeric or a string type.
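
As a small sketch of that idea, using the same sample data as the examples further down, a per-column dtype dict can ask for str only on the zero-padded columns while leaving the rest to inference:

```python
import io

import pandas as pd

t = """int,float,date,str
001,3.31,2015/01/01,005"""

# Only 'int' and 'str' are forced to text; 'float' and 'date' are still inferred.
df = pd.read_csv(io.StringIO(t), dtype={"int": str, "str": str})
print(df.loc[0, "int"], df.loc[0, "str"], df["float"].dtype)
```

No function is involved here: dtype only tells the parser what type to produce for each column.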

Converters let you parse the input data with a conversion function to turn it into the desired dtype, e.g., parsing a string value into a datetime or some other desired dtype.
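
A minimal sketch of a hand-written converter for the date column follows. The function name `parse_date` and the `seen` list are hypothetical helpers added only to show what the converter is called with (the raw cell text):

```python
import io
from datetime import datetime

import pandas as pd

t = """int,float,date,str
001,3.31,2015/01/01,005"""

seen = []  # hypothetical: records the arguments the converter receives


def parse_date(value):
    # The converter gets the raw text of each cell, before any type inference.
    seen.append(value)
    return datetime.strptime(value, "%Y/%m/%d")


df = pd.read_csv(io.StringIO(t), converters={"date": parse_date})
print(seen)
print(df.loc[0, "date"])
```

Compared with dtype, the converter gives full control over how the text is interpreted, at the cost of a Python function call per cell.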

Here we see that pandas tries to sniff the types:

In [2]:
import io
import pandas as pd
t="""int,float,date,str
001,3.31,2015/01/01,005"""
df = pd.read_csv(io.StringIO(t))
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1 entries, 0 to 0
Data columns (total 4 columns):
int      1 non-null int64
float    1 non-null float64
date     1 non-null object
str      1 non-null int64
dtypes: float64(1), int64(2), object(1)
memory usage: 40.0+ bytes

You can see from the above that 001 and 005 are parsed as int64 (so their leading zeros are lost), while the date string stays a str (object dtype).
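
The loss of the zero padding can be checked on the values themselves. This sketch re-reads the same sample and shows that converting the parsed integer back to text does not restore the original `001`:

```python
import io

import pandas as pd

t = """int,float,date,str
001,3.31,2015/01/01,005"""

df = pd.read_csv(io.StringIO(t))
# Type sniffing turned '001' and '005' into the integers 1 and 5 ...
print(df.loc[0, "int"], df.loc[0, "str"])
# ... so turning them back into text afterwards cannot recover the padding.
print(str(df.loc[0, "int"]))
```

If the padding matters, it has to be preserved at read time, via dtype or a converter.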

If we declare every column as object, then essentially everything stays a str:

In [3]:
pd.read_csv(io.StringIO(t), dtype=object).info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1 entries, 0 to 0
Data columns (total 4 columns):
int      1 non-null object
float    1 non-null object
date     1 non-null object
str      1 non-null object
dtypes: object(4)
memory usage: 40.0+ bytes
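
Reading everything as object keeps the raw text, and individual columns can then be converted afterwards. A sketch of that pattern, using `pd.to_numeric` for the float column:

```python
import io

import pandas as pd

t = """int,float,date,str
001,3.31,2015/01/01,005"""

df = pd.read_csv(io.StringIO(t), dtype=object)
# Every cell is the original text, e.g. '3.31' rather than 3.31.
raw_float = df.loc[0, "float"]
# Numeric conversion can be done later, only where it is wanted.
df["float"] = pd.to_numeric(df["float"])
print(repr(raw_float), df["float"].dtype, df.loc[0, "int"])
```

Here `int` keeps its leading zeros because it was never converted.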

Here we force the int column to object (i.e. str) and list the date column in parse_dates so that it is parsed as a datetime:

In [6]:
pd.read_csv(io.StringIO(t), dtype={'int':'object'}, parse_dates=['date']).info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1 entries, 0 to 0
Data columns (total 4 columns):
int      1 non-null object
float    1 non-null float64
date     1 non-null datetime64[ns]
str      1 non-null int64
dtypes: datetime64[ns](1), float64(1), int64(1), object(1)
memory usage: 40.0+ bytes
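
One practical payoff of getting a real datetime column is that the `.dt` accessor works on it directly. A short sketch on the same sample:

```python
import io

import pandas as pd

t = """int,float,date,str
001,3.31,2015/01/01,005"""

df = pd.read_csv(io.StringIO(t), dtype={"int": "object"}, parse_dates=["date"])
# 'date' is a datetime64 column, so date components are directly accessible.
print(df["date"].dt.year.tolist(), df["date"].dt.month.tolist())
```

With the default read (date left as str), the same `.dt` call would raise an error.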

Similarly, we could have passed the to_datetime function as a converter for the date column:

In [5]:
pd.read_csv(io.StringIO(t), converters={'date':pd.to_datetime}).info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1 entries, 0 to 0
Data columns (total 4 columns):
int      1 non-null int64
float    1 non-null float64
date     1 non-null datetime64[ns]
str      1 non-null int64
dtypes: datetime64[ns](1), float64(1), int64(2)
memory usage: 40.0 bytes
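
When both options target the same column, pandas applies the converter, ignores the dtype for that column, and emits a `ParserWarning`. A sketch that captures the warning (the choice of `str` as the converter is just for illustration):

```python
import io
import warnings

import pandas as pd
from pandas.errors import ParserWarning

t = """int,float,date,str
001,3.31,2015/01/01,005"""

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # dtype asks for int64 on 'str', while the converter keeps the raw text.
    df = pd.read_csv(io.StringIO(t), dtype={"str": "int64"}, converters={"str": str})

print(repr(df.loc[0, "str"]))
print([w.category.__name__ for w in caught])
```

So for a given column it makes sense to pick one mechanism: dtype for a plain type, or a converter for custom parsing.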
