What are all the dtypes that pandas recognizes?
For pandas, would anyone know whether any datatype apart from
(i) float64, int64 (and other variants of np.number like float32, int8 etc.)
(ii) bool
(iii) datetime64, timedelta64
such as string columns, always has a dtype of object?
Alternatively, I want to know whether there is any datatype apart from (i), (ii) and (iii) in the list above for which pandas does not make its dtype an object.
EDIT Feb 2020 following pandas 1.0.0 release
Pandas mostly uses NumPy arrays and dtypes for each Series (a DataFrame is a collection of Series, each of which can have its own dtype). NumPy's documentation further explains dtype, data types, and data type objects. In addition, the answer provided by @lcameron05 gives an excellent description of the NumPy dtypes. Furthermore, the pandas docs on dtypes have a lot of additional information.
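As a minimal sketch of the point that each Series carries its own dtype, the following builds a small frame and inspects its dtypes. The column names are made up for illustration. The dtype reported for the text column depends on the pandas version (object in pandas 1.x and 2.x; newer releases may infer a dedicated string dtype), so no particular result is claimed for it here.

```python
import pandas as pd

# Hypothetical column names, chosen only for illustration.
df = pd.DataFrame(
    {
        "ints": [1, 2, 3],
        "floats": [1.0, 2.5, 3.0],
        "flags": [True, False, True],
        "when": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
        "text": ["a", "b", "c"],  # dtype here varies with the pandas version
    }
)

# One dtype per column (i.e. per Series).
dtypes = df.dtypes
print(dtypes)
```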
The main types stored in pandas objects are float, int, bool, datetime64[ns], timedelta64[ns], and object. In addition, these dtypes have item sizes, e.g. int64 and int32.
By default integer types are int64 and float types are float64, REGARDLESS of platform (32-bit or 64-bit). The following will all result in int64 dtypes.
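The code that accompanied this statement did not survive on this page. A sketch in the same spirit, assuming plain Python integers as input, could look like this:

```python
import pandas as pd

# Three ways of building an integer column from plain Python ints.
from_list = pd.DataFrame([1, 2], columns=["a"])
from_dict = pd.DataFrame({"a": [1, 2]})
from_scalar = pd.DataFrame({"a": 1}, index=list(range(2)))

for frame in (from_list, from_dict, from_scalar):
    print(frame.dtypes)
```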
NumPy, however, will choose platform-dependent types when creating arrays; a frame built from such an array, e.g. pd.DataFrame(np.array([1, 2])), WILL result in int32 on a 32-bit platform.
One of the major changes in version 1.0.0 of pandas is the introduction of pd.NA to represent scalar missing values (rather than the previous values of np.nan, pd.NaT or None, depending on usage).
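To illustrate the difference, here is a small sketch (assuming pandas 1.0 or later) that stores the same values once with the default NumPy-backed dtype and once with the nullable Int64 extension dtype. The variable names are illustrative only.

```python
import pandas as pd

# Default behaviour: the missing value forces a float64 column holding NaN.
float_series = pd.Series([1, None, 3])

# Nullable integer dtype: the values stay integers and the gap is pd.NA.
nullable_series = pd.Series([1, None, 3], dtype="Int64")

print(float_series.dtype, nullable_series.dtype)

missing = nullable_series.iloc[1]
print(missing is pd.NA)

# pd.NA propagates through arithmetic.
propagated = pd.NA + 1
print(propagated)
```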
Pandas extends NumPy's type system and also allows users to write their own extension types. The following lists all of pandas' extension types.
Kind of data: tz-aware datetime (note that NumPy does not support timezone-aware datetimes).
Data type: DatetimeTZDtype
Scalar: Timestamp
Array: arrays.DatetimeArray
String Aliases: 'datetime64[ns, <tz>]'
Kind of data: Categorical
Data type: CategoricalDtype
Scalar: (none)
Array: Categorical
String Aliases: 'category'
Kind of data: period (time spans)
Data type: PeriodDtype
Scalar: Period
Array: arrays.PeriodArray
String Aliases: 'period[<freq>]', 'Period[<freq>]'
Kind of data: sparse
Data type: SparseDtype
Scalar: (none)
Array: arrays.SparseArray
String Aliases: 'Sparse', 'Sparse[int]', 'Sparse[float]'
Kind of data: intervals
Data type: IntervalDtype
Scalar: Interval
Array: arrays.IntervalArray
String Aliases: 'interval', 'Interval', 'Interval[<numpy_dtype>]', 'Interval[datetime64[ns, <tz>]]', 'Interval[timedelta64[<freq>]]'
Kind of data: nullable integer
Data type: Int64Dtype, ...
Scalar: (none)
Array: arrays.IntegerArray
String Aliases: 'Int8', 'Int16', 'Int32', 'Int64', 'UInt8', 'UInt16', 'UInt32', 'UInt64'
Kind of data: Strings
Data type: StringDtype
Scalar: str
Array: arrays.StringArray
String Aliases: 'string'
Kind of data: Boolean (with NA), i.e. Boolean data with missing values
Data type: BooleanDtype
Scalar: bool
Array: arrays.BooleanArray
String Aliases: 'boolean'
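As a rough check of the list above, the sketch below creates one Series per extension type and prints the class of its dtype. The variable names and sample values are invented for illustration, and a pandas 1.0+ installation is assumed.

```python
import pandas as pd

# One small Series per kind of data; names and values are illustrative only.
series_by_kind = {
    "tz-aware datetime": pd.Series(pd.date_range("2020-01-01", periods=2, tz="UTC")),
    "categorical": pd.Series(["a", "b", "a"], dtype="category"),
    "period": pd.Series(pd.period_range("2020-01", periods=2, freq="M")),
    "sparse": pd.Series([0, 0, 1], dtype="Sparse[int]"),
    "interval": pd.Series(pd.interval_range(start=0, end=2)),
    "nullable integer": pd.Series([1, None], dtype="Int64"),
    "string": pd.Series(["x", None], dtype="string"),
    "boolean": pd.Series([True, None], dtype="boolean"),
}

# Print the dtype and the dtype class for each kind of data.
for kind, s in series_by_kind.items():
    print(f"{kind}: {s.dtype} ({type(s.dtype).__name__})")
```

None of these columns falls back to object, which is the short answer to the question: besides the NumPy-backed numeric, bool, datetime64 and timedelta64 types, the extension types listed here also get their own dtype.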