What are all the dtypes that pandas recognizes?

Question

For pandas, does anyone know whether any datatype apart from

(i) float64, int64 (and other variants of np.number such as float32, int8, etc.)

(ii) bool

(iii) datetime64, timedelta64

such as string columns, always has a dtype of object?

Alternatively, I want to know whether there is any datatype apart from (i), (ii) and (iii) in the list above that pandas does not give a dtype of object.

Recommended answer

Updated February 2020, following the release of pandas 1.0.0

Pandas mostly uses NumPy arrays and dtypes for each Series (a DataFrame is a collection of Series, each of which can have its own dtype). NumPy's documentation further explains dtype, data types, and data type objects. In addition, the answer provided by @lcameron05 gives an excellent description of the NumPy dtypes. Furthermore, the pandas docs on dtypes have a lot of additional information.

The main types stored in pandas objects are float, int, bool, datetime64[ns], timedelta64[ns], and object. In addition, these dtypes have item sizes, e.g. int64 and int32.

By default, integer types are int64 and float types are float64, regardless of platform (32-bit or 64-bit). Constructing a DataFrame from plain Python integers, whether from a list, a dict of lists, or a scalar broadcast over an index, results in int64 dtypes.
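As a minimal sketch of the default described above (the variable names are illustrative and not from the original answer), the three constructions below all start from plain Python integers:

```python
import pandas as pd

# Three ways to build a one-column frame from plain Python ints.
from_list = pd.DataFrame([1, 2], columns=["a"])
from_dict = pd.DataFrame({"a": [1, 2]})
from_scalar = pd.DataFrame({"a": 1}, index=list(range(2)))

# Collect the resulting column dtypes as strings for easy comparison.
int_dtypes = [str(df["a"].dtype) for df in (from_list, from_dict, from_scalar)]

# Plain Python floats should likewise default to float64.
float_dtype = str(pd.Series([1.0, 2.5]).dtype)

print(int_dtypes, float_dtype)
```

Checking `df.dtypes` on each frame is an equivalent way to inspect the result.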

NumPy, however, will choose platform-dependent types when creating arrays, so a DataFrame built from a NumPy integer array that was created without an explicit dtype will have an int32 dtype on a 32-bit platform.

One of the major changes in version 1.0.0 of pandas is the introduction of pd.NA to represent scalar missing values (rather than the previous values of np.nan, pd.NaT or None, depending on usage).
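The two points above can be sketched as follows. To keep the result independent of the machine it runs on, this sketch requests int32 explicitly rather than relying on the platform default; the variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# pandas keeps whatever dtype the NumPy array already has.
# int32 is requested explicitly here so the outcome is the same everywhere.
arr = np.array([1, 2], dtype=np.int32)
frame_dtype = str(pd.DataFrame(arr)[0].dtype)

# With a nullable extension dtype, the missing entry is stored as pd.NA.
s = pd.Series([1, None], dtype="Int64")
has_na = s.iloc[1] is pd.NA

print(frame_dtype, has_na)
```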

Pandas extends NumPy's type system and also allows users to write their own extension types. The following lists all of the pandas extension types.

1) Time zone handling

Kind of data: tz-aware datetime (note that NumPy does not support timezone-aware datetimes).

Data type: DatetimeTZDtype

Scalar: Timestamp

Array: arrays.DatetimeArray

String aliases: 'datetime64[ns, <tz>]'
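A small sketch of how a tz-aware column ends up with DatetimeTZDtype. The sample timestamps and the choice of UTC are assumptions made for illustration only.

```python
import pandas as pd

# Start from timezone-naive timestamps (a plain NumPy datetime64 dtype).
naive = pd.Series(pd.to_datetime(["2020-02-01 12:00", "2020-02-02 12:00"]))

# Attaching a time zone switches the column to the pandas extension dtype.
aware = naive.dt.tz_localize("UTC")

naive_is_tz = isinstance(naive.dtype, pd.DatetimeTZDtype)
aware_is_tz = isinstance(aware.dtype, pd.DatetimeTZDtype)
tz_name = str(aware.dtype.tz)

print(naive.dtype, aware.dtype)
```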

2) Categorical data

Kind of data: Categorical

Data type: CategoricalDtype

Scalar: (none)

Array: Categorical

String aliases: 'category'
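For illustration, a categorical Series can be created with the 'category' alias; the sample values below are made up.

```python
import pandas as pd

# The 'category' string alias selects CategoricalDtype.
s = pd.Series(["a", "b", "a", "c"], dtype="category")

is_categorical = isinstance(s.dtype, pd.CategoricalDtype)
categories = list(s.cat.categories)  # the distinct labels
codes = s.cat.codes.tolist()         # integer position of each value's label

print(s.dtype, categories, codes)
```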

3) Time span representation

Kind of data: period (time spans)

Data type: PeriodDtype

Scalar: Period

Array: arrays.PeriodArray

String aliases: 'period[<freq>]', 'Period[<freq>]'
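A brief sketch of a period-typed Series, assuming a monthly frequency and an arbitrary start month:

```python
import pandas as pd

# Three consecutive monthly periods starting in January 2020.
s = pd.Series(pd.period_range("2020-01", periods=3, freq="M"))

is_period = isinstance(s.dtype, pd.PeriodDtype)
first = s.iloc[0]  # each element is a pd.Period scalar

print(s.dtype, first)
```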

4) Sparse data structures

Kind of data: Sparse

Data type: SparseDtype

Scalar: (none)

Array: arrays.SparseArray

String aliases: 'Sparse', 'Sparse[int]', 'Sparse[float]'
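As a rough sketch, a mostly-zero float Series can be converted to a sparse one. The data and the fill value of 0.0 are assumptions chosen for the example.

```python
import pandas as pd

dense = pd.Series([0.0, 0.0, 1.5, 0.0])

# Only values different from fill_value are physically stored.
sparse = dense.astype(pd.SparseDtype("float64", fill_value=0.0))

is_sparse = isinstance(sparse.dtype, pd.SparseDtype)
density = sparse.sparse.density              # fraction of stored values
round_trip = sparse.sparse.to_dense().tolist()

print(sparse.dtype, density)
```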

5) IntervalIndex

Kind of data: intervals

Data type: IntervalDtype

Scalar: Interval

Array: arrays.IntervalArray

String aliases: 'interval', 'Interval', 'Interval[<numpy_dtype>]', 'Interval[datetime64[ns, <tz>]]', 'Interval[timedelta64[<freq>]]'
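A minimal sketch of interval data, using integer bounds picked purely for illustration:

```python
import pandas as pd

# Three unit-width intervals between 0 and 3 (closed on the right by default).
s = pd.Series(pd.interval_range(start=0, end=3))

is_interval = isinstance(s.dtype, pd.IntervalDtype)
first = s.iloc[0]          # a pd.Interval scalar
contains_half = 0.5 in first

print(s.dtype, first)
```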

6) Nullable integer data type

Kind of data: nullable integer

Data type: Int64Dtype, ...

Scalar: (none)

Array: arrays.IntegerArray

String aliases: 'Int8', 'Int16', 'Int32', 'Int64', 'UInt8', 'UInt16', 'UInt32', 'UInt64'
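The sketch below contrasts the nullable 'Int64' alias (capital I) with the default inference for the same made-up data, which falls back to float64 because NaN is a float:

```python
import pandas as pd

# With the nullable dtype, the integers stay integers and the gap is pd.NA.
nullable = pd.Series([1, 2, None], dtype="Int64")

# Without it, pandas infers float64 so that NaN can mark the missing value.
inferred = pd.Series([1, 2, None])

is_nullable_int = isinstance(nullable.dtype, pd.Int64Dtype)
total = int(nullable.sum())  # NA is skipped by default

print(nullable.dtype, inferred.dtype, total)
```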

7) Working with text data

Kind of data: Strings

Data type: StringDtype

Scalar: str

Array: arrays.StringArray

String aliases: 'string'
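To illustrate, requesting the 'string' alias gives a dedicated string dtype instead of object. The sample values are assumptions.

```python
import pandas as pd

# The 'string' alias selects StringDtype; the None entry becomes a missing value.
s = pd.Series(["a", None, "ccc"], dtype="string")

is_string = isinstance(s.dtype, pd.StringDtype)
missing = s.isna().tolist()
lengths = s.str.len()  # string methods propagate the missing value

print(s.dtype, lengths.tolist())
```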

8) Boolean data with missing values

Kind of data: Boolean (with NA)

Data type: BooleanDtype

Scalar: bool

Array: arrays.BooleanArray

String aliases: 'boolean'
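A short sketch of the nullable boolean dtype. Logical operations on it treat a missing value as "unknown", so the result is only missing when it genuinely depends on the unknown operand. The sample data is illustrative.

```python
import pandas as pd

s = pd.Series([True, False, None], dtype="boolean")

is_boolean = isinstance(s.dtype, pd.BooleanDtype)

# "unknown OR True" is True and "unknown AND False" is False,
# whereas "unknown AND True" stays missing.
or_true = (s | True).tolist()
and_false = (s & False).tolist()
and_true_missing = (s & True).isna().tolist()

print(s.dtype, or_true, and_false)
```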
