What is dtype('O')?


Question

I have a DataFrame in pandas and I'm trying to figure out what the types of its values are. I am unsure what the type of column 'Test' is. However, when I run myFrame['Test'].dtype, I get:

dtype('O')

What does this mean?

Recommended answer

When you see dtype('O') inside a DataFrame, it means the pandas object dtype, which is most often used for columns of Python strings.

What is dtype?

Is it something that belongs to pandas, to NumPy, to both, or to something else? Consider this pandas code:

import pandas as pd

# One column per common dtype
df = pd.DataFrame({'float': [1.0],
                   'int': [1],
                   'datetime': [pd.Timestamp('20180310')],
                   'string': ['foo']})
print(df)
print(df['float'].dtype, df['int'].dtype, df['datetime'].dtype, df['string'].dtype)
df['string'].dtype

It will output something like this:

   float  int   datetime string    
0    1.0    1 2018-03-10    foo
---
float64 int64 datetime64[ns] object
---
dtype('O')

You can read the last result as the pandas object dtype, written dtype('O'). Here it holds values of the Python type str, and it corresponds to the NumPy string_ or unicode_ types.

Pandas dtype    Python type     NumPy type          Usage
object          str             string_, unicode_   Text

Like Don Quixote on his donkey, pandas sits on top of NumPy. NumPy understands the underlying architecture of your system and uses the class numpy.dtype for that.

A data type object is an instance of the numpy.dtype class that describes the data more precisely, including:

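  • The type of the data (integer, float, Python object, etc.)
  • The size of the data (for example, how many bytes are in the integer)
  • The byte order of the data (little-endian or big-endian)
  • If the data type is structured, an aggregate of other data types (for example, describing an array item consisting of an integer and a float)
  • What the names of the "fields" of the structure are
  • What the data type of each field is
  • Which part of the memory block each field takes
  • If the data type is a sub-array, what its shape and data type are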
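The properties in this list can be read straight off a numpy.dtype instance. The following is a minimal sketch; the field names 'id' and 'value' and the chosen sizes are illustrative assumptions, not part of the original answer.

```python
import numpy as np

# Kind and size of a plain integer dtype
int_dt = np.dtype('int32')
print(int_dt.kind, int_dt.itemsize)      # i 4

# Byte order: '>' requests big-endian; the reported character
# depends on whether that matches the machine's native order
big = np.dtype('>i4')
print(big.byteorder)

# The Python-object dtype from the question
obj = np.dtype('O')
print(obj.kind, obj == object)           # O True

# Structured dtype: field names, per-field dtype and byte offset
point = np.dtype([('id', np.int32), ('value', np.float64)])
print(point.names)                       # ('id', 'value')
print(point.fields['value'])             # (dtype('float64'), 4)

# Sub-array dtype: shape and base data type
sub = np.dtype((np.int32, (2, 3)))
print(sub.shape, sub.base)               # (2, 3) int32
```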

In the context of this question, dtype belongs to both pandas and NumPy, and dtype('O') in particular means the column holds Python objects, which in practice are usually strings.
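Because dtype('O') only says "Python objects", it can help to look at the element types when you are unsure what a column really contains. This is a small sketch with a made-up Series named mixed; dtype=object is passed explicitly so the result does not depend on how a particular pandas version infers string columns.

```python
import numpy as np
import pandas as pd

# Hypothetical example column holding three different Python types
mixed = pd.Series(['foo', 1, 2.5], dtype=object)

# The column dtype is the NumPy object dtype
is_object = mixed.dtype == np.dtype('O')
print(is_object)                 # True

# Inspect the Python type of each element
element_types = mixed.map(type).tolist()
print(element_types)             # [<class 'str'>, <class 'int'>, <class 'float'>]
```

So an object column is not guaranteed to be all strings; checking the element types is one way to confirm.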

Here is some test code with explanation. Suppose we have the dataset as a dictionary:

import pandas as pd
import numpy as np
from pandas import Timestamp

data={'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}, 'date': {0: Timestamp('2018-12-12 00:00:00'), 1: Timestamp('2018-12-12 00:00:00'), 2: Timestamp('2018-12-12 00:00:00'), 3: Timestamp('2018-12-12 00:00:00'), 4: Timestamp('2018-12-12 00:00:00')}, 'role': {0: 'Support', 1: 'Marketing', 2: 'Business Development', 3: 'Sales', 4: 'Engineering'}, 'num': {0: 123, 1: 234, 2: 345, 3: 456, 4: 567}, 'fnum': {0: 3.14, 1: 2.14, 2: -0.14, 3: 41.3, 4: 3.14}}
df = pd.DataFrame.from_dict(data) #now we have a dataframe

print(df)
print(df.dtypes)

The last lines examine the DataFrame; note the output:

   id       date                  role  num   fnum
0   1 2018-12-12               Support  123   3.14
1   2 2018-12-12             Marketing  234   2.14
2   3 2018-12-12  Business Development  345  -0.14
3   4 2018-12-12                 Sales  456  41.30
4   5 2018-12-12           Engineering  567   3.14
id               int64
date    datetime64[ns]
role            object
num              int64
fnum           float64
dtype: object

The columns have all kinds of different dtypes. Now set two rows to missing values:

df.iloc[1,:] = np.nan
df.iloc[2,:] = None

Setting np.nan or None leaves the datetime, object and float columns with their original dtype. The integer columns, however, are upcast to float64, because int64 has no way to represent NaN. The output will be like this:

print(df)
print(df.dtypes)

    id       date         role    num   fnum
0  1.0 2018-12-12      Support  123.0   3.14
1  NaN        NaT          NaN    NaN    NaN
2  NaN        NaT         None    NaN    NaN
3  4.0 2018-12-12        Sales  456.0  41.30
4  5.0 2018-12-12  Engineering  567.0   3.14
id             float64
date    datetime64[ns]
role            object
num            float64
fnum           float64
dtype: object

So np.nan or None will not change a column's dtype beyond the integer-to-float upcast seen above, unless we set all rows of the column to np.nan or None. In that case the column will become float64 or object respectively.
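The integer-to-float upcast can be reproduced in isolation. This sketch builds fresh Series objects instead of assigning with iloc as in the code above, since in-place assignment of incompatible values may behave differently across pandas versions; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Integers only: an integer dtype is inferred
ints = pd.Series([1, 2, 3])

# One missing value forces a float dtype, because NaN is a float
with_nan = pd.Series([1, np.nan, 3])

# Only NaN values: still float64
all_nan = pd.Series([np.nan, np.nan])

print(ints.dtype, with_nan.dtype, all_nan.dtype)
```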

You can also try setting single rows:

df.iloc[3,:] = 0 # will convert datetime to object only
df.iloc[4,:] = '' # will convert all columns to object

Note that if we set a string inside a non-string column, the column becomes object dtype.
