DataFrame 中的字符串,但 dtype 是对象 [英] Strings in a DataFrame, but dtype is object

查看:40
本文介绍了DataFrame 中的字符串,但 dtype 是对象的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

为什么 Pandas 告诉我我有对象,尽管所选列中的每个项目都是一个字符串——即使在显式转换之后也是如此.

Why does Pandas tell me that I have objects, although every item in the selected column is a string — even after explicit conversion.

这是我的数据帧:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 56992 entries, 0 to 56991
Data columns (total 7 columns):
id            56992  non-null values
attr1         56992  non-null values
attr2         56992  non-null values
attr3         56992  non-null values
attr4         56992  non-null values
attr5         56992  non-null values
attr6         56992  non-null values
dtypes: int64(2), object(5)

其中五个是dtype object.我明确地将这些对象转换为字符串:

Five of them are dtype object. I explicitly convert those objects to strings:

for c in df.columns:
    if df[c].dtype == object:
        print "convert ", df[c].name, " to string"
        df[c] = df[c].astype(str)

那么,df["attr2"] 仍然有 dtype 对象,虽然 type(df["attr2"].ix[0]> 显示 str,这是正确的.

Then, df["attr2"] still has dtype object, although type(df["attr2"].ix[0] reveals str, which is correct.

Pandas 区分 int64float64object.当没有 dtype str 时,它背后的逻辑是什么?为什么 strobject 覆盖?

Pandas distinguishes between int64 and float64 and object. What is the logic behind it when there is no dtype str? Why is a str covered by object?

推荐答案

dtype 对象来自 NumPy,它描述了 ndarray 中元素的类型.ndarray 中的每个元素都必须具有相同的字节大小.对于 int64float64,它们是 8 个字节.但是对于字符串,字符串的长度是不固定的.因此,Pandas 并没有直接将字符串的字节保存在 ndarray 中,而是使用了一个对象 ndarray,它保存了指向对象的指针;因此,这种 ndarraydtype 是对象.

The dtype object comes from NumPy, it describes the type of element in a ndarray. Every element in an ndarray must have the same size in bytes. For int64 and float64, they are 8 bytes. But for strings, the length of the string is not fixed. So instead of saving the bytes of strings in the ndarray directly, Pandas uses an object ndarray, which saves pointers to objects; because of this the dtype of this kind ndarray is object.

这是一个例子:

  • int64 数组包含 4 个 int64 值.
  • 对象数组包含 4 个指向 3 个字符串对象的指针.

这篇关于DataFrame 中的字符串,但 dtype 是对象的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆