Strings in a DataFrame, but dtype is object
Problem description
Why does Pandas tell me that I have objects, although every item in the selected column is a string, even after explicit conversion?
This is my DataFrame:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 56992 entries, 0 to 56991
Data columns (total 7 columns):
id 56992 non-null values
attr1 56992 non-null values
attr2 56992 non-null values
attr3 56992 non-null values
attr4 56992 non-null values
attr5 56992 non-null values
attr6 56992 non-null values
dtypes: int64(2), object(5)
Five of them are dtype object. I explicitly convert those objects to strings:
# Convert every object-dtype column to str, element by element
for c in df.columns:
    if df[c].dtype == object:
        print("convert", df[c].name, "to string")
        df[c] = df[c].astype(str)
Then, df["attr2"] still has dtype object, although type(df["attr2"].iloc[0]) reveals str, which is correct.
Pandas distinguishes between int64, float64 and object. What is the logic behind having no dtype str? Why is a str covered by object?
Recommended answer
The dtype object comes from NumPy; it describes the type of the elements in an ndarray. Every element in an ndarray must have the same size in bytes. For int64 and float64, that size is 8 bytes. Strings, however, have no fixed length. So instead of storing the bytes of the strings directly in the ndarray, Pandas uses an object ndarray, which stores pointers to objects; because of this, the dtype of this kind of ndarray is object.
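The difference between the array's dtype and the type of each element can be checked directly. The following is a minimal sketch, not part of the original answer: the sample values are made up, and dtype=object is requested explicitly so the result does not depend on how a particular Pandas version infers string columns.

```python
import numpy as np
import pandas as pd

# Fixed-width numbers: every element occupies the same 8 bytes.
ints = np.array([1, 2, 3, 4], dtype=np.int64)
print(ints.dtype, ints.itemsize)

# Strings of different lengths (illustrative values), held as Python objects.
# dtype=object is an explicit choice made for this sketch.
strings = pd.Series(["a", "bb", "ccc"], dtype=object)

print(strings.dtype)          # dtype of the array: object
print(type(strings.iloc[0]))  # type of one element: str

# Each slot of the underlying array holds one pointer, so the slot size
# is the same for every element regardless of the string's length.
print(strings.to_numpy().itemsize)
```

The column reports object because that describes what the array stores (references), while type() on a single element reports what the reference points to.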
Here is an example:
- The int64 array contains 4 int64 values.
- The object array contains 4 pointers to 3 string objects.
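The two arrays described above can be reproduced roughly as follows. This is a sketch under assumptions: the concrete numbers and strings are invented for illustration, and the first and last slots of the object array are made to reference the same string so that 4 pointers lead to only 3 distinct objects.

```python
import numpy as np

# int64 array: 4 values stored inline, 8 bytes each.
ints = np.array([1, 2, 3, 4], dtype=np.int64)
total_int_bytes = ints.nbytes

# Object array: 4 slots, each holding a pointer to a Python str.
# The strings are hypothetical; s1 is deliberately used twice.
s1, s2, s3 = "apple", "kiwi", "fig"
objs = np.array([s1, s2, s3, s1], dtype=object)

n_slots = len(objs)
n_distinct = len({id(x) for x in objs})  # count distinct objects by identity
same_object = objs[0] is objs[3]         # both slots point at s1

print(total_int_bytes, n_slots, n_distinct, same_object)
```

Because the array stores only references, the strings themselves can have any length and can be shared between slots.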