Why does pandas string series return NaN for len() function?
Question
I am working with a power consumption dataset in Pandas that includes ZIP codes as a column, but the datatype for this column is an integer in the original CSV file. I'd like to change this column to a string/object datatype, and here's what I've done so far:
import pandas as pd
df = pd.read_csv('...kWh_consumption_by_ZIP.csv')
df.head()
The resulting dataframe head looks like this:
As mentioned above, when I check df.dtypes, I see that ZIP is listed as int64 data type, so I run the following code to overwrite the existing series and change it to an object data type:
df['ZIP'] = df.ZIP.astype(object)
Everything looks good when I check the df.ZIP series (at least, it looks good to the naked eye):
But when I check the length of each row in the series using the len function:
df.ZIP.str.len()
...the resulting series just returns NaN for each row (see screenshot below).
Does anyone know why this is happening? Thanks in advance for the help.
Answer
TL;DR
You have a column of integers, and casting to object has not solved your problem: astype(object) changes only the column's dtype, while every value in it is still an integer. Instead, typecast to str and you should be good.
df.ZIP.astype(str).str.len()
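A minimal sketch of the difference, using a few hypothetical integer ZIP codes in place of the real dataset. It checks that astype(object) leaves numbers in the column while astype(str) produces actual strings. The zfill(5) step is an extra assumption (5-digit US ZIP codes): storing ZIP codes as integers drops leading zeros, so 02134 comes back as 2134.

```python
import pandas as pd

# Hypothetical ZIP codes, stored as integers the way read_csv would parse them.
zips = pd.Series([94110, 10001, 2134], name="ZIP")

# astype(object) changes only the dtype label; each element is still a number.
as_object = zips.astype(object)
print(as_object.dtype)                               # object
print(any(isinstance(v, str) for v in as_object))    # False

# astype(str) converts every element to a Python string, so len() works.
as_str = zips.astype(str)
print(as_str.str.len().tolist())                     # [5, 5, 4]

# Integer storage lost the leading zero of 02134; zfill(5) pads it back.
print(as_str.str.zfill(5).tolist())                  # ['94110', '10001', '02134']
```

If you control the loading step, passing dtype={'ZIP': str} to pd.read_csv keeps the column as text from the start, which also preserves leading zeros as long as the file itself contains them.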
pandas supports the str accessor on object columns. Because an object column can contain any object, pandas makes no assumptions about what is inside it. If an element is a string or any other container that len() accepts, a valid result is returned for that element. Otherwise, the result is NaN.
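The per-element rule can be approximated in plain Python. This is a minimal sketch, not pandas' actual implementation: the helper approx_len is hypothetical, and it assumes that pandas yields NaN whenever calling len() on an element raises TypeError.

```python
import pandas as pd


def approx_len(value):
    """Hypothetical helper: len(value), or NaN when len() is not defined for it."""
    try:
        return len(value)
    except TypeError:  # ints, None, and other objects without __len__
        return float("nan")


# An object column mixing a string, a number, a tuple, and a missing value.
s = pd.Series(["abcde", 94110, ("a", "b"), None])

approx = s.map(approx_len)  # the hand-written approximation
actual = s.str.len()        # what pandas itself returns

# Both should be NaN exactly where len() fails (positions 1 and 3).
print(approx.tolist())
print(actual.tolist())
```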
Here is an example:
import pandas as pd
x = [{'a': 1}, 'abcde', None, 123, 45, [1, 2, 3, 4]]
y = pd.Series(x)
y
0 {'a': 1}
1 abcde
2 None
3 123
4 45
5 [1, 2, 3, 4]
dtype: object
y.str.len()
0 1.0
1 5.0
2 NaN
3 NaN
4 NaN
5 4.0
dtype: float64
Contrast this with:
y = pd.Series([1, 2, 3, 4, 5])
y
0 1
1 2
2 3
3 4
4 5
dtype: int64
y.dtype
dtype('int64')
y.str.len()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-744-acc1c109a4a4> in <module>()
----> 1 y.str.len()
y.astype(object).str.len()
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
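The all-NaN result above comes from an older pandas release. As far as I know, newer releases check what an object column actually contains before allowing the str accessor, and raise AttributeError for a column whose values are all integers instead of returning NaN. This sketch accepts either outcome and then applies the astype(str) fix, which works in both cases.

```python
import pandas as pd

# The same all-integer column, cast to object as in the question.
ints_as_object = pd.Series([1, 2, 3, 4, 5]).astype(object)

try:
    lengths = ints_as_object.str.len()
    # Older behavior: the accessor is allowed, but len() fails on every int.
    outcome = "all NaN" if lengths.isna().all() else "unexpected"
except AttributeError:
    # Newer behavior (assumed): the accessor rejects non-string contents up front.
    outcome = "AttributeError"

print(outcome)

# Converting to real strings works regardless of version.
fixed = ints_as_object.astype(str).str.len()
print(fixed.tolist())  # [1, 1, 1, 1, 1]
```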