Why does a pandas string Series return NaN for the len() function?


Problem description

I am working with a power consumption dataset in pandas that includes ZIP codes as a column, but the original CSV file stores this column as integers. I'd like to change this column to a string/object data type, and here's what I've done so far:

import pandas as pd

# Load the CSV; ZIP is parsed as int64 by default
df = pd.read_csv('...kWh_consumption_by_ZIP.csv')
df.head()

The resulting DataFrame head looks like this:

As mentioned above, when I check df.dtypes, ZIP is listed as int64, so I run the following code to overwrite the existing Series and change it to the object data type:

df['ZIP'] = df.ZIP.astype(object)

Everything looks good when I check the df.ZIP Series (at least, it looks good to the naked eye):

But when I check the length of each row in the Series using the len function:

df.ZIP.str.len()

...the resulting Series just returns NaN for each row (see screenshot below).

Does anyone know why this is happening? Thanks in advance for the help.

Recommended answer

TL;DR

You have a column of integers, and casting to object does not turn them into strings. Cast to str instead and the lengths come out as expected.

df.ZIP.astype(str).str.len()
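
To make the difference between the two casts concrete, here is a minimal sketch. The DataFrame and its ZIP values are made up for illustration; they only stand in for the column from the question.

```python
import pandas as pd

# Hypothetical stand-in for the ZIP column from the question; values are made up.
df = pd.DataFrame({"ZIP": [90210, 10001, 60614]})

as_object = df["ZIP"].astype(object)  # dtype changes, elements stay Python ints
as_string = df["ZIP"].astype(str)     # every element becomes a str

print(as_object.map(type).unique())   # element type stored in the object column
print(as_string.str.len().tolist())   # per-row lengths of the string version
```

The snippet deliberately does not call as_object.str.len(): the question shows it returning NaN, and some newer pandas releases appear to raise an AttributeError for an object column that holds only integers, so the result depends on the installed version.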


pandas allows the str accessor on object columns. Because an object column can hold any Python object, pandas makes no assumptions about its contents. If an element is a string or another object with a length (such as a list or dict), str.len() returns a valid result for it. Otherwise, it returns NaN.
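
The per-element rule described above can be sketched in plain Python. This is only an illustration of the behaviour, not pandas' actual implementation, and safe_len is a hypothetical helper name.

```python
import math

def safe_len(value):
    """Return len(value) as a float, or NaN when the object has no length."""
    try:
        return float(len(value))
    except TypeError:
        # Integers, None, and other unsized objects end up here.
        return math.nan

values = [{'a': 1}, 'abcde', None, 123, 45, [1, 2, 3, 4]]
lengths = [safe_len(v) for v in values]
print(lengths)
```

The results are floats because NaN is itself a float, which is also why the pandas output for such a column has dtype float64.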

Here is an example:

x = [{'a': 1}, 'abcde', None, 123, 45, [1, 2, 3, 4]]
y = pd.Series(x)

y

0        {'a': 1}
1           abcde
2            None
3             123
4              45
5    [1, 2, 3, 4]
dtype: object

y.str.len()
0    1.0
1    5.0
2    NaN
3    NaN
4    NaN
5    4.0
dtype: float64

In contrast:

y = pd.Series([1, 2, 3, 4, 5])
y

0    1
1    2
2    3
3    4
4    5
dtype: int64

y.dtype
dtype('int64')

y.str.len()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-744-acc1c109a4a4> in <module>()
----> 1 y.str.len()

y.astype(object).str.len()

0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
dtype: float64
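
If the goal is a string ZIP column from the start, one option is to request it at load time through the dtype parameter of read_csv. The sketch below uses made-up CSV text and a hypothetical kWh column, since the real file is not shown in the question.

```python
import io
import pandas as pd

# Made-up CSV contents standing in for the real file, which the question does not show.
csv_text = "ZIP,kWh\n02134,120\n90210,340\n"

# dtype={"ZIP": str} asks read_csv to keep ZIP as text instead of parsing it as int64.
df = pd.read_csv(io.StringIO(csv_text), dtype={"ZIP": str})

zip_lengths = df["ZIP"].str.len()
print(df["ZIP"].tolist())
print(zip_lengths.tolist())
```

Reading the column as text also keeps any leading zeros, which an integer parse followed by astype(str) would drop.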
