Adding new column to pandas DataFrame results in NaN

Question

I have a pandas DataFrame named data that holds the following transaction data:

           A         date
0      M000833  2016-08-01
1      M000833  2016-08-01
2      M000833  2016-08-02
3      M000833  2016-08-02 
4      M000511  2016-08-05

I want a new column with the number of visits per consumer, where multiple visits on the same day count as 1.

So I tried this:

import pandas as pd
data['noofvisits'] = data.groupby(['A'])['date'].nunique()

When I run just the right-hand side without assigning it to the DataFrame, I get a pandas Series with the desired output. However, the statement above results in:

           A         date       noofvisits
0      M000833  2016-08-01         NaN         
1      M000833  2016-08-01         NaN
2      M000833  2016-08-02         NaN
3      M000833  2016-08-02         NaN
4      M000511  2016-08-05         NaN

The expected output is:

           A         date       noofvisits
0      M000833  2016-08-01         2         
1      M000833  2016-08-01         2
2      M000833  2016-08-02         2
3      M000833  2016-08-02         2
4      M000511  2016-08-05         1

What is wrong with this approach? Why does the noofvisits column contain NaN rather than the count values?

Recommended answer

Use transform to generate a Series whose index is aligned with the original DataFrame (called df here):

In[32]:
df['noofvisits'] = df.groupby(['A'])['date'].transform('nunique')
df

Out[32]: 
             A        date  noofvisits
index                                 
0      M000833  2016-08-01           2
1      M000833  2016-08-01           2
2      M000833  2016-08-02           2
3      M000833  2016-08-02           2
4      M000511  2016-08-05           1
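
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is reconstructed from the rows shown in the question, and the dates are assumed to be plain strings; the variable name df follows the answer rather than the question's data.

```python
import pandas as pd

# Sample data reconstructed from the question (dates kept as strings, an assumption)
df = pd.DataFrame({
    "A": ["M000833", "M000833", "M000833", "M000833", "M000511"],
    "date": ["2016-08-01", "2016-08-01", "2016-08-02", "2016-08-02", "2016-08-05"],
})

# transform returns one value per original row, indexed like df,
# so the assignment lines up row by row
df["noofvisits"] = df.groupby("A")["date"].transform("nunique")
print(df)
```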

The problem with direct assignment is that you are grouping on column 'A', so the values of 'A' become the index of the groupby aggregation. When you then assign that result to your df, pandas aligns on index labels, and the labels of the aggregation (M000511, M000833) do not match the labels of df (0 to 4), hence the NaN column values.
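
The mismatch can be made visible by printing both indexes. The sketch below rebuilds the sample data from the question (dates assumed to be strings) and uses reindex to approximate the label alignment that column assignment performs; the names counts and aligned are introduced here for illustration only.

```python
import pandas as pd

# Sample data reconstructed from the question (dates kept as strings, an assumption)
df = pd.DataFrame({
    "A": ["M000833", "M000833", "M000833", "M000833", "M000511"],
    "date": ["2016-08-01", "2016-08-01", "2016-08-02", "2016-08-02", "2016-08-05"],
})

counts = df.groupby("A")["date"].nunique()
print(counts.index.tolist())  # labels are the values of column A
print(df.index.tolist())      # labels are the default integer positions

# Assigning a Series to a column aligns it on df's index labels;
# reindex shows the same lookup explicitly: no label matches, so every row is NaN
aligned = counts.reindex(df.index)
print(aligned)

df["noofvisits"] = counts     # reproduces the NaN column from the question
print(df)
```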

Also, even if the index values did agree, the shape is different anyway, because the aggregation returns one row per group rather than one row per original row:

In[33]:
df.groupby(['A'])['date'].nunique()

Out[33]: 
A
M000511    1
M000833    2
Name: date, dtype: int64
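
If the per-group Series is already at hand, one alternative to transform (not part of the original answer) is to broadcast it back onto the rows by looking up each row's 'A' value with Series.map. A minimal sketch, again using sample data reconstructed from the question:

```python
import pandas as pd

# Sample data reconstructed from the question (dates kept as strings, an assumption)
df = pd.DataFrame({
    "A": ["M000833", "M000833", "M000833", "M000833", "M000511"],
    "date": ["2016-08-01", "2016-08-01", "2016-08-02", "2016-08-02", "2016-08-05"],
})

# One row per group, indexed by the values of A
counts = df.groupby("A")["date"].nunique()

# map looks up each row's A value in counts, giving a Series indexed like df
df["noofvisits"] = df["A"].map(counts)
print(df)
```

This produces the same column as transform for this data; transform is usually the shorter spelling, while map can be handy when the aggregated Series is reused elsewhere.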
