Pandas: how to get the unique number of values in cells when cells contain lists?


Question

For some mysterious reason I have a dataframe that looks like

index             col_weird      col_normal
2012-01-01 14:30  ['A','B']      2
2012-01-01 14:32  ['A','C','D']  4
2012-01-01 14:36  ['C','D']      2
2012-01-01 14:39  ['E','B']      4
2012-01-01 14:40  ['G','H']      2

I would like to resample my dataframe every 5 minutes, and

  • get the unique number of elements across all the lists in col_weird,

  • get the mean of col_normal

Of course, using resample().col_weird.nunique() would fail for the first task because I want the unique number of elements: that is, between 14:30 and 14:35 I expect this number to be 4, corresponding to A,B,C,D.

Over the same period, the mean of col_normal is of course 3.
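For reference, the expected numbers for the first window can be checked by hand. The sketch below rebuilds the sample frame (the construction and the names `window` and `unique_elements` are assumptions, not part of the original post) and computes both values for 14:30 to 14:35 without any resampling.

```python
import pandas as pd

# Rebuild the sample frame shown in the question.
df = pd.DataFrame(
    {
        "col_weird": [["A", "B"], ["A", "C", "D"], ["C", "D"], ["E", "B"], ["G", "H"]],
        "col_normal": [2, 4, 2, 4, 2],
    },
    index=pd.to_datetime(
        [
            "2012-01-01 14:30",
            "2012-01-01 14:32",
            "2012-01-01 14:36",
            "2012-01-01 14:39",
            "2012-01-01 14:40",
        ]
    ),
)

# Rows in the first 5-minute window: 14:30 inclusive, 14:35 exclusive.
start = pd.Timestamp("2012-01-01 14:30")
end = pd.Timestamp("2012-01-01 14:35")
window = df.loc[(df.index >= start) & (df.index < end)]

# Union of every list in the window, then count the distinct elements.
unique_elements = set().union(*window["col_weird"])
print(len(unique_elements))         # 4, i.e. A, B, C, D
print(window["col_normal"].mean())  # 3.0
```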

Any idea how to get that?

Thanks!

Solution

I think you can first expand the lists into a Series with one element per row:

import pandas as pd
# col_weird holds a list per cell; expand to one row per list element
df = df['col_weird'].apply(pd.Series).stack().reset_index(drop=True, level=1)
print(df)
2012-01-01 14:30    A
2012-01-01 14:30    B
2012-01-01 14:32    A
2012-01-01 14:32    C
2012-01-01 14:32    D
2012-01-01 14:36    C
2012-01-01 14:36    D
2012-01-01 14:39    E
2012-01-01 14:39    B
2012-01-01 14:40    G
2012-01-01 14:40    H
dtype: object
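On pandas 0.25 or later, `Series.explode` is a more direct way to get the same one-row-per-element layout. This is a swapped-in alternative rather than the method used in the answer, and the names `col_weird` and `exploded` below are assumptions made for the sketch.

```python
import pandas as pd

# The list-valued column from the question, as a standalone Series.
col_weird = pd.Series(
    [["A", "B"], ["A", "C", "D"], ["C", "D"], ["E", "B"], ["G", "H"]],
    index=pd.to_datetime(
        [
            "2012-01-01 14:30",
            "2012-01-01 14:32",
            "2012-01-01 14:36",
            "2012-01-01 14:39",
            "2012-01-01 14:40",
        ]
    ),
)

# One row per list element; each timestamp is repeated for its elements.
exploded = col_weird.explode()
print(exploded)
```

Compared with `apply(pd.Series).stack()`, this avoids building a wide intermediate frame padded with NaN when the lists differ in length.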

Then use resample (hourly bins here; pass '5min' instead of '1H' for the 5-minute bins asked about in the question):

# Count distinct elements per bin ('1H' puts every row into one hourly bin)
df = df.resample('1H').nunique()
print(df)
2012-01-01 14:00:00    7
Freq: H, dtype: int64
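To cover both tasks from the question with 5-minute bins, the same idea can be applied per column and the results joined. This is a minimal sketch under assumptions: it uses `Series.explode` (pandas 0.25+) in place of `apply(pd.Series).stack()`, and the names `n_unique`, `mean_normal` and `result` are made up for illustration.

```python
import pandas as pd

# Rebuild the sample frame shown in the question.
df = pd.DataFrame(
    {
        "col_weird": [["A", "B"], ["A", "C", "D"], ["C", "D"], ["E", "B"], ["G", "H"]],
        "col_normal": [2, 4, 2, 4, 2],
    },
    index=pd.to_datetime(
        [
            "2012-01-01 14:30",
            "2012-01-01 14:32",
            "2012-01-01 14:36",
            "2012-01-01 14:39",
            "2012-01-01 14:40",
        ]
    ),
)

# Task 1: distinct list elements per 5-minute bin.
n_unique = df["col_weird"].explode().resample("5min").nunique()

# Task 2: mean of col_normal per 5-minute bin.
mean_normal = df["col_normal"].resample("5min").mean()

# Both results share the same 5-minute index, so they can sit side by side.
result = pd.concat([n_unique, mean_normal], axis=1)
result.columns = ["n_unique", "mean_normal"]
print(result)
```

With the sample data, the 14:30 bin gives 4 unique elements and a mean of 3.0, matching the values expected in the question; the 14:35 bin gives 4 and 3.0, and the 14:40 bin gives 2 and 2.0.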
