Pandas: how to get the number of unique values in cells when cells contain lists?
Problem description
For some mysterious reason, I have a dataframe that looks like this:
index col_weird col_normal
2012-01-01 14:30 ['A','B'] 2
2012-01-01 14:32 ['A','C','D'] 4
2012-01-01 14:36 ['C','D'] 2
2012-01-01 14:39 ['E','B'] 4
2012-01-01 14:40 ['G','H'] 2
I would like to resample my dataframe every 5 minutes, and
- get the number of unique elements across all the lists in col_weird
- get the mean of col_normal
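Of course, resample().col_weird.nunique() cannot do the first task, because I want the number of unique elements: that is, between 14:30 and 14:35 I expect this number to be 4, corresponding to A, B, C, D.
Over the same period, the mean of col_normal is of course 3.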
Any idea how to get that?
Thanks!
I think you can expand the lists into a Series first:
# Expand each list into one row per element, keeping the timestamp as the index
df = df['col_weird'].apply(pd.Series).stack().reset_index(drop=True, level=1)
print (df)
2012-01-01 14:30 A
2012-01-01 14:30 B
2012-01-01 14:32 A
2012-01-01 14:32 C
2012-01-01 14:32 D
2012-01-01 14:36 C
2012-01-01 14:36 D
2012-01-01 14:39 E
2012-01-01 14:39 B
2012-01-01 14:40 G
2012-01-01 14:40 H
dtype: object
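As an alternative sketch of the same expansion step, pandas 0.25 and later provide Series.explode, which turns each list element into its own row while repeating the original index label. The snippet below rebuilds the dataframe from the question under the assumption that col_weird holds real Python lists (not strings) and that the index is a DatetimeIndex; the variable name exploded is only illustrative.

```python
import pandas as pd

# Rebuild the dataframe from the question (assumes real lists and a DatetimeIndex).
idx = pd.to_datetime([
    "2012-01-01 14:30", "2012-01-01 14:32", "2012-01-01 14:36",
    "2012-01-01 14:39", "2012-01-01 14:40",
])
df = pd.DataFrame(
    {
        "col_weird": [["A", "B"], ["A", "C", "D"], ["C", "D"], ["E", "B"], ["G", "H"]],
        "col_normal": [2, 4, 2, 4, 2],
    },
    index=idx,
)

# One row per list element; the timestamp of the source row is repeated.
exploded = df["col_weird"].explode()
print(exploded)
```

This should give the same eleven rows as the apply(pd.Series).stack() approach, without building an intermediate wide dataframe.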
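Then use resample (the example below uses one-hour bins; pass '5min' instead for the five-minute bins asked about):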
df = df.resample('1H').nunique()
print (df)
2012-01-01 14:00:00 7
Freq: H, dtype: int64
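To tie this back to the original question, here is a minimal sketch that computes both quantities over five-minute bins. It assumes the same reconstructed dataframe (real lists in col_weird, a DatetimeIndex) and pandas 0.25 or later for Series.explode; the output column names n_unique and mean_normal are hypothetical and chosen only for this example.

```python
import pandas as pd

# Rebuild the dataframe from the question (assumes real lists and a DatetimeIndex).
idx = pd.to_datetime([
    "2012-01-01 14:30", "2012-01-01 14:32", "2012-01-01 14:36",
    "2012-01-01 14:39", "2012-01-01 14:40",
])
df = pd.DataFrame(
    {
        "col_weird": [["A", "B"], ["A", "C", "D"], ["C", "D"], ["E", "B"], ["G", "H"]],
        "col_normal": [2, 4, 2, 4, 2],
    },
    index=idx,
)

# Number of distinct list elements per 5-minute bin.
n_unique = df["col_weird"].explode().resample("5min").nunique().rename("n_unique")

# Mean of the ordinary numeric column per 5-minute bin.
mean_normal = df["col_normal"].resample("5min").mean().rename("mean_normal")

# Both series share the same bin labels, so they can be placed side by side.
result = pd.concat([n_unique, mean_normal], axis=1)
print(result)
```

For the 14:30 bin this yields 4 unique elements (A, B, C, D) and a mean of 3, which matches the expectation stated in the question. The two columns are resampled separately because exploding col_weird repeats rows, which would otherwise skew the mean of col_normal.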