Pandas: how to loop through a MultiIndex Series
Question
I have a MultiIndex Series that looks like this:
user_id cookie browser
1 1_1 [chrome45]
2 2_1 [IE 7]
2 2_2 [IE 7, IE 8]
This MultiIndex has two levels, user_id and cookie. Each value is a list of browsers.
What I want to do is count how many browsers each user has used, counting each cookie separately.
In this example, user 1 used only 1 browser. User 2 used three (IE 7 appears under two different cookies, so I count it twice rather than once).
How can I loop through the Series and get that result? Something like this:
from collections import defaultdict

r = defaultdict(int)
for user_id in multiIndex_series:
    for cookie in multiIndex_series[user_id]:
        # I don't know how to get user_id out of the MultiIndex series
        r[user_id] += len(multiIndex_series[user_id][cookie])
Recommended answer
You can use groupby with apply and a lambda function that returns the length of the flattened lists for each user:
import pandas as pd

df = pd.DataFrame({'user_id': [1, 2, 2],
                   'cookie': ['1_1', '2_1', '2_2'],
                   'browser': [['chrome45'], ['IE 7'], ['IE 7', 'IE 8']]})
# Move user_id and cookie into a two-level MultiIndex
df = df.set_index(['user_id', 'cookie'])
print(df)
                     browser
user_id cookie
1       1_1       [chrome45]
2       2_1           [IE 7]
        2_2     [IE 7, IE 8]
from itertools import chain

# Flatten each user's lists into one list and take its length
print(df.groupby(level='user_id')['browser']
        .apply(lambda x: len(list(chain.from_iterable(x)))))
user_id
1 1
2 3
Name: browser, dtype: int64
Instead of a lambda, you can use a custom function f, which is easier to test:
def f(x):
    flat = list(chain.from_iterable(x))
    print(flat)  # show the flattened list for each user
    return len(flat)

print(df.groupby(level='user_id')['browser'].apply(f))
['chrome45']
['IE 7', 'IE 7', 'IE 8']
user_id
1 1
2 3
Name: browser, dtype: int64
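As an alternative sketch (not part of the original answer), the length of each list can be computed first and then summed per user, which avoids flattening the lists. The names s and counts are illustrative, and the example data is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

# Rebuild the example Series from the question
df = pd.DataFrame({'user_id': [1, 2, 2],
                   'cookie': ['1_1', '2_1', '2_2'],
                   'browser': [['chrome45'], ['IE 7'], ['IE 7', 'IE 8']]})
s = df.set_index(['user_id', 'cookie'])['browser']

# Length of each list, then sum those lengths within each user_id
counts = s.map(len).groupby(level='user_id').sum()
print(counts)
```

This gives the same per-user totals as the groupby/apply version, since summing list lengths is equivalent to measuring the flattened list.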
If you need to loop over the Series itself, one possible solution is:
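# Series.iteritems() was removed in pandas 2.0; items() works the same way.
# Each key is a (user_id, cookie) tuple.
for idx, val in df['browser'].items():
    print(idx)
    print(val)
(1, '1_1')
['chrome45']
(2, '2_1')
['IE 7']
(2, '2_2')
['IE 7', 'IE 8']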
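To build the dictionary the question asks for, the index tuple can be unpacked directly in the loop header. This is a minimal sketch assuming the same example data; s and r are illustrative names, and the int() conversion is only there to keep the dictionary keys as plain Python integers.

```python
from collections import defaultdict

import pandas as pd

# Rebuild the example Series from the question
df = pd.DataFrame({'user_id': [1, 2, 2],
                   'cookie': ['1_1', '2_1', '2_2'],
                   'browser': [['chrome45'], ['IE 7'], ['IE 7', 'IE 8']]})
s = df.set_index(['user_id', 'cookie'])['browser']

r = defaultdict(int)
# Each key is a (user_id, cookie) tuple, so it can be unpacked in place
for (user_id, cookie), browsers in s.items():
    r[int(user_id)] += len(browsers)

print(dict(r))
```

For larger data the groupby approach is usually preferable, since an explicit Python loop visits every row one at a time.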