In Pandas, how to sort one level of a multi-index based on the values of a column, while maintaining the grouping of the other level

Question

I'm taking a Data Mining course at university right now, but I'm a wee bit stuck on a multi-index sorting problem.

The actual data involves about 1 million movie reviews, which I'm trying to analyze by American zip code. To test out how to do what I want, though, I've been using a much smaller data set: 250 randomly generated ratings for 10 movies, with age groups instead of zip codes.

So this is what I have right now: a multi-indexed DataFrame in Pandas with two levels, 'group' and 'title':

                        rating
group       title   
            Alien       4.000000
            Argo        2.166667
Adults      Ben-Hur     3.666667
            Gandhi      3.200000
            ...         ...

            Alien       3.000000
            Argo        3.750000
Coeds       Ben-Hur     3.000000
            Gandhi      2.833333
            ...         ...

            Alien       2.500000
            Argo        2.750000
Kids        Ben-Hur     3.000000
            Gandhi      3.200000
            ...         ...
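For reference, a frame with this shape usually comes out of a groupby mean over the raw ratings. The sketch below is only an assumption about how it could be built: the flat table, its column names (group, title, rating), the placeholder movie titles and the random seed are all hypothetical stand-ins for the 250-rating test set.

```python
import numpy as np
import pandas as pd

# Hypothetical test set: 250 random 1-5 ratings of 10 placeholder movies,
# bucketed by age group instead of zip code.
rng = np.random.default_rng(0)
titles = [f"Movie {i}" for i in range(1, 11)]
groups = ["Adults", "Coeds", "Kids"]
reviews = pd.DataFrame({
    "group": rng.choice(groups, size=250),
    "title": rng.choice(titles, size=250),
    "rating": rng.integers(1, 6, size=250),  # upper bound is exclusive
})

# Mean rating per (group, title) pair: the result has a two-level
# MultiIndex named "group" and "title" and a single "rating" column.
mean_ratings = reviews.groupby(["group", "title"])[["rating"]].mean()
print(mean_ratings.head())
```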

What I'm aiming for is to sort the titles by their rating within each group (and only show the 5 or so most popular titles in each group).

So something like this (but I'm only going to show two titles in each group):

                        rating
group       title   
            Alien       4.000000
Adults      Ben-Hur     3.666667

            Argo        3.750000
Coeds       Alien       3.000000

            Gandhi      3.200000
Kids        Ben-Hur     3.000000
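One way to get this output, sketched here as an assumption rather than as the accepted answer's method: sort on the 'group' index level first and on the rating (descending) second, then keep the first n rows of each group with groupby(...).head(n). Passing an index level name to sort_values alongside a column name works in pandas 0.23 and later. The sample below uses only the four titles visible in the question; the elided "..." rows are left out, and top_n_per_group is a hypothetical helper name.

```python
import pandas as pd

# Sample data: only the rows shown in the question's first table.
ratings = pd.DataFrame({
    "group": ["Adults"] * 4 + ["Coeds"] * 4 + ["Kids"] * 4,
    "title": ["Alien", "Argo", "Ben-Hur", "Gandhi"] * 3,
    "rating": [4.0, 2.166667, 3.666667, 3.2,
               3.0, 3.75, 3.0, 2.833333,
               2.5, 2.75, 3.0, 3.2],
}).set_index(["group", "title"])


def top_n_per_group(df, n):
    """Sort by the 'group' level first (which keeps each group's rows
    together), then by rating descending within the group, and keep the
    first n rows of every group."""
    ordered = df.sort_values(["group", "rating"], ascending=[True, False])
    # head() keeps the rows in their current (sorted) order.
    return ordered.groupby(level="group").head(n)


print(top_n_per_group(ratings, 2))
```

With the full data, top_n_per_group(ratings, 5) would give the five highest-rated titles per group. Ties (Alien and Ben-Hur both at 3.0 for Coeds) are left in whatever order the sort produces.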

Does anyone know how to do this? I've tried sort_order, sort_index, etc., and swapping the levels, but they mix up the groups too. The result then looks like:

                          rating
group         title 
Adults        Alien      4.000000
Coeds         Argo       3.750000
Adults        Ben-Hur    3.666667
Kids          Gandhi     3.666667
Coeds         Alien      3.000000
Kids          Ben-Hur    3.000000
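The mixing happens because a sort keyed only on the rating (or only on the title level) has nothing telling it to keep a group's rows together. The sketch below reproduces the symptom on the same hypothetical four-title sample; groups_are_contiguous is a hypothetical helper that checks whether each group label occupies one unbroken run of rows.

```python
import pandas as pd

# Same sample as the question's first table (four visible titles only).
ratings = pd.DataFrame({
    "group": ["Adults"] * 4 + ["Coeds"] * 4 + ["Kids"] * 4,
    "title": ["Alien", "Argo", "Ben-Hur", "Gandhi"] * 3,
    "rating": [4.0, 2.166667, 3.666667, 3.2,
               3.0, 3.75, 3.0, 2.833333,
               2.5, 2.75, 3.0, 3.2],
}).set_index(["group", "title"])


def groups_are_contiguous(index, level):
    """True if every label of `level` appears as a single unbroken run."""
    labels = list(index.get_level_values(level))
    # Start of each run: first row, or a row whose label differs from the previous one.
    runs = [lab for i, lab in enumerate(labels) if i == 0 or lab != labels[i - 1]]
    return len(runs) == len(set(runs))


# Sorting on the rating alone interleaves Adults, Coeds, Adults, ...
by_rating_only = ratings.sort_values("rating", ascending=False)
print(by_rating_only.head(6))
print(groups_are_contiguous(by_rating_only.index, "group"))  # False

# Making the group level the primary key keeps the groups together.
by_group_then_rating = ratings.sort_values(["group", "rating"], ascending=[True, False])
print(groups_are_contiguous(by_group_then_rating.index, "group"))  # True
```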

I'm kind of looking for something like this: Multi-Index Sorting in Pandas, but instead of sorting based on another level, I want to sort based on the values. Kind of like if that person wanted to sort based on his sales column.

Thanks!

Recommended answer

You're looking for sort. Note: this works in place (i.e. it modifies s); to return a copy, use order:

In [14]: s.order()  # deprecated in pandas 0.17; use s.sort_values() in current pandas
Out[14]: 
1  3    1
2  1    2
1  1    3
dtype: int64
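Series.order() and the in-place Series.sort() were deprecated in pandas 0.17 in favour of sort_values() and no longer exist in current pandas. A minimal sketch of the modern equivalents follows; the answer's own definition of s is not shown, so the Series below is a hypothetical one chosen so that its sorted form matches Out[14].

```python
import pandas as pd

# Hypothetical Series with a two-level index; its sorted form matches Out[14].
s = pd.Series([3, 1, 2], index=[[1, 1, 2], [1, 3, 1]])

# Equivalent of the old s.order(): returns a sorted copy, s is unchanged.
sorted_copy = s.sort_values()
print(sorted_copy)

# Equivalent of the old in-place s.sort(): modifies s and returns None.
s.sort_values(inplace=True)
print(s)
```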

Update: I realised what you were actually asking. I think this ought to be an option in sortlevel, but for now I think you have to reset_index, groupby and apply:

In [21]: s.reset_index(name='s').groupby('level_0').apply(lambda g: g.sort_values('s')).set_index(['level_0', 'level_1'])['s']
Out[21]: 
level_0  level_1
1        3          1
         1          3
2        1          2
Name: s, dtype: int64

Note: afterwards, you can set the level names back to [None, None].
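As an alternative to the reset_index/groupby/apply round trip (this swaps in a different technique from the one in the answer), pandas 0.23 and later can sort on an index level and a column in a single sort_values call. The sketch assumes the levels have names; the names outer and inner and the column name value are hypothetical. Because the index is never reset, the level names survive and need no renaming afterwards.

```python
import pandas as pd

# Hypothetical Series mirroring the answer's toy example, with named levels
# so that sort_values can refer to them.
s = pd.Series(
    [3, 1, 2],
    index=pd.MultiIndex.from_arrays([[1, 1, 2], [1, 3, 1]], names=["outer", "inner"]),
)

# Sort by the outer level first (keeps its groups together), then by the
# value within each group; 'outer' is an index level, 'value' a column.
result = s.to_frame("value").sort_values(["outer", "value"])["value"]
print(result)
```

For descending values within each group, pass ascending=[True, False].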
