Show first 10 rows of multi-index pandas dataframe

Published: 2021/6/4 19:42:50. Tags: python, pandas, multi-index

Problem description

I have a multi-level index pandas DataFrame where the first level is year and the second level is username. It has only one column (count), which is already sorted in descending order. I want to show the first 2 rows for each value of index level 0.

What I have:

               count
year username                
2010 b         677
     a         505
     c         400
     d         300
 ...
2014 a         100
     b         80

What I want:

               count
year username                
2010 b         677
     a         505
2011 c         677
     d         505
2012 e         677
     f         505
2013 g         677
     i         505
2014 h         677
     j         505

Recommended answer

Here is an answer. There may be a better way to do this (with indexing?), but I think it works. The principle looks complex but is quite simple:

- Index the DataFrame by year and username.
- Group the DataFrame by year, which is the first level (=0) of the index.
- Apply two operations to each sub-DataFrame produced by the groupby (one per year):
  - sort the rows by count in ascending order with sort_values('count') (older pandas accepted sort_index(by='count'), which has since been removed), so the rows with the highest counts end up at the tail of the sub-DataFrame;
  - keep only the last top rows (2 in this case) using negative slicing ([-top:]). The tail method (tail(top)) could be used instead for readability.
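```python
import pandas as pd

# Test data
df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2013, 2013, 2014, 2014],
                   'username': ['b', 'a', 'a', 'c', 'c', 'd', 'e', 'f', 'g', 'i', 'h', 'j'],
                   'count': [400, 505, 678, 677, 505, 505, 677, 505, 677, 505, 677, 505]})
df = df.set_index(['year', 'username'])

top = 2
# For each year (index level 0): sort ascending by count, then keep the last
# `top` rows, i.e. the ones with the highest counts. A stable sort keeps tied
# rows in their original order. sort_values('count') replaces the old
# sort_index(by='count'), which current pandas no longer accepts.
df = df.groupby(level=0).apply(lambda g: g.sort_values('count', kind='stable')[-top:])
# apply() prepends the group key (year) as an extra index level; drop the duplicate.
df.index = df.index.droplevel(0)
print(df)

# Output:
#                count
# year username
# 2010 a           505
#      a           678
# 2011 d           505
#      c           677
# 2012 f           505
#      e           677
# 2013 i           505
#      g           677
# 2014 j           505
#      h           677
```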
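As the answer suspects, there is a shorter route that avoids apply and droplevel: GroupBy.head keeps the first n rows of each group in their current order. The sketch below is an alternative, not the answer's method; the helper name top_per_year is hypothetical. It first sorts by the year index level (ascending) and the count column (descending), so it does not depend on the input already being sorted, then keeps the first top rows per year.

```python
import pandas as pd


def top_per_year(df: pd.DataFrame, top: int = 2) -> pd.DataFrame:
    """Return the `top` highest-count rows for each value of index level 'year'.

    Hypothetical helper: sorting by year ascending and count descending first
    means it does not rely on the input already being sorted.
    """
    # sort_values can mix index level names ('year') and column labels ('count').
    ranked = df.sort_values(['year', 'count'], ascending=[True, False])
    # head() keeps the first `top` rows of each group in their current order,
    # so no apply()/droplevel() step is needed.
    return ranked.groupby(level=0).head(top)


# Same test data as in the answer above
df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2013, 2013, 2014, 2014],
                   'username': ['b', 'a', 'a', 'c', 'c', 'd', 'e', 'f', 'g', 'i', 'h', 'j'],
                   'count': [400, 505, 678, 677, 505, 505, 677, 505, 677, 505, 677, 505]}
                  ).set_index(['year', 'username'])

result = top_per_year(df)
print(result)
```

Unlike the answer's version, this keeps the highest count first within each year, which matches the "What I want" layout. If the rows are already sorted by count in descending order within each year, as the question states, df.groupby(level=0).head(2) on its own selects the same rows.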