Show first 10 rows of multi-index pandas dataframe

Question

I have a multilevel-index pandas DataFrame where the first level is year and the second level is username. It has a single column, which is already sorted in descending order. I want to show the first 2 rows for each value of index level 0.

What I have:

               count
year username                
2010 b         677
     a         505
     c         400
     d         300
 ...
2014 a         100
     b         80

What I want:

               count
year username                
2010 b         677
     a         505
2011 c         677
     d         505
2012 e         677
     f         505
2013 g         677
     i         505
2014 h         677
     j         505

Recommended answer

Here is an answer. There may be a better way to do this (with indexing?), but I think it works. The principle seems complex but is quite simple:

  • Index the DataFrame by year and username.
  • Group the DataFrame by year, which is the first level (=0) of the index.
  • Apply two operations to each sub-DataFrame obtained by the groupby (one per year):
    • Sort by count in ascending order, so that the rows with the highest counts end up at the tail of the sub-DataFrame. The original call was sort_index(by='count'); the by argument has since been removed from sort_index, and sort_values(by='count') is the current equivalent.
    • Keep only the last top rows (2 in this case) by using negative slicing ([-top:]). The tail method (tail(top)) could also be used to improve readability.
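import pandas as pd

# Test data
df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2013, 2013, 2014, 2014],
                   'username': ['b', 'a', 'a', 'c', 'c', 'd', 'e', 'f', 'g', 'i', 'h', 'j'],
                   'count': [400, 505, 678, 677, 505, 505, 677, 505, 677, 505, 677, 505]})
df = df.set_index(['year', 'username'])

top = 2
# For each year (index level 0), sort by count ascending and keep the last `top` rows.
# sort_values replaces the removed sort_index(by=...). group_keys=False keeps the
# original (year, username) index, so no droplevel call is needed afterwards.
df = df.groupby(level=0, group_keys=False).apply(lambda g: g.sort_values(by='count')[-top:])
df

               count
year username       
2010 a           505
     a           678
2011 d           505
     c           677
2012 f           505
     e           677
2013 i           505
     g           677
2014 j           505
     h           677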
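
Since the question says the column is already sorted in descending order within each year, a shorter route may be GroupBy.head, which keeps the first rows of each group without re-sorting. The following is only a sketch under that assumption; the sample data and the names top and result are made up for illustration and are not taken from the original post.

```python
import pandas as pd

# Hypothetical sample data, already sorted by count (descending) within each year.
df = pd.DataFrame({
    'year': [2010, 2010, 2010, 2011, 2011, 2011],
    'username': ['b', 'a', 'c', 'c', 'd', 'e'],
    'count': [677, 505, 400, 677, 505, 100],
}).set_index(['year', 'username'])

top = 2
# GroupBy.head keeps the first `top` rows of each group (here: each year)
# and preserves the original index and row order.
result = df.groupby(level=0).head(top)
print(result)
```

If the data were not pre-sorted, calling sort_values(by='count', ascending=False) before the groupby would be one way to get the largest counts first.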
      
