How to sort pandas pivot_table based on newest date within level?


Problem description


I've created a DataFrame in my desired date order; however, when I put it into a pivot table, the order changes.

I want to sort the pivot table based on the newest date of any of the rows within a given level.

import pandas as pd

data = [['yellow',1,'02/01/2015'],
        ['yellow',2,'04/01/2015'],
        ['green',3,'03/01/2015'],
        ['red',4,'01/01/2015']]

df = pd.DataFrame(data, columns=['colour','number','date'])
# index on colour and date, matching the result shown below
df.pivot_table(index=['colour','date'])

The result is

                    number
colour  date    
green   03/01/2015  3
red     01/01/2015  4
yellow  02/01/2015  1
        04/01/2015  2

I want the end result to list the colours with the newest dates at the top: basically a sort on the newest date within each colour (the dates with asterisks around them). So the result would be:

                    number
colour  date    
yellow  02/01/2015    1
        *04/01/2015*  2
green   *03/01/2015*  3
red     *01/01/2015*  4

I can think of three solutions but I can't work them out

a) get pivot_table to keep the original order
b) do a sort on the pivot_table using a func along the lines of latest_date_in_rows
c) create an extra column containing the latest date against each colour

I'm not sure which is the right route to take in the world of pandas, but at the moment I'm stuck :(

Solution

You can save the original MultiIndex before pivoting and then reindex the output DataFrame by that saved MultiIndex.

import pandas as pd

data = [['yellow',1,'02/01/2015'],
        ['yellow',2,'04/01/2015'],
        ['green',3,'03/01/2015'],
        ['red',4,'01/01/2015']]
df = pd.DataFrame(data, columns=['colour','number','date'])
# simulate a datetime column
df['date'] = pd.to_datetime(df['date'])
# set the index from the colour and date columns
df = df.set_index(['colour', 'date'])
print(df)
#                   number
#colour date              
#yellow 2015-02-01       1
#       2015-04-01       2
#green  2015-03-01       3
#red    2015-01-01       4

# save the original MultiIndex before pivoting
idx = df.index
print(idx)

# pivot on colour and date; the result comes back sorted by colour, then date
# (each pair occurs once, so aggfunc='sum' simply returns the original integers)
pivoted = df.reset_index().pivot_table(index=['colour','date'],
                                       values='number', aggfunc='sum')

# reindex the pivoted frame by the original MultiIndex to restore the order
df1 = pivoted.reindex(idx)
print(df1)
#                   number
#colour date              
#yellow 2015-02-01       1
#       2015-04-01       2
#green  2015-03-01       3
#red    2015-01-01       4
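
Option (a) from the question, keeping the original row order, may also be reachable without any reindexing. The sketch below assumes pandas 1.3 or later, where pivot_table accepts a sort argument; the name unsorted_pivot, the explicit month-first date format and the choice of aggfunc='sum' are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

data = [['yellow', 1, '02/01/2015'],
        ['yellow', 2, '04/01/2015'],
        ['green', 3, '03/01/2015'],
        ['red', 4, '01/01/2015']]
df = pd.DataFrame(data, columns=['colour', 'number', 'date'])
# Assumption: the dates are month-first, as the parsed output in the answer suggests.
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

# sort=False keeps the (colour, date) groups in order of first appearance
# instead of sorting them. Each pair occurs once, so 'sum' returns the
# original number unchanged.
unsorted_pivot = df.pivot_table(index=['colour', 'date'], values='number',
                                aggfunc='sum', sort=False)
print(unsorted_pivot)
```

This only helps when the input rows are already in the desired order, which is the situation described in the question.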

EDIT:

I think the problem is that the original DataFrame isn't sorted. Its MultiIndex (shown in the repr of older pandas, where labels is what current versions call codes) is:

MultiIndex(levels=[[u'green', u'red', u'yellow'], [u'2015-01-01', u'2015-02-01', u'2015-03-01', u'2015-04-01']],
           labels=[[2, 2, 0, 1], [1, 3, 2, 0]],
           names=[u'colour', u'date'])

The output DataFrame has a MultiIndex sorted by colour:

MultiIndex(levels=[[u'green', u'red', u'yellow'], [u'2015-01-01', u'2015-02-01', u'2015-03-01', u'2015-04-01']],
           labels=[[0, 1, 2, 2], [2, 0, 1, 3]],
           names=[u'colour', u'date'])
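
The two listings differ only in their integer positions, which point into the sorted levels. As a rough way to check this on a current pandas version, the sketch below reads the levels and codes attributes directly; the names original and sorted_idx and the explicit date format are assumptions for illustration.

```python
import pandas as pd

data = [['yellow', 1, '02/01/2015'],
        ['yellow', 2, '04/01/2015'],
        ['green', 3, '03/01/2015'],
        ['red', 4, '01/01/2015']]
df = pd.DataFrame(data, columns=['colour', 'number', 'date'])
# Assumption: month-first dates, as in the answer's parsed output.
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

original = df.set_index(['colour', 'date'])
idx = original.index

# Levels hold the sorted unique values of each index level.
colour_level = idx.levels[0].tolist()
# Codes are integer positions into those levels, one per row.
original_codes = [c.tolist() for c in idx.codes]
print(colour_level, original_codes)

# Sorting the frame reorders the codes but leaves the levels as they are.
sorted_idx = original.sort_index().index
sorted_codes = [c.tolist() for c in sorted_idx.codes]
print(sorted_codes)
```

The printed codes should correspond to the labels lists in the two MultiIndex listings above.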

You can also sort by the date level, but then the MultiIndex and the output are:

# DataFrame.sortlevel was removed from pandas; sort_index(level=...) is the equivalent
idx1 = df.sort_index(level='date').index
print(idx1)
MultiIndex(levels=[[u'green', u'red', u'yellow'], [u'2015-01-01', u'2015-02-01', u'2015-03-01', u'2015-04-01']],
           labels=[[1, 2, 0, 2], [0, 1, 2, 3]],
           names=[u'colour', u'date'])


# reindex by idx1
df1 = df.reindex(idx1)
print(df1)
                   number
colour date              
red    2015-01-01       4
yellow 2015-02-01       1
green  2015-03-01       3
yellow 2015-04-01       2

So the solution is to reindex by the original MultiIndex.
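
Reindexing by the original MultiIndex relies on the input already being in the desired order. If it is not, option (c) from the question is one possible route: compute the newest date per colour and sort on it. The following is a minimal sketch under that assumption; the helper column name latest, the result name ordered and the colour tie-breaker are illustrative choices, not part of the original answer.

```python
import pandas as pd

data = [['yellow', 1, '02/01/2015'],
        ['yellow', 2, '04/01/2015'],
        ['green', 3, '03/01/2015'],
        ['red', 4, '01/01/2015']]
df = pd.DataFrame(data, columns=['colour', 'number', 'date'])
# Assumption: month-first dates, as in the answer's parsed output.
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

# Hypothetical helper column: the newest date seen for each colour.
df['latest'] = df.groupby('colour')['date'].transform('max')

# Colours with the newest date come first; colour breaks ties so that rows
# of one colour stay together; within a colour, dates ascend.
ordered = (df.sort_values(['latest', 'colour', 'date'],
                          ascending=[False, True, True])
             .drop(columns='latest')
             .set_index(['colour', 'date']))
print(ordered)
```

With the sample data this puts yellow first (newest date 2015-04-01), then green, then red, which is the order asked for in the question.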
