How to sort pandas pivot_table based on newest date within level?
Problem description
I've created a DataFrame in my desired date order. However, when I put it into a pivot table, the order changes.
I want to sort the pivot table based on the newest date of any of the rows within a given level:
    import pandas as pd

    data = [['yellow', 1, '02/01/2015'],
            ['yellow', 2, '04/01/2015'],
            ['green', 3, '03/01/2015'],
            ['red', 4, '01/01/2015']]
    df = pd.DataFrame(data, columns=['colour', 'number', 'date'])
    # index on colour and date, which is what the result below shows
    df.pivot_table(index=['colour', 'date'])
The result is:

                       number
    colour date
    green  03/01/2015       3
    red    01/01/2015       4
    yellow 02/01/2015       1
           04/01/2015       2
I want the end result to list the colours with the newest dates at the top: basically a sort on the newest date within each colour (the ones with asterisks around them). So the result would be:
                         number
    colour date
    yellow 02/01/2015         2
           *04/01/2015*       3
    green  *03/01/2015*       4
    red    *01/01/2015*       1
I can think of three solutions, but I can't work them out:
a) get pivot_table to keep the original order
b) do a sort on the pivot_table using a func along the lines of latest_date_in_rows
c) create an extra column containing the latest date against each colour
I'm not sure which is the right route to take in the world of pandas, but at the moment I'm stuck :(
Solution
You can store the old multiindex before pivoting and then reindex the output dataframe by the old multiindex.

    import pandas as pd

    data = [['yellow', 1, '02/01/2015'],
            ['yellow', 2, '04/01/2015'],
            ['green', 3, '03/01/2015'],
            ['red', 4, '01/01/2015']]
    df = pd.DataFrame(data, columns=['colour', 'number', 'date'])

    # simulate a datetime column date
    df['date'] = pd.to_datetime(df['date'])

    # set the index from the columns colour and date
    df = df.set_index(['colour', 'date'])
    print(df)
    #                    number
    # colour date
    # yellow 2015-02-01       1
    #        2015-04-01       2
    # green  2015-03-01       3
    # red    2015-01-01       4

    # store the old index in the variable idx
    idx = df.index
    print(df.index)

    # pivot table; this call doesn't work with the test data, so it is commented out
    # df.pivot_table(index=['number', 'date'])

    # reindex by the old multiindex
    df1 = df.reindex(idx)
    print(df1)
    #                    number
    # colour date
    # yellow 2015-02-01       1
    #        2015-04-01       2
    # green  2015-03-01       3
    # red    2015-01-01       4
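Because the pivot call above does not run on the test data, here is a minimal sketch of the same idea with a pivot that does. It assumes the pivot is on colour and date with number as the value, and it uses aggfunc='first', which is only safe here because each (colour, date) pair occurs once. The names pivoted and restored are illustrative only.

```python
import pandas as pd

data = [['yellow', 1, '02/01/2015'],
        ['yellow', 2, '04/01/2015'],
        ['green', 3, '03/01/2015'],
        ['red', 4, '01/01/2015']]
df = pd.DataFrame(data, columns=['colour', 'number', 'date'])
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

# Remember the original (colour, date) order before pivoting.
idx = df.set_index(['colour', 'date']).index

# pivot_table sorts its index, so the colours come back alphabetically.
# aggfunc='first' is an assumption: every (colour, date) pair is unique here.
pivoted = df.pivot_table(index=['colour', 'date'], values='number',
                         aggfunc='first')

# Reindexing by the saved index restores the original row order.
restored = pivoted.reindex(idx)
print(restored)
```

If the real data has repeated (colour, date) pairs, the saved index would contain duplicates, and it would need de-duplicating before the reindex.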
EDIT:
I think the problem is that the original dataframe isn't sorted. Its multiindex is:

    MultiIndex(levels=[[u'green', u'red', u'yellow'], [u'2015-01-01', u'2015-02-01', u'2015-03-01', u'2015-04-01']],
               labels=[[2, 2, 0, 1], [1, 3, 2, 0]],
               names=[u'colour', u'date'])
The output dataframe has a multiindex sorted by colour:

    MultiIndex(levels=[[u'green', u'red', u'yellow'], [u'2015-01-01', u'2015-02-01', u'2015-03-01', u'2015-04-01']],
               labels=[[0, 1, 2, 2], [2, 0, 1, 3]],
               names=[u'colour', u'date'])
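The labels in these printouts come from an older pandas. In current releases the same information is exposed as MultiIndex.codes. The sketch below rebuilds both indexes from the question's data and compares their colour codes. The variable names are illustrative only.

```python
import pandas as pd

data = [['yellow', 1, '02/01/2015'],
        ['yellow', 2, '04/01/2015'],
        ['green', 3, '03/01/2015'],
        ['red', 4, '01/01/2015']]
df = pd.DataFrame(data, columns=['colour', 'number', 'date'])
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')
indexed = df.set_index(['colour', 'date'])

# levels hold the sorted unique values of each level;
# codes hold, for every row, the position of its value in that level.
original_idx = indexed.index
sorted_idx = indexed.sort_index().index

colour_levels = original_idx.levels[0].tolist()
original_colour_codes = original_idx.codes[0].tolist()
sorted_colour_codes = sorted_idx.codes[0].tolist()

print(colour_levels)
print(original_colour_codes)
print(sorted_colour_codes)
```

Both indexes share the same levels and differ only in the order of their codes, which is why reindexing by the saved index is enough to undo the sort.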
You can also sort by the level date, but then the multiindex and the output are:

    # DataFrame.sortlevel was removed in pandas 1.0; sort_index(level=...) replaces it
    idx1 = df.sort_index(level='date').index
    print(idx1)
    MultiIndex(levels=[[u'green', u'red', u'yellow'], [u'2015-01-01', u'2015-02-01', u'2015-03-01', u'2015-04-01']],
               labels=[[1, 2, 0, 2], [0, 1, 2, 3]],
               names=[u'colour', u'date'])

    # reindex by idx1
    df1 = df.reindex(idx1)
                       number
    colour date
    red    2015-01-01       4
    yellow 2015-02-01       1
    green  2015-03-01       3
    yellow 2015-04-01       2
So the solution is to reindex by the original multiindex.
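Reindexing only helps when the original frame is already in the wanted order. When it is not, option (c) from the question can be sketched as below. This is an illustration, not part of the original answer, and the helper column name latest is hypothetical. It starts from the pivoted, colour-sorted data and orders the colours by their newest date.

```python
import pandas as pd

data = [['yellow', 1, '02/01/2015'],
        ['yellow', 2, '04/01/2015'],
        ['green', 3, '03/01/2015'],
        ['red', 4, '01/01/2015']]
df = pd.DataFrame(data, columns=['colour', 'number', 'date'])
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

# Pivot first (rows come back sorted by colour), then flatten the index.
# aggfunc='first' is an assumption: every (colour, date) pair is unique here.
flat = df.pivot_table(index=['colour', 'date'], values='number',
                      aggfunc='first').reset_index()

# Option (c): helper column with the newest date of each colour.
flat['latest'] = flat.groupby('colour')['date'].transform('max')

# Newest colour first; colour breaks ties; dates ascending within a colour.
ordered = (flat.sort_values(['latest', 'colour', 'date'],
                            ascending=[False, True, True])
               .drop(columns='latest')
               .set_index(['colour', 'date']))
print(ordered)
```

The helper column is dropped before the index is set, so the result has the same shape as the pivot table and differs only in row order.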