Sort a pandas DataFrame series by month name?


Problem description

I have a Series object that has:

    date   price
    dec      12
    may      15
    apr      13
    ..

Problem statement: I want to group it by month, compute the mean price for each month, and present the result sorted by month.

Desired output:

 month mean_price
  Jan    XXX
  Feb    XXX
  Mar    XXX

I thought of making a list and passing it to a sort function:

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

but sort_values doesn't support that for a Series.

One big problem I have is that even though

df.sort_values(by='date', ascending=True, inplace=True) works on the initial df, after I did a groupby, it didn't maintain the order coming out of the sorted df.

To conclude, I needed these two columns from the initial data frame. I sorted the datetime column, but after a groupby using the month name (dt.strftime('%B')) the sorting got messed up. Now I have to sort it by month name.

My code:

df # has 5 columns though I need the column 'date' and 'price'

df.sort_values(by='date',inplace=True) #at this part it is sorted according to date, great
total=(df.groupby(df['date'].dt.strftime('%B'))['price'].mean()) # Though now it is not as it was but instead the months appear alphabetically

Recommended answer

Thanks to @Brad Solomon for offering a faster way to capitalize strings!

Note 1 @Brad Solomon's answer using pd.Categorical should save more resources than my answer. He showed how to assign an order to your categorical data. You should not miss it :P
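
The categorical approach mentioned in Note 1 is not reproduced here, so the following is only a minimal sketch of how it might look, not @Brad Solomon's actual code. It assumes the same sample data as in this answer and a hand-written `months` list. `pd.Categorical` with `ordered=True` gives the month names a calendar order, which `groupby` then respects.

```python
import pandas as pd

# Calendar order for the abbreviated month names (assumed, written by hand)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

df = pd.DataFrame({"Month": ["dec", "jan", "mar", "aug", "aug", "jan", "jan"],
                   "Price": [12, 40, 11, 21, 11, 11, 1]})

# Capitalize the names, then attach the calendar order to the column
df["Month"] = pd.Categorical(df["Month"].str.capitalize(),
                             categories=months, ordered=True)

# observed=True keeps only the months that actually occur in the data;
# the groups come out in category (calendar) order, not alphabetically
total = df.groupby("Month", observed=True)["Price"].mean()
print(total)
```

The month names stay readable in the result, whereas converting to month numbers loses them.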

Alternatively, you can use the following:

import pandas as pd

df = pd.DataFrame([["dec", 12], ["jan", 40], ["mar", 11], ["aug", 21],
                   ["aug", 11], ["jan", 11], ["jan", 1]],
                  columns=["Month", "Price"])
# Preprocessing: capitalize `jan`, `dec` to `Jan` and `Dec`
df["Month"] = df["Month"].str.capitalize()

# Now the dataset should look like
#   Month Price
#   -----------
#    Dec    XX
#    Jan    XX
#    Apr    XX

# Convert to a datetime so that we can sort it:
# use %b because the data uses the abbreviated month name
df["Month"] = pd.to_datetime(df.Month, format='%b', errors='coerce').dt.month
df = df.sort_values(by="Month")

total = df.groupby(df["Month"])["Price"].mean()

# total
# Month
# 1     17.333333
# 3     11.000000
# 8     16.000000
# 12    12.000000

Note 2 By default, groupby sorts the group keys for you. Be sure to use the same key for sorting and grouping in df = df.sort_values(by=SAME_KEY) and total = (df.groupby(df[SAME_KEY])['Price'].mean()). Otherwise, you may get unintended behavior. See "Groupby preserve order among groups? In which way?" for more information.
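
To illustrate Note 2, here is a small sketch (the variable names `sorted_keys` and `first_seen` are made up for this example) comparing the default `sort=True` with `sort=False`. By default the group keys are sorted; with `sort=False` the groups appear in the order in which each key is first seen.

```python
import pandas as pd

# Month numbers in the same row order as the sample data in this answer
df = pd.DataFrame({"Month": [12, 1, 3, 8, 8, 1, 1],
                   "Price": [12, 40, 11, 21, 11, 11, 1]})

# Default behaviour: group keys are sorted, regardless of row order
sorted_keys = df.groupby("Month")["Price"].mean()

# sort=False: groups appear in order of first appearance in df
first_seen = df.groupby("Month", sort=False)["Price"].mean()

print(list(sorted_keys.index))
print(list(first_seen.index))
```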

Note 3 A more computationally efficient way is to compute the mean first and then sort by month. This way, you only need to sort at most 12 items rather than the whole df. This reduces the computational cost if you don't need df itself to be sorted.
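
One possible way to do what Note 3 describes is sketched below, assuming the same sample data and a hand-written `months` list. The means are computed on the unsorted frame, and only the resulting (at most 12) rows are put into calendar order with `reindex`.

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

df = pd.DataFrame({"Month": ["dec", "jan", "mar", "aug", "aug", "jan", "jan"],
                   "Price": [12, 40, 11, 21, 11, 11, 1]})

# Step 1: aggregate first; df itself is never sorted
means = df.groupby(df["Month"].str.capitalize())["Price"].mean()

# Step 2: reorder only the aggregated rows into calendar order,
# skipping months that do not occur in the data
total = means.reindex([m for m in months if m in means.index])
print(total)
```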

Note 4 For people who already have the month as the index and wonder how to make it categorical, take a look at pandas.CategoricalIndex. @jezrael has a working example of making a categorical index ordered in "Pandas series sort by month index".
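
For the case in Note 4, a minimal sketch (not @jezrael's code; the series `s` and its values are taken from the question's sample for illustration) could look like this. It replaces the index with an ordered `pd.CategoricalIndex` and then sorts by the index.

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# A series that already has month names as its index
s = pd.Series([12.0, 15.0, 13.0], index=["Dec", "May", "Apr"])

# Give the index a calendar order, then sort by it
s.index = pd.CategoricalIndex(s.index, categories=months, ordered=True)
s = s.sort_index()
print(s)
```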
