Sort a pandas dataframe series by month name


Problem description

I have a Series object that has:

    date   price
    dec      12
    may      15
    apr      13
    ..

Problem statement: I want to group the data by month, compute the mean price for each month, and present the result sorted by month in calendar order.

Desired Output:

 month mean_price
  Jan    XXX
  Feb    XXX
  Mar    XXX

I thought of making a list and passing it to a sort function:

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

but sort_values doesn't support that for a Series.
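For reference, since pandas 1.1 both sort_values and sort_index accept a key callable. That means a Series indexed by month abbreviation can be put in calendar order by mapping each name to its position in a list like months. The following is a minimal sketch, assuming the Series is indexed by three-letter abbreviations. The sample values and the names mean_price and month_pos are made up for illustration.

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Position of each month in the calendar: {"Jan": 0, "Feb": 1, ...}
month_pos = {m: i for i, m in enumerate(months)}

# Hypothetical mean prices, indexed by month name in arbitrary order
mean_price = pd.Series({"May": 15.0, "Dec": 12.0, "Apr": 13.0, "Jan": 20.0})

# key= receives the whole Index and must return a same-shaped Index;
# here each name is replaced by its calendar position before sorting
sorted_price = mean_price.sort_index(key=lambda idx: idx.map(month_pos))
print(sorted_price)
```

On older pandas versions without key=, the reindex or categorical approaches discussed in the answer are the usual alternatives.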

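One big problem I have is that even though df.sort_values(by='date', ascending=True, inplace=True) works on the initial df, the groupby I do afterwards doesn't keep the order of the sorted df. (Note that with inplace=True, sort_values returns None, so writing df = df.sort_values(..., inplace=True) replaces df with None; use either inplace=True or the assignment, not both.)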

To sum up: I need these two columns from the initial data frame. I sorted by the datetime column, but grouping by month name (dt.strftime('%B')) messed up the order. Now I have to sort the result by month name.


My code:

df # has 5 columns, though I only need the columns 'date' and 'price'

df.sort_values(by='date', inplace=True) # at this point it is sorted by date, great
total = df.groupby(df['date'].dt.strftime('%B'))['price'].mean() # but now the months appear in alphabetical order instead
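One way to keep calendar order in this setup is sketched below, assuming date is a datetime64 column. Group on the month number (dt.month), which sorts numerically, and carry the name along as a second key (dt.month_name()) so it can be kept as the label. The sample frame and the level names month_num and month are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data with a datetime column and a price column
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-12-01", "2021-05-03", "2021-04-10",
                            "2021-05-20", "2021-01-15"]),
    "price": [12, 15, 13, 17, 40],
})

# Two group keys: month number (sorts numerically) and month name (the label)
month_num = df["date"].dt.month.rename("month_num")
month = df["date"].dt.month_name().rename("month")

# groupby sorts the keys, so the numeric first level gives calendar order
total = df.groupby([month_num, month])["price"].mean()

# Drop the numeric level, keeping only the month name as the index
total = total.droplevel("month_num")
print(total)
```

Month names from dt.month_name() follow the locale, which is English by default.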

Solution

Thanks @Brad Solomon for offering a faster way to capitalize strings!

Note 1 @Brad Solomon's answer using pd.Categorical should be lighter on resources than mine. He shows how to assign an order to your categorical data. You should not miss it :P
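The pd.Categorical idea from Note 1 can be sketched as follows. This is not a reproduction of @Brad Solomon's answer, just the same technique. The month column is declared as an ordered categorical whose categories follow the calendar, so both sorting and grouping use that order. The sample data reuses the values from the example below. observed=True is passed so that only months present in the data appear in the result.

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

df = pd.DataFrame({"Month": ["dec", "jan", "mar", "aug", "aug", "jan", "jan"],
                   "Price": [12, 40, 11, 21, 11, 11, 1]})

# Normalize case, then make the column an ordered categorical in calendar order
df["Month"] = pd.Categorical(df["Month"].str.capitalize(),
                             categories=months, ordered=True)

# sort_values follows the category order, not alphabetical order
df = df.sort_values("Month")

# groupby on a categorical orders groups by category order;
# observed=True drops months that never occur in the data
total = df.groupby("Month", observed=True)["Price"].mean()
print(total)
```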

Alternatively, you can convert the month abbreviations to month numbers and sort on those:

import pandas as pd

df = pd.DataFrame([["dec", 12], ["jan", 40], ["mar", 11], ["aug", 21],
                   ["aug", 11], ["jan", 11], ["jan", 1]],
                  columns=["Month", "Price"])
# Preprocessing: capitalize `jan`, `dec` to `Jan` and `Dec`
df["Month"] = df["Month"].str.capitalize()

# Now the dataset should look like
#   Month Price
#   -----------
#    Dec    12
#    Jan    40
#    Mar    11
#    ...

# Parse the abbreviations as dates and keep only the month number (1-12),
# so that we can sort on it; use %b because the data use abbreviated month names
df["Month"] = pd.to_datetime(df.Month, format='%b', errors='coerce').dt.month
df = df.sort_values(by="Month")

total = df.groupby(df['Month'])['Price'].mean()

# total
# Month
# 1     17.333333
# 3     11.000000
# 8     16.000000
# 12    12.000000
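The result above is indexed by month number. If you want the month labels from the desired output back, one option (an addition, not part of the original answer) is to map the numbers through the standard library's calendar.month_abbr. The snippet rebuilds total from the printed values so it runs on its own. calendar names follow the current locale, which gives English names under the default C locale.

```python
import calendar
import pandas as pd

# Same result as above: mean price indexed by month number
total = pd.Series([17.333333, 11.0, 16.0, 12.0],
                  index=pd.Index([1, 3, 8, 12], name="Month"), name="Price")

# calendar.month_abbr[1] == "Jan", ..., calendar.month_abbr[12] == "Dec"
total.index = total.index.map(lambda m: calendar.month_abbr[m])
print(total)
```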

Note 2 By default, groupby sorts the group keys (sort=True), so the result is ordered by the key values rather than by the order of the rows. That is why month names produced by strftime('%B') come out alphabetically. Use the same key to sort and to group, as in df = df.sort_values(by=SAME_KEY) and total = df.groupby(df[SAME_KEY])['Price'].mean(). Otherwise, you may get unintended behavior. See "Groupby preserve order among groups? In which way?" for more information.
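Here is a small demonstration of Note 2 with invented data. Grouping on month-name strings gives alphabetical order even when the rows were sorted by date first, because groupby sorts the keys. Passing sort=False keeps the groups in order of first appearance. In this sketch that matches calendar order only because the rows are already sorted by date and cover a single year.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2021-03-05", "2021-01-10", "2021-02-14", "2021-01-20"]),
    "price": [30, 10, 20, 12],
})
df = df.sort_values(by="date")  # rows now in date order
names = df["date"].dt.strftime("%B")

# Default sort=True: keys sorted as strings -> February, January, March
alphabetical = df.groupby(names)["price"].mean()

# sort=False: groups in order of first appearance -> January, February, March
by_appearance = df.groupby(names, sort=False)["price"].mean()

print(alphabetical)
print(by_appearance)
```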

Note 3 A more computationally efficient approach is to compute the means first and then sort by month. That way you only need to sort at most 12 items rather than the whole df, which reduces the computational cost if you don't need df itself to be sorted.
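Note 3 can be sketched like this, assuming a Month column of three-letter abbreviations (data invented). Aggregate first, then reorder only the handful of group results. reindex with the calendar list does the reordering. Months absent from the data are filtered out of the list first so that no empty rows are introduced.

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

df = pd.DataFrame({"Month": ["Dec", "Jan", "Mar", "Aug", "Aug", "Jan"],
                   "Price": [12, 40, 11, 21, 11, 11]})

# Step 1: aggregate the full frame (the frame itself is never sorted)
means = df.groupby("Month")["Price"].mean()

# Step 2: reorder at most 12 results, keeping only months that occur
present = [m for m in months if m in means.index]
total = means.reindex(present)
print(total)
```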

Note 4 If you already have the month as the index and wonder how to make it categorical, take a look at pandas.CategoricalIndex. @jezrael has a working example of making an ordered categorical index in "Pandas series sort by month index".
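For the case in Note 4, here is a minimal sketch (invented values) of turning an existing month-name index into an ordered pandas.CategoricalIndex. After that, sort_index uses calendar order instead of alphabetical order.

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical Series already indexed by month name
s = pd.Series([12.0, 15.0, 13.0, 20.0], index=["Dec", "May", "Apr", "Jan"])

# Replace the plain index with an ordered categorical one
s.index = pd.CategoricalIndex(s.index, categories=months, ordered=True)

# sort_index now follows the category order
s = s.sort_index()
print(s)
```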