pandas resample to get monthly average with time series data


Problem Description


I'm using the time series dataset from Tableau (https://community.tableau.com/thread/194200), which contains daily furniture sales, and I want to resample it to get the average monthly sales.


I tried using resample in pandas to get the monthly mean:

There are four days in January with furniture sales,
and there are no sales on the remaining days of January.

Order Date   Sales
...
2014/1/6     2573.82
2014/1/7     76.728
2014/1/16    127.104
2014/1/20    38.6
...

y_furniture = furniture['Sales'].resample('MS').mean()


I want the result to be the actual average sales for each month, spread over every day of that month.


That is, all daily sales should be added up and divided by the 31 days in January, which gives 90.85; but the code divides the sum by 4 (the number of days that have sales), which gives about 704. This doesn't correctly reflect the actual monthly sales.
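A minimal sketch reproducing both numbers from the four January rows shown above. It assumes `Order Date` is the index of `furniture` (resample needs a DatetimeIndex, or an `on=` column). It shows that `.mean()` on the resampler divides each month's total by the number of rows present in that month, not by the number of calendar days:

```python
import pandas as pd

# The four January 2014 rows from the question, with "Order Date" as the index
# (an assumption about how `furniture` is set up).
furniture = pd.DataFrame(
    {"Sales": [2573.82, 76.728, 127.104, 38.6]},
    index=pd.to_datetime(["2014-01-06", "2014-01-07", "2014-01-16", "2014-01-20"]),
)
furniture.index.name = "Order Date"

monthly = furniture["Sales"].resample("MS")
mean_per_row = monthly.mean()  # total / number of rows in the month
total = monthly.sum()          # 2816.252
rows = monthly.count()         # 4 rows in January

# mean() only averages the rows that exist; days without a row are not treated as 0.
print(mean_per_row.iloc[0], total.iloc[0] / rows.iloc[0])  # both about 704.063
print(total.iloc[0] / 31)                                   # about 90.85
```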


Does anyone know how to solve this problem?

Recommended Answer


I'm not sure whether your expected answer is 90.85 or 704, so I'm providing a solution for both; choose the one that fits your requirements.

import pandas as pd

l1 = ['Order Date', 'Sales']
l2 = [['2014/1/6', 2573.82],
      ['2014/1/7', 76.728],
      ['2014/1/16', 127.104],
      ['2014/1/20', 38.6],
      ['2014/2/20', 38.6],
      ]
df = pd.DataFrame(l2, columns=l1)

df['Order Date'] = pd.to_datetime(df['Order Date'])  # make sure Order Date is of datetime type

# Mean over the rows present in each month (total / number of sale days)
x = df.groupby(df['Order Date'].dt.month)[['Sales']].mean()  # or .agg('mean')
#### Output ####
              Sales
Order Date
1           704.063
2            38.600
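One caveat with grouping by `dt.month`: it uses only the month number, so the same month from different years lands in one group. A minimal sketch with hypothetical two-year data (the values 100.0 and 300.0 are made up for illustration) compares it with `resample('MS', on='Order Date')`, which keeps each calendar month of each year separate:

```python
import pandas as pd

# Hypothetical data: one sale in January 2014 and one in January 2015.
df = pd.DataFrame({
    "Order Date": pd.to_datetime(["2014-01-06", "2015-01-06"]),
    "Sales": [100.0, 300.0],
})

# Grouping by month number alone merges both Januaries into one group (mean 200.0).
by_month_number = df.groupby(df["Order Date"].dt.month)["Sales"].mean()

# resample("MS") bins by calendar month start, so each year's January is its own bin;
# months in between with no rows come out as NaN.
by_calendar_month = df.resample("MS", on="Order Date")["Sales"].mean()

print(by_month_number)
print(by_calendar_month.dropna())
```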



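# Total sales of a month divided by the number of calendar days in that month
def doCalculation(group):
    groupSum = group['Sales'].sum()
    # all rows in a group share the same month, so read days_in_month from the first row
    return groupSum / group['Order Date'].dt.days_in_month.iloc[0]

y = df.groupby(df['Order Date'].dt.month).apply(doCalculation)

#### Output ####
Order Date
1    90.846839
2     1.378571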
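The same per-calendar-day average can also be sketched with `resample`, which keeps years apart and avoids `apply`: sum each calendar month, then divide by `DatetimeIndex.days_in_month`. This assumes the `df` layout from the answer above, rebuilt here so the block runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "Order Date": pd.to_datetime(["2014-01-06", "2014-01-07", "2014-01-16",
                                  "2014-01-20", "2014-02-20"]),
    "Sales": [2573.82, 76.728, 127.104, 38.6, 38.6],
})

# Total sales per calendar month (a month with no rows inside the range sums to 0).
monthly_total = df.resample("MS", on="Order Date")["Sales"].sum()

# Divide each month's total by its number of calendar days (31 for Jan 2014, 28 for Feb 2014).
avg_per_day = monthly_total / monthly_total.index.days_in_month.to_numpy()
print(avg_per_day)
```

Dividing the monthly sum by the calendar length is used here rather than filling missing days with zeros and calling `.mean()`, because `resample('D')` would only create days between the first and last order, not the full month.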

