Python - Aggregate by month and calculate average
Question
I have a CSV file that looks like this:
Date,Sentiment
2014-01-03,0.4
2014-01-04,-0.03
2014-01-09,0.0
2014-01-10,0.07
2014-01-12,0.0
2014-02-24,0.0
2014-02-25,0.0
2014-02-25,0.0
2014-02-26,0.0
2014-02-28,0.0
2014-03-01,0.1
2014-03-02,-0.5
2014-03-03,0.0
2014-03-08,-0.06
2014-03-11,-0.13
2014-03-22,0.0
2014-03-23,0.33
2014-03-23,0.3
2014-03-25,-0.14
2014-03-28,-0.25
etc
My goal is to aggregate the dates by month and calculate the average for each month. The data might not start on the 1st or in January. I also have a lot of data spanning several years, so I would like to find the earliest date (month) and start counting months and their averages from there. For example:
Month count, average
1, 0.4 (<= the earliest month)
2, -0.3
3, 0.0
...
12, 0.1
13, -0.4 (<= new year but counting of month is continuing)
14, 0.3
I'm using pandas to read the CSV:
data = pd.read_csv("pks.csv", sep=",")
So data['Date'] holds the dates and data['Sentiment'] holds the values. Any idea how to do it?
Recommended answer
Probably the simplest approach is to use the resample method. First, when you read in your data, make sure you parse the dates and set the date column as your index (ignore the StringIO part and the header=0; I am reading your sample data from a multi-line string):
>>> df = pd.read_csv(StringIO(data), header=0, parse_dates=['Date'],
...                  index_col='Date')
>>> df
Sentiment
Date
2014-01-03 0.40
2014-01-04 -0.03
2014-01-09 0.00
2014-01-10 0.07
2014-01-12 0.00
2014-02-24 0.00
2014-02-25 0.00
2014-02-25 0.00
2014-02-26 0.00
2014-02-28 0.00
2014-03-01 0.10
2014-03-02 -0.50
2014-03-03 0.00
2014-03-08 -0.06
2014-03-11 -0.13
2014-03-22 0.00
2014-03-23 0.33
2014-03-23 0.30
2014-03-25 -0.14
2014-03-28 -0.25
>>> df.resample('M').mean()  # 'M' = month end; pandas 2.2+ prefers the alias 'ME'
Sentiment
2014-01-31 0.088
2014-02-28 0.000
2014-03-31 -0.035
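For readers who want to run the resample step end to end, here is a minimal self-contained sketch. It reuses the sample rows from the question; the variable names csv_text and monthly_mean are illustrative, and the "MS" (month start) alias is an assumption chosen only because it is spelled the same across pandas versions, so the labels show the first day of each month instead of the last.

```python
from io import StringIO

import pandas as pd

# Sample rows copied from the question; in practice, read "pks.csv" instead.
csv_text = """Date,Sentiment
2014-01-03,0.4
2014-01-04,-0.03
2014-01-09,0.0
2014-01-10,0.07
2014-01-12,0.0
2014-02-24,0.0
2014-02-25,0.0
2014-02-25,0.0
2014-02-26,0.0
2014-02-28,0.0
2014-03-01,0.1
2014-03-02,-0.5
2014-03-03,0.0
2014-03-08,-0.06
2014-03-11,-0.13
2014-03-22,0.0
2014-03-23,0.33
2014-03-23,0.3
2014-03-25,-0.14
2014-03-28,-0.25
"""

# Parse the dates and use them as the index so that resample can bin by time.
df = pd.read_csv(StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# "MS" labels each monthly bin by the first day of the month (assumption:
# picked for this sketch; the answer itself uses month-end labels).
monthly_mean = df.resample("MS").mean()
print(monthly_mean.round(3))
```

The averages should agree with the ones shown above; only the index labels differ.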
And if you want a month counter, you can add it after your resample:
>>> agg = df.resample('M').mean()
>>> agg['cnt'] = range(len(agg))
>>> agg
Sentiment cnt
2014-01-31 0.088 0
2014-02-28 0.000 1
2014-03-31 -0.035 2
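The question asks for a counter that starts at 1 for the earliest month and keeps counting across year boundaries. One possible way to get that, sketched here with made-up monthly values and a hypothetical column name month_count, is to derive the counter from the calendar distance to the earliest month instead of from the row position. That way the numbering stays correct even if a month is missing from the aggregated frame.

```python
import pandas as pd

# Hypothetical monthly averages spanning a year boundary, with January 2015
# deliberately missing (made-up values, for illustration only).
agg = pd.DataFrame(
    {"Sentiment": [0.4, -0.3, 0.1]},
    index=pd.to_datetime(["2014-11-30", "2014-12-31", "2015-02-28"]),
)

first = agg.index.min()

# Months elapsed since the earliest month, plus 1 so the earliest month is 1.
agg["month_count"] = (
    (agg.index.year - first.year) * 12 + (agg.index.month - first.month) + 1
)
print(agg)
```

With resample, months that have no data still appear as rows (with NaN averages), so a positional counter such as range(1, len(agg) + 1) would give the same numbering there; the calendar-based version is mainly useful when empty months have been dropped.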
You can also do this with the groupby method and pd.Grouper (group by month and then call the mean convenience method that is available with groupby).
>>> df.groupby(pd.Grouper(freq='M')).mean()  # pandas 2.2+ prefers freq='ME'
Sentiment
2014-01-31 0.088
2014-02-28 0.000
2014-03-31 -0.035
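If the dates stay in an ordinary column, as in the question's data['Date'], a similar grouping can be sketched by converting each date to a monthly period and grouping on that. This is an alternative to the approach above, not the answer's own method; the frame below holds a handful of rows from the question's sample, and by_month is an illustrative name.

```python
import pandas as pd

# A few rows taken from the question's sample, with Date as a regular column.
data = pd.DataFrame(
    {
        "Date": ["2014-01-03", "2014-01-04", "2014-02-24", "2014-03-01", "2014-03-02"],
        "Sentiment": [0.4, -0.03, 0.0, 0.1, -0.5],
    }
)
data["Date"] = pd.to_datetime(data["Date"])

# Map every date to its calendar month (a Period such as 2014-01), then average.
by_month = data.groupby(data["Date"].dt.to_period("M"))["Sentiment"].mean()
print(by_month)
```

Unlike resample, this grouping only produces rows for months that actually occur in the data.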