Group data by seasons using Python and pandas
Question
I want to use pandas and Python to iterate through my .csv files and group the data by season, calculating the mean for each season in the year. My current script groups by quarter (Jan-Mar, Apr-Jun, etc.). I want the seasons to map to months as follows: 11: 'Winter', 12: 'Winter', 1: 'Winter', 2: 'Spring', 3: 'Spring', 4: 'Spring', 5: 'Summer', 6: 'Summer', 7: 'Summer', 8: 'Autumn', 9: 'Autumn', 10: 'Autumn'
I have the following data:
Date,HAD
01/01/1951,1
02/01/1951,-0.13161201
03/01/1951,-0.271796132
04/01/1951,-0.258977158
05/01/1951,-0.198823057
06/01/1951,0.167794502
07/01/1951,0.046093808
08/01/1951,-0.122396694
09/01/1951,-0.121824587
10/01/1951,-0.013002463
This is my current script:
# Iterate through a list of files in a folder looking for .csv files
for csvfilename in glob.glob("C:/Users/n-jones/testdir/output/*.csv"):
    # Allocate a new file name for each file and create a new .csv file
    csvfilenameonly = "RBI-Seasons-Year" + path_leaf(csvfilename)
    with open("C:/Users/n-jones/testdir/season/" + csvfilenameonly, "wb") as outfile:
        # Open the input csv file and allow the script to read it
        with open(csvfilename, "rb") as infile:
            # Create a pandas dataframe to summarise the data
            df = pd.read_csv(infile, parse_dates=[0], index_col=[0], dayfirst=True)
            mean = df.resample('Q-SEP', how='mean')
            # Output to new csv file
            mean.to_csv(outfile)
I hope this makes sense.
Thanks in advance!
Recommended answer
It looks like you just need a dict lookup and a groupby. The code below should work.
import pandas as pd
import os
import re

# Map each month number to the season requested in the question
lookup = {
    11: 'Winter',
    12: 'Winter',
    1: 'Winter',
    2: 'Spring',
    3: 'Spring',
    4: 'Spring',
    5: 'Summer',
    6: 'Summer',
    7: 'Summer',
    8: 'Autumn',
    9: 'Autumn',
    10: 'Autumn'
}

os.chdir('C:/Users/n-jones/testdir/output/')

for fname in os.listdir('.'):
    # Only process files whose name ends in "csv"
    if re.match(".*csv$", fname):
        data = pd.read_csv(fname, parse_dates=[0], dayfirst=True)
        # Label every row with its season
        data['Season'] = data['Date'].apply(lambda x: lookup[x.month])
        data['count'] = 1
        # Sum the values and row counts per season
        # (current pandas needs a list of columns here, hence the double brackets)
        data = data.groupby(['Season'])[['HAD', 'count']].sum()
        data['mean'] = data['HAD'] / data['count']
        data.to_csv('C:/Users/n-jones/testdir/season/' + fname)
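The answer above produces one mean per season across the whole file. If the goal is a mean for each season in each year, as the question puts it, one possible variation is to group by year and season together and let groupby compute the mean directly. The sketch below is only an illustration: the sample rows are made up, the `Year` column and the `seasonal_mean` name are not from the original code, and it groups by calendar year, so November/December and the following January fall into different years' winters.

```python
import io
import pandas as pd

# Month -> season mapping taken from the question.
lookup = {11: 'Winter', 12: 'Winter', 1: 'Winter',
          2: 'Spring', 3: 'Spring', 4: 'Spring',
          5: 'Summer', 6: 'Summer', 7: 'Summer',
          8: 'Autumn', 9: 'Autumn', 10: 'Autumn'}

# Made-up sample in the same Date,HAD layout (dd/mm/yyyy); not the asker's data.
csv_text = """Date,HAD
15/01/1951,1.0
15/02/1951,2.0
15/03/1951,4.0
15/11/1951,3.0
15/01/1952,5.0
"""

data = pd.read_csv(io.StringIO(csv_text))
# Parse the dates explicitly as day/month/year.
data['Date'] = pd.to_datetime(data['Date'], format='%d/%m/%Y')

# Hypothetical helper columns: calendar year and season label for each row.
data['Year'] = data['Date'].dt.year
data['Season'] = data['Date'].dt.month.map(lookup)

# Mean of HAD for every (year, season) pair.
seasonal_mean = data.groupby(['Year', 'Season'])['HAD'].mean()
print(seasonal_mean)
```

Calling `seasonal_mean.reset_index()` before `to_csv` would write `Year`, `Season` and `HAD` as ordinary columns, which may be easier to read back later.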