Pandas group by weekday (M/T/W/T/F/S/S)
Question
I have a pandas DataFrame indexed by a time series of dates ('arrival_date') of the form YYYY-MM-DD, and I'd like to group by each of the weekdays (Monday to Sunday) in order to calculate the mean, median, std, etc. of the other columns. I should end up with only seven rows. So far I've only found out how to group by week, which aggregates everything weekly.
import pandas as pd

# Reading the data
df_data = pd.read_csv('data.csv', delimiter=',')
# Parsing the date column (stored as YYYYMMDD, e.g. 20170816)
df_data['arrival_date'] = pd.to_datetime(df_data['arrival_date'], format='%Y%m%d')
# Converting the time series column to index
df_data = df_data.set_index('arrival_date')
# Grouping by week (= ~52 rows per year)
week_df = df_data.resample('W').mean()
Is there a simple way to achieve my goal in pandas? I was thinking of selecting every 7th element and performing the operations on the resulting array, but that seems unnecessarily complex.
The head of the data frame looks like this:
    arrival_date    price_1      price_2    price_3    price_4
 2      20170816  75.945298  1309.715056  71.510215  22.721958
 3      20170817  68.803269  1498.639663  64.675232  22.759137
 4      20170818  73.497144  1285.122022  65.620260  24.381532
 5      20170819  78.556828  1377.318509  74.028607  26.882429
 6      20170820  57.092189  1239.530625  51.942213  22.056378
 7      20170821  76.278975  1493.385548  74.801641  27.471604
 8      20170822  79.006604  1241.603185  75.360606  28.250994
 9      20170823  76.097351  1243.586084  73.459963  24.500618
10      20170824  64.860259  1231.325899  63.205554  25.015120
11      20170825  70.407325   975.091107  64.180692  27.177654
12      20170826  87.742284  1351.306100  79.049023  27.860549
13      20170827  58.014005  1208.424489  51.963388  21.049374
14      20170828  65.774114  1289.341335  59.922912  24.481232
Recommended answer
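I believe you first need the parse_dates parameter of read_csv to parse the column to datetime, then groupby on the weekday names and aggregate. The weekday_name accessor was removed in pandas 1.0, so current versions need dt.day_name() instead:
df_data = pd.read_csv('data.csv', parse_dates=['arrival_date'])
# numeric_only=True averages only the numeric price columns
week_df = df_data.groupby(df_data['arrival_date'].dt.day_name()).mean(numeric_only=True)
print(week_df)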
price_1 price_2 price_3 price_4
arrival_date
Friday 71.952235 1130.106565 64.900476 25.779593
Monday 71.026544 1391.363442 67.362277 25.976418
Saturday 83.149556 1364.312304 76.538815 27.371489
Sunday 57.553097 1223.977557 51.952801 21.552876
Thursday 66.831764 1364.982781 63.940393 23.887128
Tuesday 79.006604 1241.603185 75.360606 28.250994
Wednesday 76.021324 1276.650570 72.485089 23.611288
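The question describes the dates as the index rather than a column. In that case the same grouping can be done on the index itself. The following is a minimal sketch, assuming a DatetimeIndex named arrival_date and using only the price_1 values copied from the question's data head; the variable names df and by_day are illustrative.

```python
import pandas as pd

# Sample values copied from the head of the question's data (price_1 only).
dates = pd.date_range("2017-08-16", periods=13, freq="D")
price_1 = [75.945298, 68.803269, 73.497144, 78.556828, 57.092189,
           76.278975, 79.006604, 76.097351, 64.860259, 70.407325,
           87.742284, 58.014005, 65.774114]
df = pd.DataFrame({"price_1": price_1}, index=dates)
df.index.name = "arrival_date"

# A DatetimeIndex exposes day_name() directly, without the .dt accessor.
by_day = df.groupby(df.index.day_name()).mean()
print(by_day)
```

The result has one row per weekday name, sorted alphabetically like the output above.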
For a numeric index (Monday = 0 through Sunday = 6), use weekday:
# numeric_only=True averages only the numeric price columns
week_df = df_data.groupby(df_data['arrival_date'].dt.weekday).mean(numeric_only=True)
print(week_df)
price_1 price_2 price_3 price_4
arrival_date
0 71.026544 1391.363442 67.362277 25.976418
1 79.006604 1241.603185 75.360606 28.250994
2 76.021324 1276.650570 72.485089 23.611288
3 66.831764 1364.982781 63.940393 23.887128
4 71.952235 1130.106565 64.900476 25.779593
5 83.149556 1364.312304 76.538815 27.371489
6 57.553097 1223.977557 51.952801 21.552876
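Because the numeric grouping is already sorted from Monday to Sunday, one possible variation is to group by number and attach readable labels afterwards. This is only a sketch under assumptions: the day_labels list and the variable names are introduced here for illustration, and the data is the price_1 column copied from the question's data head.

```python
import pandas as pd

# Sample values copied from the head of the question's data (price_1 only).
dates = pd.date_range("2017-08-16", periods=13, freq="D")
price_1 = [75.945298, 68.803269, 73.497144, 78.556828, 57.092189,
           76.278975, 79.006604, 76.097351, 64.860259, 70.407325,
           87.742284, 58.014005, 65.774114]
df = pd.DataFrame({"arrival_date": dates, "price_1": price_1})

# Illustrative label list; position i matches pandas' weekday number i.
day_labels = ["Monday", "Tuesday", "Wednesday", "Thursday",
              "Friday", "Saturday", "Sunday"]

# Group by weekday number; the result is sorted 0 (Monday) .. 6 (Sunday).
by_num = df.groupby(df["arrival_date"].dt.weekday)["price_1"].mean()

# Replace the numeric labels with names while keeping the calendar order.
by_num.index = [day_labels[i] for i in by_num.index]
print(by_num)
```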
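For correct ordering, add reindex (again using dt.day_name(), since weekday_name was removed in pandas 1.0):
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
# numeric_only=True averages only the numeric price columns
week_df = df_data.groupby(df_data['arrival_date'].dt.day_name()).mean(numeric_only=True).reindex(days)
print(week_df)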
price_1 price_2 price_3 price_4
arrival_date
Monday 71.026544 1391.363442 67.362277 25.976418
Tuesday 79.006604 1241.603185 75.360606 28.250994
Wednesday 76.021324 1276.650570 72.485089 23.611288
Thursday 66.831764 1364.982781 63.940393 23.887128
Friday 71.952235 1130.106565 64.900476 25.779593
Saturday 83.149556 1364.312304 76.538815 27.371489
Sunday 57.553097 1223.977557 51.952801 21.552876
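The question also asks for the median and standard deviation, while the examples above only compute the mean. A minimal sketch of how several statistics might be computed in one pass with agg follows, assuming the price_1 values copied from the question's data head; the names df and stats are illustrative.

```python
import pandas as pd

# Sample values copied from the head of the question's data (price_1 only).
dates = pd.date_range("2017-08-16", periods=13, freq="D")
price_1 = [75.945298, 68.803269, 73.497144, 78.556828, 57.092189,
           76.278975, 79.006604, 76.097351, 64.860259, 70.407325,
           87.742284, 58.014005, 65.774114]
df = pd.DataFrame({"arrival_date": dates, "price_1": price_1})

days = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# One row per weekday, one column per statistic, in calendar order.
stats = (
    df.groupby(df["arrival_date"].dt.day_name())["price_1"]
      .agg(["mean", "median", "std"])
      .reindex(days)
)
print(stats)
```

In this small sample Tuesday occurs only once, so its std is NaN (pandas uses the sample standard deviation, which needs at least two observations).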