Find Average of Every Three Columns in Pandas dataframe
Problem description
I am new to Python and pandas. I have a pandas DataFrame with monthly columns ranging from 2000 (2000-01) to 2016 (2016-06).
I want to find the average of every three months and assign it to a new quarterly column (e.g. 2000q1). I know I can do the following:
df['2000q1'] = df[['2000-01', '2000-02', '2000-03']].mean(axis=1)
df['2000q2'] = df[['2000-04', '2000-05', '2000-06']].mean(axis=1)
# ...
df['2016q2'] = df[['2016-04', '2016-05', '2016-06']].mean(axis=1)
But this is very tedious. I would appreciate it if someone could help me find a better way.
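If you want to stay close to the manual approach, a loop can generate those assignments for you. The following is a minimal sketch on a made-up six-column frame (the toy data and the `quarterly` dict are illustrative assumptions, not from the question). It assumes the monthly columns are sorted and every quarter is complete, and derives each label from the first month of its chunk via `pd.Period`:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): two rows, six monthly columns, values 0..11
cols = ['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06']
df = pd.DataFrame(np.arange(12, dtype=float).reshape(2, 6), columns=cols)

# Walk the (sorted, complete) month columns three at a time
month_cols = list(df.columns)
quarterly = {}
for i in range(0, len(month_cols), 3):
    chunk = month_cols[i:i + 3]
    first = pd.Period(chunk[0], freq='M')  # e.g. Period('2000-04', 'M') -> quarter 2
    quarterly[f'{first.year}q{first.quarter}'] = df[chunk].mean(axis=1)

# Add all quarter columns at once instead of one assignment per quarter
df = pd.concat([df, pd.DataFrame(quarterly)], axis=1)
print(df)
```

Collecting the new columns in a dict and concatenating once avoids inserting dozens of columns one by one, which matters for the roughly 66 quarters in the question's data.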
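Solution
You can use groupby on the columns. np.arange(len(df.columns)) // 3 produces the group keys 0, 0, 0, 1, 1, 1, ..., so every block of three consecutive columns is averaged together (this assumes the monthly columns are in order and start with a January):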
df.groupby(np.arange(len(df.columns))//3, axis=1).mean()
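In recent pandas versions (2.x) the `axis=1` argument of `groupby` is deprecated, so the call above may warn or stop working on newer releases. A sketch of the same positional grouping that avoids it, under that assumption, transposes the frame, groups the rows, and transposes back (the toy data is hypothetical):

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): two rows, six monthly columns, values 0..11
cols = ['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06']
df = pd.DataFrame(np.arange(12, dtype=float).reshape(2, 6), columns=cols)

# Group keys by column position: [0, 0, 0, 1, 1, 1]
keys = np.arange(len(df.columns)) // 3

# Columns become rows after .T, so an ordinary row-wise groupby does the work
quarterly = df.T.groupby(keys).mean().T
print(quarterly)
```

The result has the same shape and integer column labels (0, 1, ...) as the `axis=1` version.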
Alternatively, convert the column labels to datetimes and use resample:
df.columns = pd.to_datetime(df.columns)
df.resample('Q', axis=1).mean()
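The `axis` argument of `resample` is likewise deprecated in recent pandas versions, and as far as I know pandas 2.2 also renamed the quarter-end alias `'Q'` to `'QE'`. A sketch that sidesteps both transposes the frame and resamples with the quarter-start alias `'QS'`, which should be accepted by older and newer versions alike; note the resulting labels are quarter starts (2000-01-01) rather than quarter ends (2000-03-31). The toy data is hypothetical:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): two rows, six monthly columns, values 0..11
cols = ['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06']
df = pd.DataFrame(np.arange(12, dtype=float).reshape(2, 6), columns=cols)

# '2000-01' -> Timestamp('2000-01-01'), etc.
df.columns = pd.to_datetime(df.columns)

# Resample the transposed frame (dates on the index), then flip it back
quarterly = df.T.resample('QS').mean().T
print(quarterly)
```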
Here's a demo:
import numpy as np
import pandas as pd

cols = pd.date_range('2000-01', '2000-06', freq='MS')
cols = cols.strftime('%Y-%m')
cols
Out:
array(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'],
      dtype='<U7')
df = pd.DataFrame(np.random.randn(10, 6), columns=cols)
df
Out:
    2000-01   2000-02   2000-03   2000-04   2000-05   2000-06
0 -1.263798  0.251526  0.851196  0.159452  1.412013  1.079086
1 -0.909071  0.685913  1.394790 -0.883605  0.034114 -1.073113
2  0.516109  0.452751 -0.397291 -0.050478 -0.364368 -0.002477
3  1.459609 -1.696641  0.457822  1.057702 -0.066313 -0.910785
4 -0.482623  1.388621  0.971078 -0.038535  0.033167  0.025781
5 -0.016654  1.404805  0.100335 -0.082941 -0.418608  0.588749
6  0.684735 -2.007105  0.552615  1.969356 -0.614634  0.021459
7  0.382475  0.965739 -1.826609 -0.086537 -0.073538 -0.534753
8  1.548773 -0.157250  0.494819 -1.631516  0.627794 -0.398741
9  0.199049  0.145919  0.711701  0.305382 -0.118315 -2.397075
First alternative:
df.groupby(np.arange(len(df.columns))//3, axis=1).mean()
Out:
          0         1
0 -0.053692  0.883517
1  0.390544 -0.640868
2  0.190523 -0.139108
3  0.073597  0.026868
4  0.625692  0.006805
5  0.496162  0.029067
6 -0.256585  0.458727
7 -0.159465 -0.231609
8  0.628781 -0.467487
9  0.352223 -0.736669
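Because this groups by position, months from different quarters get averaged together if the columns are not sorted or a quarter is incomplete. A label-based variant, sketched here as an alternative rather than part of the original answer, maps each column to its quarterly `Period` and groups on that. The shuffled toy columns are made up to show that column order no longer matters:

```python
import pandas as pd

# Hypothetical columns in shuffled order; each value equals its month number
cols = ['2000-03', '2000-01', '2000-05', '2000-02', '2000-06', '2000-04']
df = pd.DataFrame([[3.0, 1.0, 5.0, 2.0, 6.0, 4.0]], columns=cols)

# Map every column label to its calendar quarter, e.g. '2000-05' -> Period('2000Q2')
quarters = pd.to_datetime(df.columns).to_period('Q')

# Group the transposed rows by quarter label instead of by position
quarterly = df.T.groupby(quarters).mean().T
quarterly.columns = [f'{p.year}q{p.quarter}' for p in quarterly.columns]
print(quarterly)
```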
Second alternative:
df.columns = pd.to_datetime(df.columns)
df.resample('Q', axis=1).mean()
Out:
   2000-03-31  2000-06-30
0   -0.053692    0.883517
1    0.390544   -0.640868
2    0.190523   -0.139108
3    0.073597    0.026868
4    0.625692    0.006805
5    0.496162    0.029067
6   -0.256585    0.458727
7   -0.159465   -0.231609
8    0.628781   -0.467487
9    0.352223   -0.736669
You can assign the result to a DataFrame:
res = df.resample('Q', axis=1).mean()
Change the column names as you like:
res = res.rename(columns=lambda col: '{}q{}'.format(col.year, col.quarter))
res
Out:
     2000q1    2000q2
0 -0.053692  0.883517
1  0.390544 -0.640868
2  0.190523 -0.139108
3  0.073597  0.026868
4  0.625692  0.006805
5  0.496162  0.029067
6 -0.256585  0.458727
7 -0.159465 -0.231609
8  0.628781 -0.467487
9  0.352223 -0.736669
And attach it to your current DataFrame with:
pd.concat([df, res], axis=1)
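One side effect of these steps is that `df.columns` is overwritten with timestamps, so the concatenated frame mixes `Timestamp` month labels with string quarter labels. The hypothetical helper `add_quarterly_means` below is a sketch that converts a copy instead and reuses the same rename lambda. It transposes rather than passing `axis=1`, and `'QS'` is used instead of `'Q'` as an assumption about newer pandas versions; both give the same year and quarter once renamed:

```python
import numpy as np
import pandas as pd


def add_quarterly_means(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: append one 'YYYYqN' mean column per quarter."""
    monthly = df.copy()
    # Convert labels on the copy only, so df keeps its original string columns
    monthly.columns = pd.to_datetime(monthly.columns)
    # Transpose, resample by quarter start, transpose back
    res = monthly.T.resample('QS').mean().T
    # Same renaming idea as in the answer: Timestamp -> '2000q1'
    res = res.rename(columns=lambda col: '{}q{}'.format(col.year, col.quarter))
    return pd.concat([df, res], axis=1)


# Toy data (hypothetical): two rows, six monthly columns, values 0..11
cols = ['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06']
df = pd.DataFrame(np.arange(12, dtype=float).reshape(2, 6), columns=cols)
out = add_quarterly_means(df)
print(out)
```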