Generating average values on a dictionary of data frames
Problem description
I have the following pandas data frames:
phreatic_level_l2n1_28w_df.head()
Fecha Hora PORVL2N1 # the PORVLxNx column changes its name in each data frame
0 2012-01-12 01:37:47 0.65
1 2012-01-12 02:37:45 0.65
2 2012-01-12 03:37:50 0.64
3 2012-01-12 04:37:44 0.63
4 2012-01-12 05:37:45 0.61
phreatic_level_l2n2_28w_df.head()
Fecha Hora PORVL2N2 # the PORVLxNx column changes its name in each data frame
0 2018-01-12 01:58:22 0.71
1 2018-01-12 02:58:22 0.71
2 2018-01-12 03:58:23 0.71
3 2018-01-12 04:58:23 0.71
4 2018-01-12 05:58:24 0.71
phreatic_level_l4n1_28w_df.head()
Fecha Hora PORVL4N1 # the PORVLxNx column changes its name in each data frame
0 2018-01-12 01:28:49 0.96
1 2018-01-12 02:28:49 0.96
2 2018-01-12 03:28:50 0.96
3 2018-01-12 04:28:52 0.95
4 2018-01-12 05:28:48 0.94
And so on, until I have 25 data frames, the last one being phreatic_level_l24n2_28w_df:
...
phreatic_level_l24n2_28w_df.head()
Fecha Hora PORVL24N2 # the PORVLxNx column changes its name in each data frame
0 2018-01-12 01:07:28 1.31
1 2018-01-12 02:07:28 1.31
2 2018-01-12 03:07:29 1.31
3 2018-01-12 04:07:27 1.31
4 2018-01-12 05:07:27 1.31
Each data frame has values in its PORVLxNx column for every day in the date range (Fecha column) from 2018-01-12 until 2018-08-03, with many PORVLxNx values per day:
phreatic_level_l24n2_28w_df.tail()
Fecha Hora PORVL24N2
4875 2018-08-03 20:31:01 1.15
4876 2018-08-03 21:31:00 1.15
4877 2018-08-03 22:31:01 1.16
4878 2018-08-03 23:31:02 1.17
4879 NaN NaN NaN
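The last row of the tail is entirely NaN. As a hedged sketch with made-up values (only the column names come from the question), pd.to_datetime turns that missing Fecha into NaT, and dropping rows without a date keeps it out of the daily grouping explicitly (pd.Grouper should skip NaT keys anyway, but dropping them makes the intent obvious):

```python
import numpy as np
import pandas as pd

# Small made-up frame mimicking the tail above: the last row is all NaN
df = pd.DataFrame({
    'Fecha': ['2018-08-03', '2018-08-03', np.nan],
    'Hora': ['22:31:01', '23:31:02', np.nan],
    'PORVL24N2': [1.16, 1.17, np.nan],
})

df['Fecha'] = pd.to_datetime(df['Fecha'])  # the missing date becomes NaT
clean = df.dropna(subset=['Fecha'])         # keep only rows that have a date

print(df['Fecha'].isna().sum())  # 1
print(len(clean))                # 2
```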
My objective is to take each data frame and generate the average PORVLxNx per day, as follows:
Fecha PORVL2N1
0 2018-01-12 0.519130
1 2018-01-13 0.138750
2 2018-01-14 0.175417
3 2018-01-15 0.111667
4 2018-01-16 0.291250
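For a single data frame, the per-day average can be sketched as below. The readings are made up; only the column layout (Fecha, Hora, PORVL2N1) is taken from the question. pd.Grouper(key='Fecha', freq='D') bins the rows by calendar day before averaging:

```python
import pandas as pd

# Made-up readings over two days, same layout as the question's frames
df = pd.DataFrame({
    'Fecha': ['2018-01-12', '2018-01-12', '2018-01-13', '2018-01-13'],
    'Hora': ['01:37:47', '02:37:45', '01:37:50', '02:37:44'],
    'PORVL2N1': [0.60, 0.40, 0.10, 0.20],
})
df['Fecha'] = pd.to_datetime(df['Fecha'])

# Group rows by calendar day and average the level column
daily = (df.groupby(pd.Grouper(key='Fecha', freq='D'))['PORVL2N1']
           .mean()
           .reset_index())
print(daily)
```

Since Fecha holds only the date (the time is in Hora), df.groupby('Fecha') would give the same averages here; the Grouper variant additionally produces a NaN row for any calendar day without readings.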
I have the following approach:
I placed my DataFrames in a dict and referenced each one using a string:
dfs = {
'phreatic_level_l2n1_28w_df': phreatic_level_l2n1_28w_df,
# FOR THE MOMENT I ONLY TEST with the first dataframe
# 'phreatic_level_l2n2_28w_df': phreatic_level_l2n2_28w_df,
# 'phreatic_level_l4n1_28w_df': phreatic_level_l4n1_28w_df,
# 'phreatic_level_l5n1_28w_df': phreatic_level_l5n1_28w_df,
# 'phreatic_level_l6n1_28w_df': phreatic_level_l6n1_28w_df,
# 'phreatic_level_l7n1_28w_df': phreatic_level_l7n1_28w_df,
# 'phreatic_level_l8n1_28w_df': phreatic_level_l8n1_28w_df,
# 'phreatic_level_l9n1_28w_df': phreatic_level_l9n1_28w_df,
# 'phreatic_level_l10n1_28w_df': phreatic_level_l10n1_28w_df,
# 'phreatic_level_l13n1_28w_df': phreatic_level_l13n1_28w_df,
# 'phreatic_level_l14n1_28w_df': phreatic_level_l14n1_28w_df,
# 'phreatic_level_l15n1_28w_df': phreatic_level_l15n1_28w_df,
# 'phreatic_level_l16n1_28w_df': phreatic_level_l16n1_28w_df,
# 'phreatic_level_l16n2_28w_df': phreatic_level_l16n2_28w_df,
# 'phreatic_level_l18n1_28w_df': phreatic_level_l18n1_28w_df,
# 'phreatic_level_l18n2_28w_df': phreatic_level_l18n2_28w_df,
# 'phreatic_level_l18n3_28w_df': phreatic_level_l18n3_28w_df,
# 'phreatic_level_l18n4_28w_df': phreatic_level_l18n4_28w_df,
# 'phreatic_level_l21n1_28w_df': phreatic_level_l21n1_28w_df,
# 'phreatic_level_l21n2_28w_df': phreatic_level_l21n2_28w_df,
# 'phreatic_level_l21n3_28w_df': phreatic_level_l21n3_28w_df,
# 'phreatic_level_l21n4_28w_df': phreatic_level_l21n4_28w_df,
# 'phreatic_level_l21n5_28w_df': phreatic_level_l21n5_28w_df,
# 'phreatic_level_l24n1_28w_df': phreatic_level_l24n1_28w_df,
# 'phreatic_level_l24n2_28w_df': phreatic_level_l24n2_28w_df
}
I am iterating over the data frames (for the moment, just over phreatic_level_l2n1_28w_df):
import pandas as pd
for name, df in dfs.items():
    # Convert the Fecha column values to datetime
    df['Fecha'] = pd.to_datetime(df['Fecha'])
    # I am iterating over each PORVLxNx column
    for i in range(1, 24):
        if i == 2:
            # To N1
            l2_n1_average_per_day = (df.groupby(pd.Grouper(key='Fecha', freq='D'))['PORVL{}N{}'.format(i, i - 1)].mean().reset_index())
            l2_n1_average_per_day.to_csv('L{}N{}_average_per-day.csv'.format(i, i - 1), sep=',', header=True, index=False)
            print(l2_n1_average_per_day.head())
The output of my l2_n1_average_per_day.head() is:
Fecha PORVL2N1
0 2018-01-12 0.519130
1 2018-01-13 0.138750
2 2018-01-14 0.175417
3 2018-01-15 0.111667
4 2018-01-16 0.291250
l2_n1_average_per_day.tail()
Fecha PORVL2N1
199 2018-07-30 0.630417
200 2018-07-31 0.609583
201 2018-08-01 0.533333
202 2018-08-02 0.470833
203 2018-08-03 0.713333
Up to this point, my idea works.
When I try to apply this solution (quite possibly not the most optimal one) to the other data frames contained in my dfs dictionary:
dfs = {
'phreatic_level_l2n1_28w_df': phreatic_level_l2n1_28w_df,
'phreatic_level_l2n2_28w_df': phreatic_level_l2n2_28w_df,
# I've added the L2N2 phreatic_level_l2n2_28w_df dataframe item
}
I iterate again:
for name, df in dfs.items():
    df['Fecha'] = pd.to_datetime(df['Fecha'])
    for i in range(1, 24):
        if i == 2:
            # To N1
            l2_n1_average_per_day = (df.groupby(pd.Grouper(key='Fecha', freq='D'))['PORVL{}N{}'.format(i, i - 1)].mean().reset_index())
            l2_n1_average_per_day.to_csv('L{}N{}_average_per-day.csv'.format(i, i - 1), sep=',', header=True, index=False)
            # To N2. I've generated the average per day for L2N2
            l2_n2_average_per_day = (df.groupby(pd.Grouper(key='Fecha', freq='D'))['PORVL{}N{}'.format(i, i)].mean().reset_index())
            l2_n2_average_per_day.to_csv('L{}N{}_average_per-day.csv'.format(i, i), sep=',', header=True, index=False)
In my output, I get a KeyError for PORVL2N2:
----------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-161-fbe6eaf8a824> in <module>()
11 print(phreatic_level_l2_n1_average_per_day.tail())
12 # To N2
---> 13 phreatic_level_l2_n2_average_per_day = (df.groupby(pd.Grouper(key='Fecha', freq='D'))['PORVL{}N{}'.format(i,i)].mean().reset_index())
14 phreatic_level_l2_n2_average_per_day.to_csv('L{}N{}_average_per-day.csv'.format(i,i), sep=',', header=True, index=False)
15
~/anaconda3/envs/sioma/lib/python3.6/site-packages/pandas/core/base.py in __getitem__(self, key)
265 else:
266 if key not in self.obj:
--> 267 raise KeyError("Column not found: {key}".format(key=key))
268 return self._gotitem(key, ndim=1)
269
KeyError: 'Column not found: PORVL2N2'
This is strange, because the data frame in my dictionary that is being iterated does have the PORVL2N2 column:
phreatic_level_l2n2_28w_df.head()
Fecha Hora PORVL2N2
0 2018-01-12 01:58:22 0.71
1 2018-01-12 02:58:22 0.71
2 2018-01-12 03:58:23 0.71
3 2018-01-12 04:58:23 0.71
4 2018-01-12 05:58:24 0.71
Is it possible that, in my iteration, I am overwriting the data frames, or is something else happening?
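A minimal reproduction, using two tiny made-up frames that mimic the question's setup, shows which columns each frame is missing when a loop asks every frame for both PORVL2N1 and PORVL2N2, and what happens when the first frame is asked for PORVL2N2:

```python
import pandas as pd

# Two tiny made-up frames: each one only has its own PORVLxNy column
dfs = {
    'phreatic_level_l2n1_28w_df': pd.DataFrame(
        {'Fecha': ['2018-01-12'], 'Hora': ['01:37:47'], 'PORVL2N1': [0.65]}),
    'phreatic_level_l2n2_28w_df': pd.DataFrame(
        {'Fecha': ['2018-01-12'], 'Hora': ['01:58:22'], 'PORVL2N2': [0.71]}),
}

# The loop in the question requests both columns from every frame
wanted = ['PORVL2N1', 'PORVL2N2']
missing = {name: [c for c in wanted if c not in df.columns]
           for name, df in dfs.items()}
print(missing)
# {'phreatic_level_l2n1_28w_df': ['PORVL2N2'], 'phreatic_level_l2n2_28w_df': ['PORVL2N1']}

# Selecting the absent column on the first frame's groupby raises KeyError
first = dfs['phreatic_level_l2n1_28w_df']
raised = False
try:
    first.groupby('Fecha')['PORVL2N2']
except KeyError as err:
    raised = True
    print('KeyError:', err)
```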
Recommended answer
Your loop asks every data frame for both PORVL2N1 and PORVL2N2, but each data frame contains only one PORVLxNy column. The first one, phreatic_level_l2n1_28w_df, has no PORVL2N2 column, hence the KeyError; nothing is being overwritten.
Your data frames seem to have a good and consistent structure, so you can get the name of the PORVLxNy column you want the mean of from df.columns, taking the last element [-1]. Then, to save the result to a CSV file with the right name, you can strip the "PORV" prefix from the column name with col[4:], which gives L2N1 for PORVL2N1 and L24N2 for PORVL24N2:
import pandas as pd
for name, df in dfs.items():
    df['Fecha'] = pd.to_datetime(df['Fecha'])
    col = df.columns[-1]  # here col = PORVLxNy, with the right x and y for this df
    # no need for the inner for loop anymore
    lx_ny_average_per_day = (df.groupby(pd.Grouper(key='Fecha', freq='D'))[col]
                             .mean().reset_index())
    lx_ny_average_per_day.to_csv('{}_average_per-day.csv'.format(col[4:]),
                                 sep=',', header=True, index=False)
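A self-contained sketch of the same idea, using small made-up frames and a hypothetical helper daily_means that returns the daily averages in a dict keyed by the L…N… suffix instead of writing CSV files (the to_csv call can be added back inside the loop). It works on a copy of each frame so the caller's data frames are left untouched:

```python
import pandas as pd

def daily_means(dfs):
    """Return {'L2N1': daily-mean frame, ...} for a dict of level frames.

    Assumes, as in the question, that the last column of each frame is its
    PORVLxNy column and that Fecha holds the date.
    """
    results = {}
    for name, df in dfs.items():
        df = df.copy()  # leave the caller's frames unchanged
        df['Fecha'] = pd.to_datetime(df['Fecha'])
        col = df.columns[-1]
        results[col[4:]] = (df.groupby(pd.Grouper(key='Fecha', freq='D'))[col]
                              .mean()
                              .reset_index())
    return results

# Made-up data with the question's layout
dfs = {
    'phreatic_level_l2n1_28w_df': pd.DataFrame({
        'Fecha': ['2018-01-12', '2018-01-12', '2018-01-13'],
        'Hora': ['01:37:47', '02:37:45', '01:37:50'],
        'PORVL2N1': [0.60, 0.40, 0.30]}),
    'phreatic_level_l24n2_28w_df': pd.DataFrame({
        'Fecha': ['2018-01-12', '2018-01-13'],
        'Hora': ['01:07:28', '01:07:29'],
        'PORVL24N2': [1.31, 1.29]}),
}

out = daily_means(dfs)
for key, daily in out.items():
    print(key, daily.shape)

# Optionally, line all stations up side by side on the date index
wide = pd.concat([d.set_index('Fecha') for d in out.values()], axis=1)
print(wide)
```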