Generating average values on a dictionary of dataframes


Question

I have the following pandas dataframes:

phreatic_level_l2n1_28w_df.head()
       Fecha    Hora    PORVL2N1  # PORVLxNx column change their name in each data frame
0   2012-01-12  01:37:47    0.65
1   2012-01-12  02:37:45    0.65
2   2012-01-12  03:37:50    0.64
3   2012-01-12  04:37:44    0.63
4   2012-01-12  05:37:45    0.61

phreatic_level_l2n2_28w_df.head()
       Fecha    Hora    PORVL2N2 # PORVLxNx column change their name in each data frame
0   2018-01-12  01:58:22    0.71
1   2018-01-12  02:58:22    0.71
2   2018-01-12  03:58:23    0.71
3   2018-01-12  04:58:23    0.71
4   2018-01-12  05:58:24    0.71

phreatic_level_l4n1_28w_df.head()
       Fecha    Hora    PORVL4N1 # PORVLxNx column change their name in each data frame
0   2018-01-12  01:28:49    0.96
1   2018-01-12  02:28:49    0.96
2   2018-01-12  03:28:50    0.96
3   2018-01-12  04:28:52    0.95
4   2018-01-12  05:28:48    0.94

And so on, up to 25 dataframes of this kind, the last one being phreatic_level_l24n2_28w_df:

.
.
.
phreatic_level_l24n2_28w_df.head()
       Fecha    Hora    PORVL24N2 # PORVLxNx column change their name in each data frame
0   2018-01-12  01:07:28    1.31
1   2018-01-12  02:07:28    1.31
2   2018-01-12  03:07:29    1.31
3   2018-01-12  04:07:27    1.31
4   2018-01-12  05:07:27    1.31

In each dataframe, the PORVLxNx column holds many values per day over the date range (Fecha column) from 2018-01-12 to 2018-08-03:

phreatic_level_l24n2_28w_df.tail()
           Fecha    Hora    PORVL24N2
4875    2018-08-03  20:31:01    1.15
4876    2018-08-03  21:31:00    1.15
4877    2018-08-03  22:31:01    1.16
4878    2018-08-03  23:31:02    1.17
4879    NaN NaN NaN 

My objective is to take each dataframe and generate the average PORVLxNx per day, as follows:

       Fecha  PORVL2N1
0 2018-01-12  0.519130
1 2018-01-13  0.138750
2 2018-01-14  0.175417
3 2018-01-15  0.111667
4 2018-01-16  0.291250

My approach is as follows. I placed my dataframes in a dict, keyed by a string name:

dfs = {
    'phreatic_level_l2n1_28w_df': phreatic_level_l2n1_28w_df,
    # FOR THE MOMENT I ONLY TEST with the first dataframe 

    # 'phreatic_level_l2n2_28w_df': phreatic_level_l2n2_28w_df,
    # 'phreatic_level_l4n1_28w_df': phreatic_level_l4n1_28w_df,
    # 'phreatic_level_l5n1_28w_df': phreatic_level_l5n1_28w_df,
    # 'phreatic_level_l6n1_28w_df': phreatic_level_l6n1_28w_df,
    # 'phreatic_level_l7n1_28w_df': phreatic_level_l7n1_28w_df,
    # 'phreatic_level_l8n1_28w_df': phreatic_level_l8n1_28w_df,
    # 'phreatic_level_l9n1_28w_df': phreatic_level_l9n1_28w_df,
    # 'phreatic_level_l10n1_28w_df': phreatic_level_l10n1_28w_df,
    # 'phreatic_level_l13n1_28w_df': phreatic_level_l13n1_28w_df,
    # 'phreatic_level_l14n1_28w_df': phreatic_level_l14n1_28w_df,
    # 'phreatic_level_l15n1_28w_df': phreatic_level_l15n1_28w_df,
    # 'phreatic_level_l16n1_28w_df': phreatic_level_l16n1_28w_df,
    # 'phreatic_level_l16n2_28w_df': phreatic_level_l16n2_28w_df,
    # 'phreatic_level_l18n1_28w_df': phreatic_level_l18n1_28w_df,
    # 'phreatic_level_l18n2_28w_df': phreatic_level_l18n2_28w_df,
    # 'phreatic_level_l18n3_28w_df': phreatic_level_l18n3_28w_df,
    # 'phreatic_level_l18n4_28w_df': phreatic_level_l18n4_28w_df,
    # 'phreatic_level_l21n1_28w_df': phreatic_level_l21n1_28w_df,
    # 'phreatic_level_l21n2_28w_df': phreatic_level_l21n2_28w_df,
    # 'phreatic_level_l21n3_28w_df': phreatic_level_l21n3_28w_df,
    # 'phreatic_level_l21n4_28w_df': phreatic_level_l21n4_28w_df,
    # 'phreatic_level_l21n5_28w_df': phreatic_level_l21n5_28w_df,
    # 'phreatic_level_l24n1_28w_df': phreatic_level_l24n1_28w_df,
    # 'phreatic_level_l24n2_28w_df': phreatic_level_l24n2_28w_df  

}

I am iterating over the dataframes (at the moment just over phreatic_level_l2n1_28w_df):

for name, df in dfs.items():
    # We turn to datetime the Fecha column values 
    df['Fecha'] = pd.to_datetime(df['Fecha'])

    # I am iterating over each *`PORVLxNx`* column
    for i in range(1,24):
        if(i==2):
            # To N1
            l2_n1_average_per_day = (df.groupby(pd.Grouper(key='Fecha', freq='D'))['PORVL{}N{}'.format(i,i-1)].mean().reset_index())
            l2_n1_average_per_day.to_csv('L{}N{}_average_per-day.csv'.format(i,i-1), sep=',', header=True, index=False)
            print(l2_n1_average_per_day.head()) 

The output of l2_n1_average_per_day.head() is:

    Fecha  PORVL2N1
0 2018-01-12  0.519130
1 2018-01-13  0.138750
2 2018-01-14  0.175417
3 2018-01-15  0.111667
4 2018-01-16  0.291250

l2_n1_average_per_day.tail()

        Fecha  PORVL2N1
199 2018-07-30  0.630417
200 2018-07-31  0.609583
201 2018-08-01  0.533333
202 2018-08-02  0.470833
203 2018-08-03  0.713333

Up to this point, my idea works.

The problem appears when I apply this solution (which is quite possibly not the optimal one) to the other dataframes contained in my dfs dictionary:

dfs = {
        'phreatic_level_l2n1_28w_df': phreatic_level_l2n1_28w_df,
        'phreatic_level_l2n2_28w_df': phreatic_level_l2n2_28w_df,
        # I've added the L2N2  phreatic_level_l2n2_28w_df dataframe item       
    }

I iterate again:

for name, df in dfs.items():
    df['Fecha'] = pd.to_datetime(df['Fecha'])
    for i in range(1,24):
        if(i==2):
            # To N1
            l2_n1_average_per_day = (df.groupby(pd.Grouper(key='Fecha', freq='D'))['PORVL{}N{}'.format(i,i-1)].mean().reset_index())
            l2_n1_average_per_day.to_csv('L{}N{}_average_per-day.csv'.format(i,i-1), sep=',', header=True, index=False)

            # To N2. I've generate the average per day to L2N2

            l2_n2_average_per_day = (df.groupby(pd.Grouper(key='Fecha', freq='D'))['PORVL{}N{}'.format(i,i)].mean().reset_index())
            l2_n2_average_per_day.to_csv('L{}N{}_average_per-day.csv'.format(i,i), sep=',', header=True, index=False)

In my output, I get an error about the PORVL2N2 column:

----------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-161-fbe6eaf8a824> in <module>()
     11             print(phreatic_level_l2_n1_average_per_day.tail())
     12             # To N2
---> 13             phreatic_level_l2_n2_average_per_day = (df.groupby(pd.Grouper(key='Fecha', freq='D'))['PORVL{}N{}'.format(i,i)].mean().reset_index())
     14             phreatic_level_l2_n2_average_per_day.to_csv('L{}N{}_average_per-day.csv'.format(i,i), sep=',', header=True, index=False)
     15 

~/anaconda3/envs/sioma/lib/python3.6/site-packages/pandas/core/base.py in __getitem__(self, key)
    265         else:
    266             if key not in self.obj:
--> 267                 raise KeyError("Column not found: {key}".format(key=key))
    268             return self._gotitem(key, ndim=1)
    269 

KeyError: 'Column not found: PORVL2N2'

This is strange, because the dataframe inside the dictionary that is being iterated does have the PORVL2N2 column:

phreatic_level_l2n2_28w_df.head()
       Fecha    Hora    PORVL2N2
0   2018-01-12  01:58:22    0.71
1   2018-01-12  02:58:22    0.71
2   2018-01-12  03:58:23    0.71
3   2018-01-12  04:58:23    0.71
4   2018-01-12  05:58:24    0.71

Is it possible that I am overwriting the dataframes in my iteration, or is something else happening?

Recommended answer

Your dataframes seem to have a good, consistent structure, so you can get the name of the PORVLxNy column to average from df.columns, taking its last element [-1]. Then, to save the result to a CSV file with the right name, drop the PORV prefix from the column name (keeping only the last 4 characters would cut off two-digit names such as PORVL24N2):

import pandas as pd

for name, df in dfs.items():
    df['Fecha'] = pd.to_datetime(df['Fecha'])
    col = df.columns[-1]  # here col = PORVLxNy, with the right x and y for this df
    # no inner for loop is needed anymore
    lx_ny_average_per_day = (df.groupby(pd.Grouper(key='Fecha', freq='D'))[col]
                               .mean().reset_index())
    # strip the 'PORV' prefix so that e.g. PORVL24N2 gives L24N2
    lx_ny_average_per_day.to_csv('{}_average_per-day.csv'.format(col[len('PORV'):]),
                                 sep=',', header=True, index=False)
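
The KeyError in the question appears to come from asking every dataframe in the dictionary for both PORVL2N1 and PORVL2N2, although each dataframe holds only its own value column. The following is a minimal, self-contained sketch of the per-dataframe approach. It assumes the value column is always the last one; the two small dataframes use made-up values as stand-ins for the real data, and daily_means is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Made-up stand-ins for two of the real dataframes (values are illustrative only).
phreatic_level_l2n1_28w_df = pd.DataFrame({
    "Fecha": ["2018-01-12", "2018-01-12", "2018-01-13"],
    "Hora": ["01:37:47", "02:37:45", "01:37:50"],
    "PORVL2N1": [0.65, 0.63, 0.61],
})
phreatic_level_l24n2_28w_df = pd.DataFrame({
    "Fecha": ["2018-01-12", "2018-01-12", "2018-01-13"],
    "Hora": ["01:07:28", "02:07:28", "01:07:29"],
    "PORVL24N2": [1.31, 1.29, 1.15],
})

dfs = {
    "phreatic_level_l2n1_28w_df": phreatic_level_l2n1_28w_df,
    "phreatic_level_l24n2_28w_df": phreatic_level_l24n2_28w_df,
}

# Each dataframe holds only its own value column, so looking up another
# dataframe's column name on it fails with "Column not found".
has_other_column = "PORVL24N2" in dfs["phreatic_level_l2n1_28w_df"].columns


def daily_means(frames):
    """Return {value column name: per-day mean dataframe} (hypothetical helper)."""
    result = {}
    for name, df in frames.items():
        df = df.copy()  # leave the caller's dataframe untouched
        df["Fecha"] = pd.to_datetime(df["Fecha"])
        col = df.columns[-1]  # assumption: the value column is the last one
        daily = (df.groupby(pd.Grouper(key="Fecha", freq="D"))[col]
                   .mean()
                   .reset_index())
        result[col] = daily
    return result


averages = daily_means(dfs)
for col, daily in averages.items():
    print(col)
    print(daily)
```

Returning the results in a dict keyed by column name keeps the computation separate from writing files; each entry can then be saved with to_csv as in the loop above.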
