Calculate the average of the previous n elements in a column in pandas

Question

I have the following dataframe:

             df1
index   year   week   a     b     c
 -10    2017    10   45    26    19
  -9    2017    11   37    23    14
  -8    2017    12   21    66    19
  -7    2017    13   47    36    92
  -6    2017    14   82    65    18
  -5    2017    15   68    68    19
  -4    2017    16   30    95    24
  -3    2017    17   21    15    94
  -2    2017    18   67    30    16
  -1    2017    19   10    13    13
   0    2017    20   26    22    18
   1    2017    21   NaN   NaN   NaN
   2    2017    22   NaN   NaN   NaN
   3    2017    23   NaN   NaN   NaN
   4    2017    24   NaN   NaN   NaN
   ...
   53   2018    20   NaN   NaN   NaN

For each empty cell, I need to calculate the average of the previous n values in the column and write that value into the cell. n is the number of rows from index 0 upward. For example, for the first empty cell in column a, I must calculate the average of the values at indexes -10 through 0; for the next cell, the values at indexes -9 through 1, and so on. The same applies to columns a, b and c. The calculation always begins at index = 1.

The problem is that the number of columns such as a, b, c can vary, but I know that these columns always come after the column week. Is it possible to apply these calculations to an arbitrary number of columns, given only that they are located after the column week?

I searched hard but could not find anything suitable.

UPD: If this helps, the maximum number of rows from index = 0 downward will be 53.

Recommended answer

You can do this by playing around a bit with pandas and NumPy. Assuming you know the position of the week column (even if you don't, a simple search will find it), for example that week is the 3rd column, you can do something like:

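import numpy as np
import pandas as pd

# data is your dataframe name
# look up the position of the 'week' column instead of hard-coding it
week_pos = data.columns.get_loc('week')
# every column located after 'week'
column_list = list(data.columns)[week_pos + 1:]
for column_name in column_list:
    # copy the pandas Series into a writable float NumPy array
    column = data[column_name].to_numpy(dtype=float, copy=True)
    # iterate over the entries in the column (range replaces Python 2's xrange)
    for index in range(column.shape[0]):
        if np.isnan(column[index]):
            # window of the current entry plus the 10 before it;
            # nanmean ignores the current NaN, negative positions wrap around
            window = column.take(range(index - 10, index + 1), mode='wrap')
            column[index] = np.nanmean(window)
    # write the filled values back into the dataframe
    data[column_name] = column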

This is a slow, unvectorized solution, but it should work fine. It replaces every NaN entry with the mean of the previous 10 entries, wrapping around to the end of the column when fewer than 10 entries precede it. If you instead want only the previous entries without wrap-around, take however many are available when there are fewer than 10, like
column[index] = np.nanmean(column[max(0, index - 10):index + 1])
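
The non-wrapping variant can be put together into a self-contained sketch. The helper name `fill_with_trailing_mean` and its parameters `after` and `window` are hypothetical and not part of the original answer. The sample reuses the first rows of the question's table, and `window=11` is an assumption chosen to match the question's example (indexes -10 through 0 for the first empty cell), whereas the snippet above averages the 10 preceding entries.

```python
import numpy as np
import pandas as pd


def fill_with_trailing_mean(df, after="week", window=11):
    """Fill NaNs in every column after `after` with the mean of up to
    `window` preceding entries (hypothetical helper, for illustration).

    Columns are processed top to bottom, so values filled earlier are
    reused by the windows of later rows.
    """
    out = df.copy()
    start = out.columns.get_loc(after) + 1  # first column after `after`
    for name in out.columns[start:]:
        values = out[name].to_numpy(dtype=float, copy=True)
        for i in range(1, len(values)):  # position 0 has no preceding entries
            if np.isnan(values[i]):
                # no wrap-around: clamp the window at the top of the column
                values[i] = np.nanmean(values[max(0, i - window):i])
        out[name] = values
    return out


# Rows -10..4 of the question's table; indexes 1..4 are the empty cells.
nan4 = [np.nan] * 4
data = pd.DataFrame(
    {
        "year": [2017] * 15,
        "week": list(range(10, 25)),
        "a": [45, 37, 21, 47, 82, 68, 30, 21, 67, 10, 26] + nan4,
        "b": [26, 23, 66, 36, 65, 68, 95, 15, 30, 13, 22] + nan4,
        "c": [19, 14, 19, 92, 18, 19, 24, 94, 16, 13, 18] + nan4,
    },
    index=range(-10, 5),
)

filled = fill_with_trailing_mean(data, after="week", window=11)
print(filled.loc[1:4, ["a", "b", "c"]])
```

Because the target columns are located by name through `columns.get_loc`, the same call should work for any number of columns placed after week. The function returns a filled copy and leaves the input dataframe unchanged.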

Hope this helps!
