Calculate the average of the previous n elements in a column in pandas
Question
I have the following dataframe:
df1
index year week a b c
-10 2017 10 45 26 19
-9 2017 11 37 23 14
-8 2017 12 21 66 19
-7 2017 13 47 36 92
-6 2017 14 82 65 18
-5 2017 15 68 68 19
-4 2017 16 30 95 24
-3 2017 17 21 15 94
-2 2017 18 67 30 16
-1 2017 19 10 13 13
0 2017 20 26 22 18
1 2017 21 NaN NaN NaN
2 2017 22 NaN NaN NaN
3 2017 23 NaN NaN NaN
4 2017 24 NaN NaN NaN
...
53 2018 20 NaN NaN NaN
For each empty cell, I need to calculate the average of the previous n values in the column and write that value into the cell. n is equal to the number of index values from zero upward. For example, for the first empty cell in column a, I must calculate the average between indexes 0 and -10, then for the next cell between 1 and -9, and so on. The same goes for columns a, b and c. The calculations always begin at index = 1.
The problem is that the number of columns such as a, b, c can vary, but I know that these columns will always come after the column week. Is it possible to apply these calculations to an arbitrary number of columns, given that they are always located after the column week?
I tried hard to find anything, but I could not find anything suitable.
UPD: If this helps, the maximum number of rows from index = 0 and down will be 53.
Recommended answer
You can do something like this by playing around a bit with pandas and NumPy. Assuming you know the position of the week column (even if you don't, a simple search will find it), for example that week is the 3rd column, you can do something like:
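import numpy as np
import pandas as pd

# data is your dataframe
# every column after week (position 3 onward when week is the 3rd column)
column_list = list(data.columns)[3:]
for column_name in column_list:
    # work on a float NumPy copy of the pandas column
    column = data[column_name].to_numpy(dtype=float, copy=True)
    # iterate over the entries in the column
    for index in range(column.shape[0]):
        if np.isnan(column[index]):
            # mean of the preceding 10 entries; mode='wrap' wraps around at the start
            column[index] = np.nanmean(column.take(range(index - 10, index + 1), mode='wrap'))
    # write the filled values back into the dataframe
    data[column_name] = column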
This is a bad, unvectorized solution, but it should work fine. It replaces each NaN entry with the mean of the previous 10 entries, wrapping around at the start of the column. If you instead want only the previous 10 without wrap-around, simply take the first n entries when n is less than 10, like
column[index] = np.nanmean(column[max(0, index - 10):index + 1])
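For reference, here is a minimal self-contained sketch of the non-wrapping variant, under a few assumptions that are not in the original answer: the helper name `fill_with_trailing_mean`, its `after` and `window` parameters, and the small sample frame are all made up for illustration. It looks up the position of week with `columns.get_loc` instead of hard-coding it, so it applies to however many columns follow week.

```python
import numpy as np
import pandas as pd


def fill_with_trailing_mean(data, after="week", window=10):
    """Hypothetical helper: fill NaNs in every column located after `after`
    with the mean of up to `window` preceding entries (no wrap-around)."""
    result = data.copy()
    # position of the first column that comes after `after`
    start = result.columns.get_loc(after) + 1
    for column_name in result.columns[start:]:
        column = result[column_name].to_numpy(dtype=float, copy=True)
        for index in range(len(column)):
            if np.isnan(column[index]):
                # values filled earlier in the loop feed into later means
                lo = max(0, index - window)
                column[index] = np.nanmean(column[lo:index + 1])
        result[column_name] = column
    return result


# made-up sample data, only to show the call
df = pd.DataFrame({
    "year": [2017] * 5,
    "week": [10, 11, 12, 13, 14],
    "a": [1.0, 2.0, 3.0, np.nan, np.nan],
    "b": [10.0, 20.0, 30.0, np.nan, np.nan],
})
filled = fill_with_trailing_mean(df, window=2)
print(filled)
```

The function returns a filled copy and leaves the input frame unchanged. A column that starts with NaN has nothing to average, so that entry stays NaN (and NumPy emits a warning).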
Hope this helps!