Calculate DataFrame values recursively

Views: 89 | Published: 2020/5/24 3:14:02 | Tags: python, pandas, dataframe


Question

I'm trying to calculate the column values of a pandas DataFrame "recursively".

Suppose there are data for two different days, each with 10 observations, and I want to calculate a variable r for which only the first value on each day is given. The remaining 2*9 entries have to be computed, where every subsequent value depends on the previous entry of r and on one additional contemporaneous variable x.

The first problem is that I want to perform the calculations for each day individually, i.e. I'd like to use pandas' groupby() for all my calculations. But when I try to subset the data and use the shift(1) function, I only get NaN entries:

data.groupby(data.index)['r'] =   ( (1+data.groupby(data.index)['x']*0.25) * (1+data.groupby(data.index)['r'].shift(1)))

For my second approach, I used a for loop to iterate through the index (dates):

for i in range(2,21):
    data[data['rank'] == i]['r'] =  ( (1+data[data['rank'] == i]['x']*0.25) * (1+data[data['rank'] == i]['r'].shift(1)))

But that doesn't work for me either. Is there a way to perform such a calculation on DataFrames? Maybe something like a rolling apply?

Data:

df = pd.DataFrame({
  'rank' : [1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10],
  'x' : [0.00275,0.00285,0.0031,0.0036,0.0043,0.0052,0.0063,0.00755,0.00895,0.0105,0.0027,0.00285,0.0031,0.00355,0.00425,0.0051,0.00615,0.00735,0.00875,0.0103],
  'r' : [0.00158,'NaN','NaN','NaN','NaN','NaN','NaN','NaN','NaN','NaN',0.001485,'NaN','NaN','NaN','NaN','NaN','NaN','NaN','NaN','NaN']
  },index=['2014-01-02', '2014-01-02', '2014-01-02', '2014-01-02',
           '2014-01-02', '2014-01-02', '2014-01-02', '2014-01-02',
           '2014-01-02', '2014-01-02', '2014-01-03', '2014-01-03',
           '2014-01-03', '2014-01-03', '2014-01-03', '2014-01-03',
           '2014-01-03', '2014-01-03', '2014-01-03', '2014-01-03'])

Recommended answer

To do your rolling apply, you can use pandas' groupby().apply(). Inside the applied function you can use a loop to do the calculations for each group. The inner loop could potentially also be done with scipy.signal.lfilter, but I couldn't work out the exact formula you are after, so I guessed at that part.
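To see why a loop (or some cumulative operation) is needed instead of a single shift(1), here is a small sketch on made-up data; the frame `demo` and its values are purely illustrative and not taken from the question. A one-pass expression only ever sees the r values that exist before the calculation starts, so only the row directly after each known seed gets a number and every other row stays NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two groups ("a" and "b"), only the first r is known.
demo = pd.DataFrame(
    {
        "x": [0.1, 0.2, 0.3, 0.1, 0.2, 0.3],
        "r": [1.0, np.nan, np.nan, 2.0, np.nan, np.nan],
    },
    index=["a", "a", "a", "b", "b", "b"],
)

# Previous r within each group; this is NaN wherever the previous r is unknown.
prev_r = demo["r"].groupby(level=0).shift(1)

# A single vectorized pass cannot feed freshly computed values back in.
one_pass = (1 + 0.25 * demo["x"]) * (1 + prev_r)
print(one_pass)
```

Only the second row of each group is filled in, which matches the mostly NaN output described in the question.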

Code:

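def rolling_apply(group):
    # Work on a copy so the caller's frame is not modified in place.
    group = group.copy()
    # Seed the recursion with the first known r of this day.
    r = [group['r'].iloc[0]]
    for x in group['x']:
        r.append((1 + r[-1]) * (1 + x * 0.25))
    # Drop the seed; every row gets a newly computed value.
    group['r'] = r[1:]
    return group

# group_keys=False avoids adding an extra group level to the index,
# so the result lines up with df.
df['R'] = df.groupby(df.index, group_keys=False).apply(rolling_apply)['r']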

Result:

                   r  rank        x          R
2014-01-02   0.00158     1  0.00275   1.002269
2014-01-02       NaN     2  0.00285   2.003695
2014-01-02       NaN     3  0.00310   3.006023
2014-01-02       NaN     4  0.00360   4.009628
2014-01-02       NaN     5  0.00430   5.015014
2014-01-02       NaN     6  0.00520   6.022833
2014-01-02       NaN     7  0.00630   7.033894
2014-01-02       NaN     8  0.00755   8.049058
2014-01-02       NaN     9  0.00895   9.069306
2014-01-02       NaN    10  0.01050  10.095737
2014-01-03  0.001485     1  0.00270   1.002161
2014-01-03       NaN     2  0.00285   2.003588
2014-01-03       NaN     3  0.00310   3.005915
2014-01-03       NaN     4  0.00355   4.009471
2014-01-03       NaN     5  0.00425   5.014793
2014-01-03       NaN     6  0.00510   6.022462
2014-01-03       NaN     7  0.00615   7.033259
2014-01-03       NaN     8  0.00735   8.048020
2014-01-03       NaN     9  0.00875   9.067813
2014-01-03       NaN    10  0.01030  10.093737

Test data:

import numpy as np
import pandas as pd

# np.nan (rather than the string 'NaN') keeps the r column numeric.
nan = np.nan
df = pd.DataFrame({
    'rank': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'x': [0.00275, 0.00285, 0.0031, 0.0036, 0.0043, 0.0052, 0.0063, 0.00755,
          0.00895, 0.0105, 0.0027, 0.00285, 0.0031, 0.00355, 0.00425,
          0.0051, 0.00615, 0.00735, 0.00875, 0.0103],
    'r': [0.00158, nan, nan, nan, nan, nan, nan, nan, nan, nan,
          0.001485, nan, nan, nan, nan, nan, nan, nan, nan, nan]
}, index=['2014-01-02', '2014-01-02', '2014-01-02', '2014-01-02',
          '2014-01-02', '2014-01-02', '2014-01-02', '2014-01-02',
          '2014-01-02', '2014-01-02', '2014-01-03', '2014-01-03',
          '2014-01-03', '2014-01-03', '2014-01-03', '2014-01-03',
          '2014-01-03', '2014-01-03', '2014-01-03', '2014-01-03'])

Update:

Now that the actual recursive equation is known, here is an updated version of the apply function:

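def rolling_apply(group):
    # Work on a copy so the caller's frame is not modified in place.
    group = group.copy()
    # The first r of each day is given and is kept as is.
    r = [group['r'].iloc[0]]
    # Each later r compounds the previous r with the previous row's x.
    for x in group['x'].iloc[:-1]:
        r.append((1 + r[-1]) * (1 + x * 0.25) - 1)
    group['r'] = r
    return group

# group_keys=False avoids adding an extra group level to the index,
# so the result lines up with df.
df['r'] = df.groupby(df.index, group_keys=False).apply(rolling_apply)['r']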
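Because 1 + r compounds multiplicatively in the updated function, the same recursion can probably also be written as a per-group cumulative product, with no Python-level loop. The following is a minimal sketch and not part of the original answer: the function name `recursive_r_vectorized`, the `scale` parameter, and the toy frame are made up for illustration, and it assumes the rows of each day are already in the intended order and that the first r of each day is the known seed.

```python
import numpy as np
import pandas as pd


def recursive_r_vectorized(df, scale=0.25):
    """Closed form of r[k] = (1 + r[k-1]) * (1 + scale * x[k-1]) - 1 per index label."""
    day = df.index.to_numpy()  # group label for each row
    # Per-row growth factor 1 + scale * x, held positionally to avoid
    # alignment problems with the duplicated date index.
    factor = pd.Series(1.0 + scale * df["x"].to_numpy(dtype=float))
    # Running product of the factors within each day.
    cumulative = factor.groupby(day).cumprod()
    # Row k uses the factors of the rows before it, so shift by one within
    # each day; the first row of a day gets a neutral factor of 1.
    growth = cumulative.groupby(day).shift(1).fillna(1.0)
    # Seed: the first (known) r of each day, broadcast to all of its rows.
    r_numeric = pd.to_numeric(df["r"], errors="coerce").to_numpy()
    seed = pd.Series(r_numeric).groupby(day).transform("first")
    values = (1.0 + seed) * growth - 1.0
    return pd.Series(values.to_numpy(), index=df.index, name="r")


# Hypothetical toy data with two "days".
toy = pd.DataFrame(
    {"x": [0.1, 0.2, 0.3, 0.4, 0.5], "r": [0.01, np.nan, np.nan, 0.02, np.nan]},
    index=["a", "a", "a", "b", "b"],
)
out = recursive_r_vectorized(toy)
print(out)
```

On the question's data this would be used as `df['r'] = recursive_r_vectorized(df)`. The loop version is easier to adapt if the recursion changes; the cumulative product only works because this particular formula reduces to a running product of (1 + 0.25 * x).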
