Apply function to pandas dataframe row using values in other rows

Question

I have a situation where I perform calculations on a dataframe row, and I need to use values from the following (and potentially preceding) rows in those calculations (essentially a perfect forecast based on the real data set). I get each row from an earlier df.apply call, so I could pass the whole df along to the downstream objects, but that seems less than ideal given the complexity of the objects in my analysis.

I found one closely related question and answer [1], but my problem is fundamentally different in that I do not need the whole df for my calculations, only the following x rows (which might matter for large dfs).

For example,

import pandas as pd

df = pd.DataFrame([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
                  columns=['PRICE'])
horizon = 3

I need to access the values in the following 3 (horizon) rows in my row-wise df.apply call. How can I get a naive forecast of the next 3 data points dynamically in my row-wise apply calculations? For example, for the first row, where PRICE is 100, I need to use [200, 300, 400] as the forecast in my calculations.

Recommended answer

By getting the row's index inside the df.apply call using row.name [1], you can generate the 'forecast' data relative to the row you are currently on. This is effectively a preprocessing step that puts the 'forecast' onto the relevant row; alternatively, it could be done as part of the initial df.apply call if the df is available downstream.

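import pandas as pd

df = pd.DataFrame([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000], columns=['PRICE'])
horizon = 3

# x.name is the row's index label; with the default RangeIndex it equals the row position,
# so the slice picks the next `horizon` rows (fewer near the end of the frame).
df['FORECAST'] = df.apply(lambda x: [df['PRICE'][x.name+1:x.name+horizon+1]], axis=1)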

Result:

   PRICE          FORECAST
0    100   [200, 300, 400]
1    200   [300, 400, 500]
2    300   [400, 500, 600]
3    400   [500, 600, 700]
4    500   [600, 700, 800]
5    600   [700, 800, 900]
6    700  [800, 900, 1000]
7    800       [900, 1000]
8    900            [1000]
9   1000                []

The FORECAST column can then be used in your row-wise df.apply calculations.
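As a rough illustration of that downstream step, here is a minimal sketch. It assumes the forecast is stored as a plain Python list (built with .iloc[...].tolist(), a small variation on the code above), and the function name expected_change and the EXPECTED_CHANGE column are hypothetical, invented only for this example.

```python
import math

import pandas as pd

df = pd.DataFrame({'PRICE': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]})
horizon = 3

# Assumption: store each forecast as a plain list of the next `horizon` prices.
# This relies on the default RangeIndex, where x.name equals the row position.
df['FORECAST'] = df.apply(
    lambda x: df['PRICE'].iloc[x.name + 1:x.name + horizon + 1].tolist(),
    axis=1,
)


def expected_change(row):
    """Hypothetical calculation: mean of the forecast minus the current price."""
    forecast = row['FORECAST']
    if not forecast:  # the last row has no following rows
        return math.nan
    return sum(forecast) / len(forecast) - row['PRICE']


# Second row-wise apply that consumes the precomputed forecast.
df['EXPECTED_CHANGE'] = df.apply(expected_change, axis=1)
print(df)
```

Rows near the end of the frame get a shorter forecast, so any downstream calculation has to decide how to treat partial or empty lists; returning NaN for the empty case is just one option.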

If you want to strip the index from the resulting 'FORECAST':

df['FORECAST'] = df.apply(lambda x: [df['PRICE'][x.name+1:x.name+horizon+1].reset_index(drop=True)], axis=1)
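Note that the x.name + 1 arithmetic only works when the index is the default 0..n-1 integer range, because x.name is the row's label rather than its position. For other indexes, one possible approach (a sketch, not part of the original answer) is to translate the label into a position with Index.get_loc first. The date index and the helper name following_prices below are assumptions made for illustration, and the index is assumed to be unique.

```python
import pandas as pd

# Assumed example data: the same kind of prices, but on a daily date index.
df = pd.DataFrame(
    {'PRICE': [100, 200, 300, 400, 500]},
    index=pd.date_range('2020-01-01', periods=5, freq='D'),
)
horizon = 3


def following_prices(row):
    """Return the next `horizon` prices after this row as a plain list."""
    # row.name is a Timestamp label here; get_loc maps it to an integer
    # position (this assumes the index has no duplicate labels).
    pos = df.index.get_loc(row.name)
    return df['PRICE'].iloc[pos + 1:pos + horizon + 1].tolist()


df['FORECAST'] = df.apply(following_prices, axis=1)
print(df)
```

Using .iloc with an explicit position keeps the slice positional regardless of the index type, and .tolist() drops the index from the result in the same step.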

[1] Getting the index of a row in a pandas apply function
