How do I get, for each row, a value from the next row that matches a criterion in Pandas?


Problem description

Let's assume we have a table like the one below:

A B
1 1.0
2 2.0
3 2.0
4 3.0
5 2.0
6 1.0
7 1.0

Now I want to get, for each row, the value of column A from the next following row for which B <= 2.0, and store the result in C. We then get:

A B   C
1 1.0 2
2 2.0 3
3 2.0 5 # Here we skip a row because next.B > 2.0
4 3.0 5
5 2.0 6
6 1.0 7
7 1.0 NaN
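To pin down the expected semantics, here is a minimal reference sketch in plain Python. The function name `next_matching_a` and its `threshold` parameter are hypothetical and only serve as illustration. It scans the rows once from the bottom up, so it is linear in the number of rows, but it is meant as a specification to check faster solutions against rather than as the fast solution itself.

```python
import pandas as pd

def next_matching_a(df, threshold=2.0):
    """For each row, return 'A' of the nearest later row with B <= threshold.

    Hypothetical helper for illustration; rows without such a later row get <NA>.
    """
    a = df['A'].tolist()
    b = df['B'].tolist()
    result = [None] * len(df)
    candidate = None  # 'A' of the nearest matching row below the current one
    # Walk from the last row to the first, remembering the latest match seen.
    for i in range(len(df) - 1, -1, -1):
        result[i] = candidate
        if b[i] <= threshold:
            candidate = a[i]
    # Nullable integer dtype so that the missing value does not force floats.
    return pd.Series(result, index=df.index, dtype='Int64')

# Sample data from the table above
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7],
                   'B': [1.0, 2.0, 2.0, 3.0, 2.0, 1.0, 1.0]})
df['C'] = next_matching_a(df)
```

Note that a row's own value of B does not matter for its own result; only the rows below it are considered.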

Is there a way to implement this efficiently in Pandas (or Numpy)? The data frame may contain several million rows, and I hope that this operation takes at most a few seconds.

If there is no fast Pandas/Numpy solution, I will just code it in Numba. However, for some reason, my past Numba solutions to similar problems (nopython & nested for & break) were rather slow, which is why I am asking for a better approach.

Context: In an earlier question I asked how to get, for each row in a time series data frame, a value from the next row before a delay expires. This question is related, but it does not use time or a sorted column, and therefore searchsorted cannot be used.

Recommended answer

You can do that in just a few steps as follows:

import pandas as pd

# sample data from the question
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7],
                   'B': [1.0, 2.0, 2.0, 3.0, 2.0, 1.0, 1.0]})

# initialize column 'C' with the value of column 'A'
# for all rows where 'B' is less than or equal to 2.0
# and leave 'C' missing where 'B' > 2.0
# because normal int columns do not support null values
# we use the nullable Int64 dtype instead
# (introduced in pandas version 0.24)
df['C'] = df['A'].astype('Int64').where(df['B'] <= 2.0)

# now fill the gaps using the value of the next row
# in which the field is filled, then shift the column up by one row
df['C'] = df['C'].bfill().shift(-1)

The result is (recent pandas versions display the missing value as <NA> instead of NaN):

>>> df
   A    B    C
0  1  1.0    2
1  2  2.0    3
2  3  2.0    5
3  4  3.0    5
4  5  2.0    6
5  6  1.0    7
6  7  1.0  NaN
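As an alternative, the same result can be sketched in plain NumPy. This is not part of the answer above; the function name `next_matching_a_numpy` is hypothetical. The idea is that although column B is not sorted, the positions of the rows that satisfy B <= 2.0 are always sorted, so `np.searchsorted` can be applied to those positions to find, for each row, the first matching row strictly below it.

```python
import numpy as np
import pandas as pd

def next_matching_a_numpy(a, b, threshold=2.0):
    """Hypothetical helper: 'A' of the next later row with b <= threshold, NaN if none."""
    a = np.asarray(a)
    b = np.asarray(b)
    # Positions of matching rows; np.flatnonzero returns them in ascending order.
    match_pos = np.flatnonzero(b <= threshold)
    # For row i, index into match_pos of the first position strictly greater than i.
    k = np.searchsorted(match_pos, np.arange(len(a)), side='right')
    found = k < len(match_pos)
    # Float output so that rows without a later match can hold NaN.
    out = np.full(len(a), np.nan)
    out[found] = a[match_pos[k[found]]]
    return out

# Sample data from the question
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7],
                   'B': [1.0, 2.0, 2.0, 3.0, 2.0, 1.0, 1.0]})
result = next_matching_a_numpy(df['A'], df['B'])
# Convert to the nullable integer dtype to match the answer's column type.
df['C'] = pd.Series(result, index=df.index).astype('Int64')
```

Because the intermediate array is floating point, this sketch assumes the values in A are small enough to be represented exactly as floats; for very large integers the pandas approach above is the safer choice.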
