Create new column that compares across rows in pandas dataframe

Question

I am looking to create a new column in a dataframe based on the values seen in the next 2 rows. Specifically, if any values in the next 2 rows are below 4, then I want the new value in the current row to be 0 (and if all values in the next 2 rows are above 4 then I want the new value in the current row to be 1).

>>> import pandas
>>> df = pandas.DataFrame({"A": [5,6,7,8,2]})
>>> df
   A
0  5
1  6
2  7
3  8
4  2
>>> desired_result = pandas.DataFrame({"A": [5,6,7,8,2], "new": [1,1,0,0,0]})
>>> desired_result
   A  new
0  5    1
1  6    1
2  7    0
3  8    0
4  2    0

In "desired_result" the first value is 1 because the next two values, 6 and 7, are both > 4, and the same logic gives 1 for the second row (7 and 8). In the third row the new value becomes 0, because when we look ahead to the next two rows (8, 2) we see that 2 < 4.
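To pin the rule down, here is a plain-Python sketch of the look-ahead. It assumes, as desired_result implies, that a row with no following rows gets 0, and that a value of exactly 4 counts as "not below 4" (the same as lt(4) in the answer). The function name look_ahead_flag and its parameters are hypothetical.

```python
def look_ahead_flag(values, window=2, threshold=4):
    """Return 1 where every one of the next `window` values is >= threshold, else 0.

    Assumption: a position with no following values (the last row) gets 0,
    matching desired_result in the question.
    """
    flags = []
    for i in range(len(values)):
        ahead = values[i + 1 : i + 1 + window]  # up to `window` following values
        if ahead and all(v >= threshold for v in ahead):
            flags.append(1)
        else:
            flags.append(0)
    return flags


print(look_ahead_flag([5, 6, 7, 8, 2]))  # [1, 1, 0, 0, 0]
```

Near the end of the list the window simply shrinks, so the second-to-last row is judged on the one value that follows it.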

I have been trying to use the apply function but I cannot figure out how to pass along the next 2 row values as inputs.

I have found lots of help on this site about comparing across columns, but cannot figure out how to "look ahead" like I described.

Thanks for your help!
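On the apply question: one way to make the next two values available to apply is to first put them in shifted helper columns, so each row carries its own look-ahead values, and then apply a row-wise function. This is a sketch; the helper columns next1/next2 and the function row_flag are hypothetical names, and the vectorized loc/shift approach is usually faster than a row-wise apply.

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 6, 7, 8, 2]})

# Put each row's next two values into helper columns; rows near the end get NaN.
ahead = pd.DataFrame({"next1": df["A"].shift(-1), "next2": df["A"].shift(-2)})


def row_flag(row):
    """1 if every available look-ahead value is >= 4, else 0 (0 if none are available)."""
    values = row.dropna()
    return int(len(values) > 0 and (values >= 4).all())


df["new"] = ahead.apply(row_flag, axis=1)
print(df)
```

Dropping NaN inside row_flag lets rows near the end be judged on whatever values actually follow them, with the last row falling back to 0.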

Answer

You can initialize the new column to 1 and then use loc together with shift and lt (less than) to set the appropriate values to 0.

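import pandas as pd

df = pd.DataFrame({"A": [5, 6, 7, 8, 2]})
df['new'] = 1

# Zero out rows where either of the next two values is below 4.
df.loc[(df.A.shift(-1).lt(4)) | (df.A.shift(-2).lt(4)), 'new'] = 0

# The last value does not have any future observations and should be set to zero.
# Assign through df.loc: df.new.iat[-1] = 0 is chained assignment and does not
# update df when pandas Copy-on-Write is enabled.
df.loc[df.index[-1], 'new'] = 0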

>>> df
   A  new
0  5    1
1  6    1
2  7    0
3  8    0
4  2    0
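To see why the loc mask works, it can help to print the intermediate pieces. shift(-1) and shift(-2) line each row up with the values one and two rows ahead (NaN past the end), and lt(4) turns those into booleans. NaN compares as False under lt, so the last row never matches the mask, which is why the answer resets it explicitly. The debug frame and its column names are purely illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 6, 7, 8, 2]})

# Values 1 and 2 rows ahead of each row; positions past the end become NaN.
debug = pd.DataFrame({
    "A": df["A"],
    "next1": df["A"].shift(-1),
    "next2": df["A"].shift(-2),
})
# lt(4) is False for NaN, so missing look-ahead values never trigger a zero.
debug["below1"] = debug["next1"].lt(4)
debug["below2"] = debug["next2"].lt(4)
debug["any_below"] = debug["below1"] | debug["below2"]
print(debug)
```

Rows where any_below is True are exactly the rows the answer sets to 0 with loc; the last row has any_below False only because nothing follows it.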

To expand to the next 8 rows instead of 2:

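import operator
from functools import reduce
nrows = 8
# OR together one "value below 4" mask per look-ahead offset 1..nrows
mask = reduce(operator.or_, (df.A.shift(-n).lt(4) for n in range(1, nrows + 1)))
df.loc[mask, 'new'] = 0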
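A different technique, not part of the original answer: instead of OR-ing nrows shifted masks, a forward-looking rolling window can compute the same flag in one pass. This sketch assumes pandas 1.1 or later, which provides pandas.api.indexers.FixedForwardWindowIndexer. The function name look_ahead_all_ok is hypothetical. It also gives the last row 0 without a separate reset, matching the answer's result.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer


def look_ahead_all_ok(s, nrows, threshold=4):
    """Sketch: 1 where every existing value among the next `nrows` is >= threshold, else 0.

    Rows with no following values get 0, like the answer's explicit last-row reset.
    """
    # 1.0 where a value is below the threshold, shifted so row i sees row i+1 first.
    below_ahead = s.lt(threshold).astype(float).shift(-1)
    # Window of `nrows` rows starting at each position and extending forward.
    indexer = FixedForwardWindowIndexer(window_size=nrows)
    any_below = below_ahead.rolling(window=indexer, min_periods=1).max()
    # any_below is 0.0 (all ok), 1.0 (some value below) or NaN (nothing ahead).
    return (any_below == 0).astype(int)


df = pd.DataFrame({"A": [5, 6, 7, 8, 2]})
df["new"] = look_ahead_all_ok(df["A"], nrows=2)
print(df["new"].tolist())  # [1, 1, 0, 0, 0]
```

Changing nrows only changes the window size, so no per-offset masks or string building are needed.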