Create new column that compares across rows in pandas dataframe

Problem description

I am looking to create a new column in a dataframe based on the values seen in the next 2 rows. Specifically, if any values in the next 2 rows are below 4, then I want the new value in the current row to be 0 (and if all values in the next 2 rows are above 4 then I want the new value in the current row to be 1).

>>> df = pandas.DataFrame({"A": [5,6,7,8,2]})
>>> df
   A
0  5
1  6
2  7
3  8
4  2
>>> desired_result = pandas.DataFrame({"A": [5,6,7,8,2], "new": [1,1,0,0,0]})
>>> desired_result
   A  new
0  5    1
1  6    1
2  7    0
3  8    0
4  2    0

In desired_result, the first value is 1 because the next two values, 6 and 7, are both > 4. The same logic applies until the third row, where the new value becomes 0: looking ahead to the next two rows (8, 2), we see that 2 is < 4, so the value becomes 0.
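To pin the rule down, here is a minimal plain-Python sketch of the look-ahead on a list. It assumes that a value of exactly 4 counts as "not below 4" (the question does not say) and that a row with no following rows gets 0, as in desired_result. The function name `lookahead_flags` is hypothetical.

```python
def lookahead_flags(values, window=2, threshold=4):
    """Hypothetical helper: 1 if every one of the next `window` values is >= threshold, else 0.

    Positions with no following values (the last row) get 0.
    """
    flags = []
    for i in range(len(values)):
        upcoming = values[i + 1 : i + 1 + window]  # up to `window` following values
        flags.append(int(len(upcoming) > 0 and all(v >= threshold for v in upcoming)))
    return flags


print(lookahead_flags([5, 6, 7, 8, 2]))  # [1, 1, 0, 0, 0]
```

A loop like this is easy to read but slow on large frames; it is mainly useful as a reference to check a vectorized version against.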

I have been trying to use the apply function but I cannot figure out how to pass along the next 2 row values as inputs.

I have found lots of help on this site about comparing across columns, but cannot figure out how to "look ahead" like I described.

Thanks for the help!

Solution

You can set the new column to 1 and then use loc together with shift and lt (less than) to set the appropriate rows to 0.

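import pandas as pd

df = pd.DataFrame({"A": [5, 6, 7, 8, 2]})
df['new'] = 1

df.loc[(df.A.shift(-1).lt(4)) | (df.A.shift(-2).lt(4)), 'new'] = 0

# The last value does not have any future observations and should be set to zero.
# (Using loc on df directly avoids chained assignment, which is ignored under Copy-on-Write.)
df.loc[df.index[-1], 'new'] = 0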

>>> df
   A  new
0  5    1
1  6    1
2  7    0
3  8    0
4  2    0
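To see why this works, it can help to look at the shifted columns the mask is built from. This sketch adds the hypothetical helper columns `next1`, `next2` and `any_below_4` to a copy of the frame purely for inspection: shift(-n) moves each value up by n rows, so row i sees A[i + n], and positions past the end become NaN, which compares as False under lt(4).

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 6, 7, 8, 2]})

# shift(-n) aligns the value n rows ahead with the current row; missing rows become NaN
view = df.assign(next1=df.A.shift(-1), next2=df.A.shift(-2))
# True where at least one of the next two values is below 4 (NaN counts as False)
view["any_below_4"] = view.next1.lt(4) | view.next2.lt(4)
print(view)
```

Here any_below_4 is True for rows 2 and 3 and False for row 4, because both of row 4's look-ahead values are NaN. That is why the answer sets the last row to 0 explicitly.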

To expand to the next 8 rows instead of 2:

nrows = 8
df.loc[eval(" | ".join("df.A.shift(-{0}).lt(4)".format(n) 
                       for n in range(1, nrows + 1))), 'new'] = 0
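Building the condition as a string and passing it to eval works, but the same mask can also be built without eval by placing the shifted comparisons side by side with pd.concat and reducing them with any(axis=1). A minimal sketch, assuming the same "below 4" threshold and the answer's convention that the last row is set to 0; the function name `flag_next_rows` and its parameters are hypothetical.

```python
import pandas as pd


def flag_next_rows(df, col="A", nrows=2, threshold=4):
    """Hypothetical helper: 1 if none of the next `nrows` values of `col` is below `threshold`, else 0."""
    # One boolean column per look-ahead step; NaN past the end compares as False.
    below = pd.concat(
        [df[col].shift(-n).lt(threshold) for n in range(1, nrows + 1)], axis=1
    )
    new = (~below.any(axis=1)).astype(int)
    # As in the answer, the last row has no future observations and is set to 0.
    if len(new):
        new.iloc[-1] = 0
    return df.assign(new=new)


df = pd.DataFrame({"A": [5, 6, 7, 8, 2]})
print(flag_next_rows(df, nrows=2))
```

With nrows=8 this gives the same result as the eval version, and the window length stays an ordinary integer argument rather than part of a generated string.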
