Create a new column that compares across rows in a pandas DataFrame
I am looking to create a new column in a dataframe based on the values seen in the next 2 rows. Specifically, if any values in the next 2 rows are below 4, then I want the new value in the current row to be 0 (and if all values in the next 2 rows are above 4 then I want the new value in the current row to be 1).
>>> import pandas
>>> df = pandas.DataFrame({"A": [5,6,7,8,2]})
>>> df
   A
0  5
1  6
2  7
3  8
4  2
>>> desired_result = pandas.DataFrame({"A": [5,6,7,8,2], "new": [1,1,0,0,0]})
>>> desired_result
   A  new
0  5    1
1  6    1
2  7    0
3  8    0
4  2    0
In desired_result, the first value is 1 because the next two values, 6 and 7, are both > 4. The same logic applies to the second row. In the third row the new value becomes 0 because, looking ahead to the next two rows (8, 2), we see that 2 is < 4.
I have been trying to use the apply function but I cannot figure out how to pass along the next 2 row values as inputs.
I have found lots of help on this site about comparing across columns, but cannot figure out how to "look ahead" like I described.
Thanks for the help!
You can set the new column to one and then use loc together with shift and lt (less than) to set the appropriate values to zero.
import pandas as pd

df = pd.DataFrame({"A": [5, 6, 7, 8, 2]})
df['new'] = 1
df.loc[(df.A.shift(-1).lt(4)) | (df.A.shift(-2).lt(4)), 'new'] = 0
# The last value does not have any future observations and should be set to zero.
# A single .loc assignment is used here to avoid chained assignment.
df.loc[df.index[-1], 'new'] = 0
>>> df
   A  new
0  5    1
1  6    1
2  7    0
3  8    0
4  2    0
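To see why this works, it may help to print the shifted columns side by side. The following is only an illustrative sketch; the names lookahead, next_1, next_2 and any_below_4 are made up for this example and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 6, 7, 8, 2]})

# shift(-k) moves values up by k rows, so row i sees the value from row i + k.
# Rows that have no row i + k get NaN, and comparing NaN with lt(4) gives False.
lookahead = pd.DataFrame({
    "A": df["A"],
    "next_1": df["A"].shift(-1),
    "next_2": df["A"].shift(-2),
})

# True where at least one of the next two values is below 4.
lookahead["any_below_4"] = lookahead["next_1"].lt(4) | lookahead["next_2"].lt(4)

print(lookahead)
```

The last row comes out False in any_below_4 because both of its look-ahead values are NaN, which is why the answer sets that row to zero explicitly.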
To expand to the next 8 rows instead of 2:
nrows = 8
df.loc[eval(" | ".join("df.A.shift(-{0}).lt(4)".format(n)
                       for n in range(1, nrows + 1))), 'new'] = 0
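If you would rather not build a string for eval, one possible alternative is to combine the shifted comparisons with functools.reduce. This is a sketch rather than the method from the answer; the helper name flag_lookahead and its threshold parameter are hypothetical, and it follows the same convention of setting the last row to zero.

```python
from functools import reduce
import operator

import pandas as pd


def flag_lookahead(series, nrows, threshold=4):
    """Return 1 where none of the next `nrows` values is below `threshold`, else 0.

    Hypothetical helper: the last row is forced to 0 because it has no
    future observations, matching the convention used in the answer.
    """
    # OR together the boolean masks for shift(-1), shift(-2), ..., shift(-nrows).
    below = reduce(
        operator.or_,
        (series.shift(-n).lt(threshold) for n in range(1, nrows + 1)),
    )
    flags = (~below).astype(int)
    if len(flags) > 0:
        flags.iloc[-1] = 0
    return flags


df = pd.DataFrame({"A": [5, 6, 7, 8, 2]})
df["new"] = flag_lookahead(df["A"], nrows=2)
print(df)
```

With nrows=2 this gives the same new column as the shift-based answer. With nrows=8 on this five-row frame every row has the value 2 somewhere in its look-ahead window (or is the last row), so the whole column becomes 0.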