Testing subsequent values in a DataFrame

Question

I have a DataFrame with one column with positive and negative integers. For each row, I'd like to see how many consecutive rows (starting with and including the current row) have negative values.

So if a sequence was 2, -1, -3, 1, -1, the result would be 0, 2, 1, 0, 1.

I can do this by iterating over all the indices, using .iloc to split the column, and next() to find out where the next positive value is. But I feel like this isn't taking advantage of pandas' capabilities, and I imagine that there's a better way of doing it. I've experimented with using .shift() and expanding_window but without success.

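Is there a more "pandastic" way to find the number of consecutive rows, starting from the current one, that meet some logical condition?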

Here's what's working now:

import pandas as pd

df = pd.DataFrame({"a": [2, -1, -3, -1, 1, 1, -1, 1, -1]})

df["b"] = 0
for i in df.index:
    sub = df.iloc[i:].a.tolist()
    df.b.iloc[i] = next((sub.index(n) for n in sub if n >= 0), 1)

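Edit: I realize that even my own example doesn't work properly when there is more than one negative value at the end. That makes a better solution all the more necessary.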

Edit 2: I stated the problem in terms of integers, but originally only put 1 and -1 in my example. I need to solve for positive and negative integers in general.

Recommended answer

FWIW, here's a fairly pandastic answer that requires no functions or applies. Borrows from here (among other answers I'm sure) and thanks to @DSM for mentioning the ascending=False option:

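import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [2, -1, -3, -1, 1, 1, -1, 1, -1, -2]})

# Flag positive rows, then start a new group id whenever the flag changes.
df['pos'] = df.a > 0
df['grp'] = (df['pos'] != df['pos'].shift()).cumsum()
dfg = df.groupby('grp')
# Count down within each run; rows that are not negative get 0.
df['c'] = np.where(df['a'] < 0, dfg.cumcount(ascending=False) + 1, 0)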

   a  b    pos  grp  c
0  2  0   True    1  0
1 -1  3  False    2  3
2 -3  2  False    2  2
3 -1  1  False    2  1
4  1  0   True    3  0
5  1  0   True    3  0
6 -1  1  False    4  1
7  1  0   True    5  0
8 -1  1  False    6  2
9 -2  1  False    6  1
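
As a cross-check, the counts in column c can be reproduced with a single backward pass in plain Python. This is only a sketch for verifying the result, not part of the answer; the helper name `negative_run_ahead` is hypothetical.

```python
def negative_run_ahead(values):
    """For each position, count the consecutive negatives starting there."""
    out = [0] * len(values)
    run = 0
    # Walk from the end so each row can reuse the count of the row after it.
    for i in range(len(values) - 1, -1, -1):
        run = run + 1 if values[i] < 0 else 0
        out[i] = run
    return out


result = negative_run_ahead([2, -1, -3, -1, 1, 1, -1, 1, -1, -2])
print(result)  # [0, 3, 2, 1, 0, 0, 1, 0, 2, 1]
```

The last two entries are 2 and 1, so a run of negatives at the very end of the column is counted correctly.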

I think a nice thing about this method is that once you set up the 'grp' variable you can do lots of things very easily with standard groupby methods.
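
For instance, the same grouping idea extends to any boolean condition, which is what the question asks for in general. The sketch below is an illustration under stated assumptions rather than the answer's own code: the helper name `run_length_ahead` and the `run_size` column are hypothetical, and the runs are grouped on the condition mask itself (here `a < 0`) instead of on `a > 0`, so that a zero would end a run of negatives.

```python
import pandas as pd


def run_length_ahead(mask):
    """Count consecutive True values starting at each row (0 where False)."""
    mask = mask.astype(bool)
    # A new run id starts every time the mask value changes.
    grp = (mask != mask.shift()).cumsum()
    # Number the rows of each run from its end: the last row of a run gets 1.
    remaining = mask.groupby(grp).cumcount(ascending=False) + 1
    return remaining.where(mask, 0)


df = pd.DataFrame({"a": [2, -1, -3, -1, 1, 1, -1, 1, -1, -2]})
neg = df["a"] < 0
df["grp"] = (neg != neg.shift()).cumsum()
df["c"] = run_length_ahead(neg)
# With the run ids in place, standard groupby methods apply,
# for example the total length of the run each row belongs to.
df["run_size"] = df.groupby("grp")["a"].transform("size")
print(df)
```

Passing a different mask, such as `df["a"] > 0`, counts runs of positive values with no other changes.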
