Groupby based on value in previous row
I have a column with a list of values like so:
100
200
300
500
600
650
1000
I want to do a groupby (or a similar efficient construct) to get batches of rows where the value of each row is within 100 of the previous row.
In that case the batches produced from the example above would be:
100, 200, 300
500, 600, 650
1000
Is this possible to do in Pandas? Since Pandas attempts to allow for SQL-like queries, I am guessing that it should be.
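You can use an approach similar to that described in the answer to this question. It's basically a three-step process:
- Use shift to compute the inter-row criterion that separates one group from the next.
- Use cumsum to sum this criterion, creating a new Series with separate "blocks" of a single value for each group.
- Group on this new Series.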
Here is an example:
>>> import pandas
>>> x = pandas.Series([100, 200, 300, 500, 600, 650, 1000, 900, 750])
>>> x.groupby(((x - x.shift()).abs() > 100).cumsum()).apply(list)
0 [100, 200, 300]
1 [500, 600, 650]
2 [1000, 900]
3 [750]
dtype: object
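The one-liner packs all three steps into a single expression. The minimal sketch below names each intermediate Series so the shift, cumsum and groupby stages can be inspected separately. The names diff, starts_new_group and group_id are illustrative and do not come from the answer. The sketch assumes pandas is imported as pd.

```python
import pandas as pd

x = pd.Series([100, 200, 300, 500, 600, 650, 1000, 900, 750])

# Step 1: compare each row with the previous one via shift().
# The first row has no predecessor, so its difference is NaN.
diff = (x - x.shift()).abs()

# Flag rows that start a new group. NaN > 100 evaluates to False,
# so the first row is never flagged and the labels start at 0.
starts_new_group = diff > 100

# Step 2: cumsum() over the boolean flags yields an integer group label.
group_id = starts_new_group.cumsum()

# Step 3: group the original values on that label.
batches = x.groupby(group_id).apply(list)

print(pd.DataFrame({"value": x, "diff": diff,
                    "new_group": starts_new_group, "group_id": group_id}))
print(batches)
```

The group_id column reads 0, 0, 0, 1, 1, 1, 2, 2, 3. These are exactly the keys the one-liner groups on.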
Note that I used the criterion > 100, which is the opposite of the <= 100 criterion you mentioned. With this approach, you need to use the criterion for separating groups, not the criterion for joining them, so you have to use the negation of your grouping criterion.
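The sketch below makes the negation point concrete. It applies the same idea to a DataFrame column, as in the question. The column name "value" is a hypothetical choice. The sketch uses Series.diff(), which computes the same difference as x - x.shift(). Deriving the keys from the question's joining criterion (<= 100) and negating it with ~ produces the same batches. Only the labels differ, because the first row's NaN comparison is False before negation and True after it.

```python
import pandas as pd

# Hypothetical DataFrame; the column name "value" is an assumption.
df = pd.DataFrame({"value": [100, 200, 300, 500, 600, 650, 1000, 900, 750]})

# Absolute difference to the previous row (NaN for the first row).
diff = df["value"].diff().abs()

# Separating criterion, as in the answer: > 100 starts a new batch.
keys_split = (diff > 100).cumsum()

# Joining criterion from the question (<= 100), negated with ~.
# NaN <= 100 is False, so ~ turns the first row into True and the
# labels start at 1 instead of 0; the batches themselves are identical.
keys_join = (~(diff <= 100)).cumsum()

batches_split = df.groupby(keys_split)["value"].apply(list)
batches_join = df.groupby(keys_join)["value"].apply(list)

print(batches_split.tolist())
print(batches_join.tolist())
```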