pandas - find first occurrence
Problem description
Suppose I have a structured dataframe as follows:
import pandas as pd

df = pd.DataFrame({"A": ['a', 'a', 'a', 'b', 'b'],
                   "B": [1] * 5})
The A column has previously been sorted. I wish to find the index of the first row where df.A != 'a'. The end goal is to use this index to break the data frame into groups based on A.
Now I realise that there is a groupby function. However, the dataframe is quite large, and this is a simplified toy example. Since A has already been sorted, it would be faster if I could just find the first index where df.A != 'a'. Therefore it is important that, whatever method you use, the scan stops as soon as the first such element is found.
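For comparison, a literal early-stopping scan can be written with a generator and next(). This is a minimal sketch, and first_mismatch is a hypothetical helper name. It does stop at the first hit, but it runs a Python-level loop, so on a large frame it may still be slower than a vectorised scan unless the hit is near the start.

```python
import pandas as pd


def first_mismatch(values, target):
    """Hypothetical helper: position of the first element != target, or None.

    next() pulls from the generator only until the first match, so elements
    after it are never examined.
    """
    return next((i for i, v in enumerate(values) if v != target), None)


df = pd.DataFrame({"A": ['a', 'a', 'a', 'b', 'b'], "B": [1] * 5})
print(first_mismatch(df["A"], "a"))  # 3
```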
Recommended answer
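idxmax and argmax return the first occurrence of the maximum value: idxmax gives its index label, argmax gives its integer position. On a boolean mask the maximum is True, so the result is the first row that satisfies the condition.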
Applied to df.A.ne('a'):
df.A.ne('a').idxmax()
3
Or the numpy equivalent:
(df.A.values != 'a').argmax()
3
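Because idxmax returns a label and argmax returns a position, the two give different results once the index is not the default RangeIndex. Both also share a pitfall. On an all-False mask the maximum is False, and its first position is 0, so the result has to be checked. The following minimal sketch shows both points. The index values 10 to 14 are arbitrary and chosen only for illustration.

```python
import pandas as pd

# Same values as the question, but with a non-default index so that
# labels and positions differ.
df = pd.DataFrame({"A": ['a', 'a', 'a', 'b', 'b'], "B": [1] * 5},
                  index=[10, 11, 12, 13, 14])

label = df["A"].ne('a').idxmax()            # index label of the first True
pos = (df["A"].to_numpy() != 'a').argmax()  # integer position of the first True
print(label, pos)  # 13 3

# Pitfall: if no element differs, the mask is all False and argmax
# returns 0, so confirm the mask is actually True at that position.
m = pd.Series(['a', 'a', 'a']).to_numpy() != 'a'
i = m.argmax()
first = i if m[i] else None
print(first)  # None
```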
However, if A has already been sorted, we can use searchsorted, which does a binary search (O(log n) comparisons) instead of scanning the column:
df.A.searchsorted('a', side='right')
3
(older pandas versions returned this as array([3]))
Or the numpy equivalent:
df.A.values.searchsorted('a', side='right')
3
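For the stated end goal of breaking the frame into groups based on A, the same binary search can be run for every key at once. The resulting start offsets then serve as slice boundaries. This is a minimal sketch that assumes A is sorted. The extra 'c' row is added only for illustration. Note that pd.unique itself makes one full pass over the column. If the keys are already known, pass them in directly and only the binary searches remain.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ['a', 'a', 'a', 'b', 'b', 'c'], "B": [1] * 6})

values = df["A"].to_numpy()
# Distinct keys in order of appearance (also sorted, since A is sorted).
keys = pd.unique(values)
# Binary-search the first row of each key's run.
starts = np.searchsorted(values, keys, side='left')
# Each run ends where the next one starts; the last run ends at len(values).
ends = np.append(starts[1:], len(values))

groups = {k: df.iloc[s:e] for k, s, e in zip(keys, starts, ends)}
print({k: len(g) for k, g in groups.items()})  # {'a': 3, 'b': 2, 'c': 1}
```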