Pandas: select the first couple of rows in each group
Problem description
I can't solve this simple problem, so I'm asking for help here. I have a DataFrame as follows, and I want to select the first two rows in each group of 'a':
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': pd.Series(['NewYork', 'NewYork', 'NewYork', 'Washington', 'Washington', 'Texas', 'Texas', 'Texas', 'Texas']), 'b': np.arange(9)})
df
Out[152]:
a b
0 NewYork 0
1 NewYork 1
2 NewYork 2
3 Washington 3
4 Washington 4
5 Texas 5
6 Texas 6
7 Texas 7
8 Texas 8
That is, I want the following output:
a b
0 NewYork 0
1 NewYork 1
2 Washington 3
3 Washington 4
4 Texas 5
5 Texas 6
Thanks a lot for your help.
Recommended answer
In pandas 0.13rc, you can do this directly using head (i.e. no need for reset_index). The example here uses a different DataFrame, with columns 'id' and 'value', rather than the one in the question:
In [11]: df.groupby('id', as_index=False).head(2)
Out[11]:
id value
0 1 first
1 1 second
3 2 first
4 2 second
5 3 first
6 3 third
9 4 second
10 4 fifth
11 5 first
12 6 first
13 6 second
15 7 fourth
16 7 fifth
[13 rows x 2 columns]
Note that the indices are correct (each row keeps its original label), and that this is significantly faster than before (with or without reset_index), even on this small example:
# 0.13rc
In [21]: %timeit df.groupby('id', as_index=False).head(2)
1000 loops, best of 3: 279 µs per loop
# 0.12
In [21]: %timeit df.groupby('id', as_index=False).head(2) # this didn't work correctly
1000 loops, best of 3: 1.76 ms per loop
In [22]: %timeit df.groupby('id').head(2).reset_index(drop=True)
1000 loops, best of 3: 1.82 ms per loop
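Applied to the DataFrame from the question, the same approach might look like the sketch below. It assumes a reasonably recent pandas, where groupby(...).head(n) returns the selected rows in their original order with their original index labels. The names first_two and result are only illustrative. The question's desired output is numbered 0 to 5, so reset_index(drop=True) is added at the end; leave it out if the original labels should be kept.

```python
import numpy as np
import pandas as pd

# Same data as in the question: 3 NewYork rows, 2 Washington rows, 4 Texas rows.
df = pd.DataFrame({
    'a': ['NewYork'] * 3 + ['Washington'] * 2 + ['Texas'] * 4,
    'b': np.arange(9),
})

# head(2) keeps the first two rows of every group defined by column 'a'.
# The rows keep their original index labels.
first_two = df.groupby('a').head(2)

# Renumber the rows from 0 so the result matches the output requested in the question.
result = first_two.reset_index(drop=True)
print(result)
```

Because head acts as a per-group filter rather than an aggregation, the groups appear in the order of the original rows (NewYork, Washington, Texas) rather than sorted by key.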