How does the pandas groupby method actually work?


Question

So I was trying to understand the pandas.DataFrame.groupby() function, and I came across this example in the documentation:

In [1]: df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
   ...:                           'foo', 'bar', 'foo', 'foo'],
   ...:                    'B' : ['one', 'one', 'two', 'three',
   ...:                           'two', 'two', 'one', 'three'],
   ...:                    'C' : np.random.randn(8),
   ...:                    'D' : np.random.randn(8)})
   ...: 

In [2]: df
Out[2]: 
     A      B         C         D
0  foo    one  0.469112 -0.861849
1  bar    one -0.282863 -2.104569
2  foo    two -1.509059 -0.494929
3  bar  three -1.135632  1.071804
4  foo    two  1.212112  0.721555
5  bar    two -0.173215 -0.706771
6  foo    one  0.119209 -1.039575
7  foo  three -1.044236  0.271860

To explore further, I did this:

print(df.groupby('B').head())

It outputs the same DataFrame, but when I do this:

print(df.groupby('B'))

it gives me this:

<pandas.core.groupby.DataFrameGroupBy object at 0x7f65a585b390>

What does this mean? When printing a normal DataFrame, .head() simply outputs the first 5 rows. What's happening here?

Also, why does printing .head() give the same output as the DataFrame? Shouldn't it be grouped by the elements of column 'B'?

Recommended answer

When you just use

df.groupby('A')

you get a GroupBy object. You haven't applied any function to it at that point. Under the hood, while this definition might not be perfect, you can think of a GroupBy object as:

  • An iterator of (group, DataFrame) pairs, for DataFrames, or
  • An iterator of (group, Series) pairs, for Series.

To illustrate:

import pandas as pd

df = pd.DataFrame({'A' : [1, 1, 2, 2], 'B' : [1, 2, 3, 4]})
grouped = df.groupby('A')

# each `i` is a tuple of (group, DataFrame)
# so your output here will be a little messy
for i in grouped:
    print(i)
(1,    A  B
0  1  1
1  1  2)
(2,    A  B
2  2  3
3  2  4)

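# this version unpacks each tuple into two loop variables.
# each `group` is a group key, each `group_df` is its
# corresponding DataFrame (named `group_df` so that the
# original `df` is not overwritten by the loop)
for group, group_df in grouped:
    print('group of A:', group, '\n')
    print(group_df, '\n')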
group of A: 1 

   A  B
0  1  1
1  1  2 

group of A: 2 

   A  B
2  2  3
3  2  4 

# and if you just want to see the group keys,
# the second loop variable is a "throwaway"
for group, _ in grouped:
    print('group of A:', group, '\n')
group of A: 1 

group of A: 2 
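
The same groups can also be inspected without writing a loop. The following is a minimal sketch that reuses the small frame above; the variable names `n_groups`, `row_labels`, and `group_one` are only illustrative and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4]})
grouped = df.groupby('A')

# number of distinct groups found in column 'A'
n_groups = grouped.ngroups

# mapping of each group key to the row labels that belong to it
row_labels = {key: list(idx) for key, idx in grouped.groups.items()}

# the sub-DataFrame for a single group key
group_one = grouped.get_group(1)

print(n_groups)
print(row_labels)
print(group_one)
```

Nothing is computed per group until you ask for it, which is why printing the GroupBy object itself only shows its type and memory address.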

As for .head, just have a look at the docs for that method:

Essentially equivalent to .apply(lambda x: x.head(n))

So here you're actually applying a function to each group of the GroupBy object. Keep in mind that .head(5) is applied to each group (each DataFrame), so because every group has five or fewer rows, you get your original DataFrame back.
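
To check this against the frame from the question, here is a minimal sketch. The seeded `rng` is an assumption added for reproducibility (the original used np.random.randn), and the values in columns C and D do not affect the result.

```python
import numpy as np
import pandas as pd

# assumption: a seeded generator instead of np.random.randn, for repeatable values
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': rng.standard_normal(8),
    'D': rng.standard_normal(8),
})

# how many rows fall into each group of column 'B'
group_sizes = df.groupby('B').size()

# head() defaults to n=5 and is applied to each group separately
head_result = df.groupby('B').head()

print(group_sizes)
print(head_result.equals(df))
```

No group in 'B' has more than three rows, so taking the first five rows of each group keeps every row, and the result matches the original frame.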

Consider this with the example above. If you use .head(1), you get only the first row of each group:

print(df.groupby('A').head(1))
   A  B
0  1  1
2  2  3
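
If what you expected was one row per group, that comes from an aggregation rather than from .head(). The sketch below contrasts the two on the same small frame; the choice of `sum` is only an illustration and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4]})

# head(1) keeps the first row of each group, with the original index labels
first_rows = df.groupby('A').head(1)

# an aggregation collapses each group to one value, indexed by the group key
totals = df.groupby('A')['B'].sum()

print(first_rows)
print(totals)
```

The first result is still a slice of the original rows, while the second is a new Series whose index is made of the group keys from column 'A'.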
