What is the difference between groupby.first, groupby.nth, and groupby.head when as_index=False?

Question

Edit: I made a rookie mistake by storing the string 'np.nan' instead of a real null value, as pointed out by @coldspeed, @wen-ben, and @ALollz. The answers are quite good, so I am not deleting this question in order to keep them.

Original:
I have read the question and answers for "What's the difference between groupby.first() and groupby.head(1)?"

That answer explains that the difference lies in how NaN values are handled. However, when I call groupby with as_index=False, both of them pick up the NaN.

Furthermore, pandas has groupby.nth, which offers functionality similar to head and first.

What is the difference between groupby.first(), groupby.nth(0), and groupby.head(1) with as_index=False?

Example below:

In [448]: df
Out[448]:
   A       B
0  1  np.nan
1  1       4
2  1      14
3  2       8
4  2      19
5  2      12

In [449]: df.groupby('A', as_index=False).head(1)
Out[449]:
   A       B
0  1  np.nan
3  2       8

In [450]: df.groupby('A', as_index=False).first()
Out[450]:
   A       B
0  1  np.nan
1  2       8

In [451]: df.groupby('A', as_index=False).nth(0)
Out[451]:
   A       B
0  1  np.nan
3  2       8

I can see that first() resets the index while the other two do not. Besides that, are there any differences?

Recommended answer

The major issue is that you likely have the string 'np.nan' stored rather than a real null value. Here is how the three methods handle null values differently:

import numpy as np
import pandas as pd

# Column B mixes None, NaN, strings, and integers, so it has object dtype
df = pd.DataFrame({'A': [1,1,2,2,3,3], 'B': [None, '1', np.nan, '2', 3, 4]})
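
To illustrate why the string matters, here is a minimal sketch that checks how pandas classifies the literal text 'np.nan' compared with a real missing value. The names as_text and as_null are illustrative and do not appear in the original post.

```python
import numpy as np
import pandas as pd

# Column holding the literal text 'np.nan' (the mistake from the question).
as_text = pd.Series(["np.nan", "4", "14"])
# Column holding a real missing value.
as_null = pd.Series([np.nan, 4, 14])

# isna() flags only real nulls; the text 'np.nan' is an ordinary string.
text_flags = as_text.isna().tolist()
null_flags = as_null.isna().tolist()
print(text_flags)  # [False, False, False]
print(null_flags)  # [True, False, False]
```

Since the string is never treated as missing, none of the three groupby methods has any reason to skip it.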


first

This returns the first non-null value within each group. Oddly enough, it does not skip None, though passing the kwarg dropna=True makes it do so. Because the value is picked per column, the result may combine values that originally belonged to different rows:

df.groupby('A', as_index=False).first()
#   A     B
#0  1  None
#1  2     2
#2  3     3

df.groupby('A', as_index=False).first(dropna=True)
#   A  B
#0  1  1
#1  2  2
#2  3  3
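
The point about values coming from different rows can be sketched with a two-column example. The frame demo and its columns B and C are made up for illustration, and real NaN values in float columns are assumed.

```python
import numpy as np
import pandas as pd

# One group whose two rows each have a NaN in a different column.
demo = pd.DataFrame({
    "A": [1, 1],
    "B": [np.nan, 5.0],
    "C": [10.0, np.nan],
})

# first() picks the first non-null value per column, so B comes
# from the second row and C from the first row.
mixed = demo.groupby("A", as_index=False).first()

# head(1) returns the first row as it is, NaN included.
top_row = demo.groupby("A", as_index=False).head(1)

print(mixed)
print(top_row)
```

Here first() yields B=5.0 together with C=10.0, a combination that never existed as a single row in demo, whereas head(1) keeps the original first row intact.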

head(n)

Returns the top n rows within each group. Values remain bound within their rows. If you give it an n larger than the number of rows in a group, it returns all rows of that group without complaining:

df.groupby('A', as_index=False).head(1)
#   A     B
#0  1  None
#2  2   NaN
#4  3     3

df.groupby('A', as_index=False).head(200)
#   A     B
#0  1  None
#1  1     1
#2  2   NaN
#3  2     2
#4  3     3
#5  3     4
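
As a small sketch of the same behavior on interleaved groups (the frame demo is illustrative, not from the original post), head keeps the original index labels and row order instead of building a new index:

```python
import pandas as pd

# Rows of groups 1 and 2 are interleaved on purpose.
demo = pd.DataFrame({"A": [1, 2, 1, 2, 1], "B": [10, 20, 30, 40, 50]})

# First two rows of each group, still in original row order.
top_two = demo.groupby("A", as_index=False).head(2)

# An n larger than any group simply returns every row.
everything = demo.groupby("A", as_index=False).head(200)

print(top_two.index.tolist())  # [0, 1, 2, 3]
print(everything["B"].tolist())  # [10, 20, 30, 40, 50]
```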

nth:

This takes the nth row of each group, so values again remain bound within the row. .nth(0) is the same as .head(1), though the two have different uses. For instance, if you need the 0th and 2nd rows, that is difficult to do with .head() but easy with .nth([0,2]). Conversely, it is far easier to write .head(10) than .nth(list(range(10))).

df.groupby('A', as_index=False).nth(0)
#   A     B
#0  1  None
#2  2   NaN
#4  3     3
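
The comparison with .head() can be sketched as follows. The frame demo is illustrative, and the results are compared by value only, since the index that nth returns has differed between pandas versions.

```python
import pandas as pd

demo = pd.DataFrame({"A": [1, 1, 1, 2, 2, 2], "B": [10, 11, 12, 20, 21, 22]})
grouped = demo.groupby("A", as_index=False)

# Rows 0 and 2 of every group: straightforward with nth, awkward with head.
picked = sorted(grouped.nth([0, 2])["B"].tolist())

# The first two rows of every group, written both ways.
first_two_nth = sorted(grouped.nth(list(range(2)))["B"].tolist())
first_two_head = sorted(grouped.head(2)["B"].tolist())

print(picked)  # [10, 12, 20, 22]
print(first_two_nth == first_two_head)  # True
```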

nth also supports dropping rows that contain any null values, so, unlike .head(), it can return the first row of each group that has no null values:

df.groupby('A', as_index=False).nth(0, dropna='any')
#   A  B
#A      
#1  1  1
#2  2  2
#3  3  3
