What is the difference between groupby.first, groupby.nth, and groupby.head when as_index=False?
Question
Edit: I made a rookie mistake by using the string np.nan, as pointed out by @coldspeed, @wen-ben, and @ALollz. The answers are quite good, so I am keeping this question rather than deleting it.
Original:
I have read this question/answer: What's the difference between groupby.first() and groupby.head(1)?
That answer explained that the difference lies in how NaN values are handled. However, when I call groupby with as_index=False, both of them pick up the NaN.
Furthermore, pandas has groupby.nth, with functionality similar to head and first. What is the difference between groupby.first(), groupby.nth(0), and groupby.head(1) with as_index=False?
Example below:
In [448]: df
Out[448]:
A B
0 1 np.nan
1 1 4
2 1 14
3 2 8
4 2 19
5 2 12
In [449]: df.groupby('A', as_index=False).head(1)
Out[449]:
A B
0 1 np.nan
3 2 8
In [450]: df.groupby('A', as_index=False).first()
Out[450]:
A B
0 1 np.nan
1 2 8
In [451]: df.groupby('A', as_index=False).nth(0)
Out[451]:
A B
0 1 np.nan
3 2 8
I see that first() resets the index while the other two do not. Besides that, are there any differences?
Recommended answer
The major issue is that you likely have the string 'np.nan' stored, not a real null value. Here is how the three methods handle null values differently:
import numpy as np
import pandas as pd
# Column B mixes None, a real NaN, strings, and integers
df = pd.DataFrame({'A': [1,1,2,2,3,3], 'B': [None, '1', np.nan, '2', 3, 4]})
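As a side note on the string issue, here is a minimal sketch of how one might check whether a column holds the string 'np.nan' instead of a real missing value. The frames `df_str` and `df_nan` are hypothetical reconstructions of the question's data, not taken from it, and `pd.to_numeric(..., errors='coerce')` is just one possible way to repair the column, assuming B is meant to be numeric.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: in df_str the "missing" value is the string 'np.nan',
# in df_nan it is a real NaN.
df_str = pd.DataFrame({'A': [1, 1, 2], 'B': ['np.nan', 4, 8]})
df_nan = pd.DataFrame({'A': [1, 1, 2], 'B': [np.nan, 4, 8]})

# isna() only recognises real missing values, not the string.
str_nulls = df_str['B'].isna().sum()
real_nulls = df_nan['B'].isna().sum()

# One possible repair: coerce anything non-numeric to NaN.
fixed = pd.to_numeric(df_str['B'], errors='coerce')
fixed_nulls = fixed.isna().sum()
```

If `str_nulls` is zero where a missing value is expected, the null-handling differences described below cannot show up, which is consistent with what the question observed.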
first
This returns the first non-null value within each group. Oddly enough, in the pandas version used for this answer it does not skip None, though it can be made to do so with the kwarg dropna=True. Because the first non-null value is picked separately for each column, the returned row may combine values that came from different rows of the original data:
df.groupby('A', as_index=False).first()
# A B
#0 1 None
#1 2 2
#2 3 3
df.groupby('A', as_index=False).first(dropna=True)
# A B
#0 1 1
#1 2 2
#2 3 3
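The point about values coming from different rows is easier to see with two value columns. The sketch below uses a hypothetical frame `df2` (not part of the answer) in which each row of the group is missing a different column, and compares first() with head(1).

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 is missing B, row 1 is missing C.
df2 = pd.DataFrame({'A': [1, 1], 'B': [np.nan, 2.0], 'C': [3.0, np.nan]})

# first() picks the first non-null value per column,
# so B comes from row 1 and C comes from row 0.
first_vals = df2.groupby('A', as_index=False).first()

# head(1) returns row 0 as it is, including its NaN in B.
head_row = df2.groupby('A', as_index=False).head(1)
```

Here `first_vals` is a single row that never existed in `df2`, while `head_row` is an unmodified row of the original frame.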
head(n)
Returns the top n rows within each group. Values remain bound within their rows. If you give it an n that is larger than the number of rows in a group, it returns all rows in that group without complaining:
df.groupby('A', as_index=False).head(1)
# A B
#0 1 None
#2 2 NaN
#4 3 3
df.groupby('A', as_index=False).head(200)
# A B
#0 1 None
#1 1 1
#2 2 NaN
#3 2 2
#4 3 3
#5 3 4
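One way to think about head(n) is as a row filter on the original frame. The sketch below, using a hypothetical frame `df3`, compares it with an explicit filter built from `cumcount()`. This is an illustrative equivalence under these assumptions, not a description of how pandas implements head.

```python
import pandas as pd

# Hypothetical data with groups of different sizes.
df3 = pd.DataFrame({'A': [1, 1, 1, 2, 2], 'B': [10, 11, 12, 20, 21]})
n = 2

# head(n) keeps the first n rows of each group, with the original index.
by_head = df3.groupby('A').head(n)

# The same rows, selected by each row's position within its group.
by_cumcount = df3[df3.groupby('A').cumcount() < n]

# An n larger than every group simply returns all rows.
everything = df3.groupby('A').head(200)
```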
nth:
This takes the nth row, so again values remain bound within the row. .nth(0) returns the same rows as .head(1), though the two have different uses. For instance, if you need the 0th and 2nd rows, that is difficult to do with .head() but easy with .nth([0,2]). Conversely, it is far easier to write .head(10) than .nth(list(range(10))).
df.groupby('A', as_index=False).nth(0)
# A B
#0 1 None
#2 2 NaN
#4 3 3
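The contrast between picking specific positions and picking the first few rows can be sketched as follows. The frame `df4` is hypothetical and contains no nulls, so that only the row selection differs.

```python
import pandas as pd

# Hypothetical data: two groups of three rows each.
df4 = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2],
                    'B': [10, 11, 12, 20, 21, 22]})
g = df4.groupby('A', as_index=False)

# Rows at positions 0 and 2 within each group.
rows_0_and_2 = g.nth([0, 2])

# The first two rows of each group.
first_two = g.head(2)

# nth(0) and head(1) select the same rows.
nth0_vals = g.nth(0)['B'].tolist()
head1_vals = g.head(1)['B'].tolist()
```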
nth also supports dropping rows with any null values, so, unlike .head(), it can be used to return the first row that contains no null values:
df.groupby('A', as_index=False).nth(0, dropna='any')
# A B
#A
#1 1 1
#2 2 2
#3 3 3
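The shape and index of the result of nth with dropna may vary between pandas versions. As an alternative that does not rely on that argument, the following sketch drops rows containing nulls before grouping and then takes head(1). This is a swapped-in approach, not the method used in the answer; it is meant to select the same values under the assumption that only the first fully non-null row per group is wanted.

```python
import numpy as np
import pandas as pd

# Same data as in the answer.
df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3],
                   'B': [None, '1', np.nan, '2', 3, 4]})

# Drop every row that contains a null (None or NaN), then take
# the first remaining row of each group.
first_complete = df.dropna(how='any').groupby('A', as_index=False).head(1)
```

Because head(1) is applied after the rows with nulls are removed, each returned row is a complete row from the original frame and keeps its original index.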