Why does max() sometimes return nan and sometimes ignore it?
Problem description
This question is motivated by an answer I gave a while ago.
Let's say I have a dataframe like this
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [3, np.nan, 10], 'c':[np.nan, 5, 34]})
a b c
0 1.0 3.0 NaN
1 2.0 NaN 5.0
2 NaN 10.0 34.0
and I want to replace each NaN with the maximum of its row. I can do
df.apply(lambda row: row.fillna(row.max()), axis=1)
which gives me the desired output
a b c
0 1.0 3.0 3.0
1 2.0 5.0 5.0
2 34.0 10.0 34.0
However, when I use
df.apply(lambda row: row.fillna(max(row)), axis=1)
for some reason the NaN is replaced correctly in only two of the three cases:
a b c
0 1.0 3.0 3.0
1 2.0 5.0 5.0
2 NaN 10.0 34.0
Indeed, if I check by hand
max(df.iloc[0, :])
max(df.iloc[1, :])
max(df.iloc[2, :])
then it prints
3.0
5.0
nan
Doing
df.iloc[0, :].max()
df.iloc[1, :].max()
df.iloc[2, :].max()
prints the expected
3.0
5.0
34.0
My question is why max() fails in one of the three cases but not in all three. Why is the NaN sometimes ignored and sometimes not?
Recommended answer
The reason is that max works by taking the first value as the "max seen so far" and then checking each remaining value to see whether it is bigger than the max seen so far. But nan is defined so that ordering comparisons with it always return False: nan > 1 is false, and 1 > nan is also false.
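The comparison behaviour can be checked directly. This is a minimal sketch using only the standard library; the variable name nan is chosen here purely for illustration.

```python
import math

nan = float("nan")

# Ordering and equality comparisons involving nan all evaluate to False.
print(nan > 1)     # False
print(1 > nan)     # False
print(nan == nan)  # False

# Because nan == nan is False, use math.isnan to detect it.
print(math.isnan(nan))  # True
```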
So if nan is the first value in the array, every subsequent comparison checks whether some_other_value > nan. This is always false, so nan keeps its position as the "max seen so far". On the other hand, if nan is not the first value, then when it is reached, the comparison nan > max_so_far is again false. In this case, though, that means the current "max seen so far" (which is not nan) remains the max seen so far, so the nan is always discarded.
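The loop described above can be sketched in plain Python. The function naive_max is a hypothetical illustration, not the interpreter's actual source, and it assumes a non-empty iterable; the lists mirror the three rows of the dataframe in the question.

```python
import math

def naive_max(values):
    """Illustrative version of the 'max seen so far' loop (assumes non-empty input)."""
    it = iter(values)
    max_so_far = next(it)          # the first value starts as the running maximum
    for item in it:
        if item > max_so_far:      # False whenever either side is nan
            max_so_far = item
    return max_so_far

nan = float("nan")
row0 = [1.0, 3.0, nan]    # nan last: it never replaces the running maximum
row1 = [2.0, nan, 5.0]    # nan in the middle: skipped, then 5.0 wins
row2 = [nan, 10.0, 34.0]  # nan first: nothing ever compares greater than it

print(naive_max(row0))  # 3.0
print(naive_max(row1))  # 5.0
print(naive_max(row2))  # nan

# The built-in max shows the same order dependence on these inputs.
print(max(row0), max(row1), max(row2))  # 3.0 5.0 nan
```

By contrast, the pandas method Series.max() skips NaN by default (skipna=True), which is why row.max() gives 34.0 for the third row.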