Why does max() sometimes return nan and sometimes ignore it?
Question
This question is motivated by an answer I gave a while ago.
Let's say I have a dataframe like this
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [3, np.nan, 10], 'c':[np.nan, 5, 34]})
a b c
0 1.0 3.0 NaN
1 2.0 NaN 5.0
2 NaN 10.0 34.0
and I want to replace each NaN with the maximum of its row. I can do
df.apply(lambda row: row.fillna(row.max()), axis=1)
which gives me the desired output
a b c
0 1.0 3.0 3.0
1 2.0 5.0 5.0
2 34.0 10.0 34.0
However, when I use
df.apply(lambda row: row.fillna(max(row)), axis=1)
for some reason the NaN is replaced correctly in only two of the three cases:
a b c
0 1.0 3.0 3.0
1 2.0 5.0 5.0
2 NaN 10.0 34.0
Indeed, if I check manually with
max(df.iloc[0, :])
max(df.iloc[1, :])
max(df.iloc[2, :])
then it prints
3.0
5.0
nan
whereas when I do
df.iloc[0, :].max()
df.iloc[1, :].max()
df.iloc[2, :].max()
it prints the expected
3.0
5.0
34.0
My question is why the built-in max() fails in one of the three cases rather than in all three. Why are the NaN values sometimes ignored and sometimes not?
Recommended answer
The reason is that max works by taking the first value as the "max seen so far", and then checking each other value to see if it is bigger than the max seen so far. But nan is defined so that ordering comparisons with it always return False. That is, nan > 1 is False, but 1 > nan is also False.
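The comparison behaviour described above can be checked directly. This is a minimal sketch using a plain Python float NaN (the variable name `nan` is just for illustration); `np.nan` is an ordinary float NaN as well, so it should behave the same way.

```python
# A float NaN: every ordering comparison involving it is False.
nan = float("nan")

print(nan > 1)     # False
print(1 > nan)     # False
print(nan < 1)     # False
print(nan == nan)  # False: NaN is not even equal to itself
```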
So if you start with nan as the first value in the array, every subsequent comparison will check whether some_other_value > nan. This will always be False, so nan will retain its position as "max seen so far". On the other hand, if nan is not the first value, then when it is reached, the comparison nan > max_so_far will again be False. But in this case that means the current "max seen so far" (which is not nan) will remain the max seen so far, so the nan will always be discarded.
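The reasoning above can be reproduced with a small pure-Python stand-in for max. This is only a sketch of the "max seen so far" loop, not the actual built-in implementation; `naive_max` is a hypothetical helper, and the three lists mirror the rows of the dataframe in the question.

```python
import math

def naive_max(values):
    """Illustrative re-implementation of the 'max seen so far' loop."""
    it = iter(values)
    try:
        best = next(it)          # the first value starts as the max so far
    except StopIteration:
        raise ValueError("naive_max() arg is an empty sequence")
    for v in it:
        if v > best:             # always False when v or best is NaN
            best = v
    return best

nan = float("nan")
row0 = [1.0, 3.0, nan]    # NaN last: discarded
row1 = [2.0, nan, 5.0]    # NaN in the middle: discarded
row2 = [nan, 10.0, 34.0]  # NaN first: never displaced

print(naive_max(row0))  # 3.0
print(naive_max(row1))  # 5.0
print(naive_max(row2))  # nan
```

Only the row that starts with NaN returns NaN, which matches the output in the question. The pandas method row.max() gives 34.0 for that row because it skips NaN values by default.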