Why does max() sometimes return nan and sometimes ignore it?

Question

This question is motivated by an answer I gave a while ago.

Let's say I have a dataframe like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [3, np.nan, 10], 'c':[np.nan, 5, 34]})

     a     b     c
0  1.0   3.0   NaN
1  2.0   NaN   5.0
2  NaN  10.0  34.0

I want to replace each NaN with the maximum of its row. I can do

df.apply(lambda row: row.fillna(row.max()), axis=1)

which gives me the desired output:

      a     b     c
0   1.0   3.0   3.0
1   2.0   5.0   5.0
2  34.0  10.0  34.0

However, when I use the built-in max() instead

df.apply(lambda row: row.fillna(max(row)), axis=1)

for some reason the NaN is replaced correctly in only two of the three rows:

     a     b     c
0  1.0   3.0   3.0
1  2.0   5.0   5.0
2  NaN  10.0  34.0

Indeed, if I check each row by hand with

max(df.iloc[0, :])
max(df.iloc[1, :])
max(df.iloc[2, :])

then it prints

3.0
5.0
nan

whereas when I do

df.iloc[0, :].max()
df.iloc[1, :].max()
df.iloc[2, :].max()

it prints the expected

3.0
5.0
34.0

My question is why max() fails in one of the three cases rather than in all three. Why is NaN sometimes ignored and sometimes not?

Recommended answer

The reason is that max works by taking the first value as the "max seen so far", and then checking every other value to see whether it is bigger than the max seen so far. But nan is defined so that ordering comparisons with it always return False: nan > 1 is false, but 1 > nan is also false.
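The comparison behaviour described above can be checked directly. The following is a small sketch using plain Python floats rather than the dataframe from the question; the variable names are chosen only for illustration.

```python
nan = float("nan")

# Ordering comparisons involving nan evaluate to False in both directions.
nan_gt_one = nan > 1
one_gt_nan = 1 > nan
print(nan_gt_one, one_gt_nan)

# The built-in max() therefore depends on where nan sits in the sequence.
# These lists mirror rows 0 and 2 of the dataframe in the question.
max_nan_last = max([1.0, 3.0, nan])
max_nan_first = max([nan, 10.0, 34.0])
print(max_nan_last, max_nan_first)
```

With nan in last position the result is an ordinary number; with nan in first position the result is nan itself.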

So if nan is the first value in the array, every subsequent comparison will check whether some_other_value > nan. This will always be false, so nan will retain its position as "max seen so far". On the other hand, if nan is not the first value, then when it is reached, the comparison nan > max_so_far will again be false. In this case, though, that means the current "max seen so far" (which is not nan) stays in place, so the nan is discarded.
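To make the order dependence concrete, here is a minimal sketch of a max-like loop written the way the answer describes it. The function name max_like_builtin is hypothetical and this is not the actual source of the built-in max(); it only mirrors the "max seen so far" logic on the three rows from the question.

```python
import math


def max_like_builtin(values):
    """Hypothetical helper: keep the first value, replace it only when a later value compares greater."""
    iterator = iter(values)
    max_so_far = next(iterator)  # the first value starts as the "max seen so far"
    for value in iterator:
        # With nan on either side this comparison is False,
        # so max_so_far is left unchanged.
        if value > max_so_far:
            max_so_far = value
    return max_so_far


nan = float("nan")
rows = [
    [1.0, 3.0, nan],    # nan last: skipped
    [2.0, nan, 5.0],    # nan in the middle: skipped
    [nan, 10.0, 34.0],  # nan first: never replaced
]
results = [max_like_builtin(row) for row in rows]
print(results)
```

Only the row that starts with nan yields nan, which matches the output in the question. By contrast, pandas' Series.max() skips NaN by default (skipna=True), which is why row.max() returned 34.0 for the third row.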
