Why does max() sometimes return nan and sometimes ignore it?

Question

This question is motivated by an answer I gave a while ago.

Let's say I have a dataframe like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [3, np.nan, 10], 'c':[np.nan, 5, 34]})

     a     b     c
0  1.0   3.0   NaN
1  2.0   NaN   5.0
2  NaN  10.0  34.0

and I want to replace each NaN with the maximum of its row. I can do

df.apply(lambda row: row.fillna(row.max()), axis=1)

which gives me the desired output:

      a     b     c
0   1.0   3.0   3.0
1   2.0   5.0   5.0
2  34.0  10.0  34.0

However, when I use

df.apply(lambda row: row.fillna(max(row)), axis=1)

for some reason the NaN is replaced correctly in only two of the three rows:

     a     b     c
0  1.0   3.0   3.0
1  2.0   5.0   5.0
2  NaN  10.0  34.0

Indeed, if I check each row individually with

max(df.iloc[0, :])
max(df.iloc[1, :])
max(df.iloc[2, :])

it prints

3.0
5.0
nan

whereas doing

df.iloc[0, :].max()
df.iloc[1, :].max()
df.iloc[2, :].max()

prints the expected

3.0
5.0
34.0
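For reference, pandas' `Series.max` skips missing values by default (its `skipna` parameter defaults to `True`), whereas the builtin `max` just iterates over the row's values and compares them. A minimal check on the third row, assuming a recent pandas/NumPy; `np.nanmax` is shown as one more NaN-ignoring option and is not part of the original question:

```python
import numpy as np
import pandas as pd

# The third row of the example DataFrame, with NaN in the first position.
row = pd.Series([np.nan, 10.0, 34.0], index=['a', 'b', 'c'])

print(row.max())                 # 34.0: Series.max skips NaN (skipna=True by default)
print(row.max(skipna=False))     # nan: NaN propagates when skipping is turned off
print(np.nanmax(row.to_numpy())) # 34.0: NumPy's NaN-ignoring maximum
print(max(row))                  # nan: the builtin just compares the values in order
```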

My question is: why does max() fail in one of the three cases but not in all three? Why is the NaN sometimes ignored and sometimes not?

Answer

The reason is that max works by taking the first value as the "max seen so far", and then checking each remaining value to see if it is bigger than the max seen so far. But nan is defined so that comparisons with it always return False: nan > 1 is False, but 1 > nan is also False.
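The comparison rule can be checked directly in plain Python; `float('nan')` behaves the same way as `np.nan` here (both are ordinary Python floats):

```python
import math

nan = float('nan')

# Ordering comparisons involving NaN are False in both directions...
print(nan > 1)   # False
print(1 > nan)   # False
# ...and NaN is not even equal to itself, so use math.isnan() to detect it.
print(nan == nan)       # False
print(math.isnan(nan))  # True
```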

So if nan is the first value in the sequence, every subsequent comparison checks whether some_other_value > nan. This is always False, so nan keeps its position as the "max seen so far". On the other hand, if nan is not the first value, then when it is reached, the comparison nan > max_so_far is again False. In that case, though, it means the current "max seen so far" (which is not nan) remains the max seen so far, so the nan is always discarded. In the example, only the third row (NaN, 10.0, 34.0) starts with NaN, which is why it is the only row where max() returns nan.
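The loop described above can be sketched in pure Python. `naive_max` is a hypothetical, simplified model (CPython's builtin is written in C and also handles `key=` and `default=`), but it applies the same `new_item > max_so_far` test and reproduces all three results from the question:

```python
import math


def naive_max(values):
    """Hypothetical simplified model of builtin max(): start with the first
    item and replace it whenever a later item compares greater."""
    it = iter(values)
    try:
        max_so_far = next(it)
    except StopIteration:
        raise ValueError("naive_max() arg is an empty sequence") from None
    for item in it:
        # NaN on either side makes this comparison False, so a later NaN is
        # never picked up, and a leading NaN is never displaced.
        if item > max_so_far:
            max_so_far = item
    return max_so_far


nan = float('nan')
rows = [[1.0, 3.0, nan], [2.0, nan, 5.0], [nan, 10.0, 34.0]]
for row in rows:
    print(naive_max(row), max(row))
# 3.0 3.0
# 5.0 5.0
# nan nan
```

Because the result depends only on where NaN sits, reordering the same values changes the answer: `max([nan, 1.0])` is `nan`, while `max([1.0, nan])` is `1.0`.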
