How to check if any value is NaN in a Pandas DataFrame

Question

In Python Pandas, what's the best way to check whether a DataFrame has one (or more) NaN values?

I know about the function pd.isna (also available as pd.isnull), but this returns a DataFrame of booleans for each element. This post right here doesn't exactly answer my question either.

Solution

jwilner's response is spot on. I was exploring to see if there's a faster option, since in my experience, summing flat arrays is (strangely) faster than counting. This code seems faster:

df.isnull().values.any()
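As a minimal sketch of how that one-liner behaves, the snippet below runs it on a small DataFrame with one missing value and on the same data with the missing row dropped. The column names and values are made up for illustration; they are not from the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: one missing value in column "b".
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, np.nan, 6.0]})

# isnull() gives an element-wise boolean DataFrame; .values exposes it as a
# NumPy array, and .any() collapses that array to a single boolean.
has_nan = bool(df.isnull().values.any())

# The same check on a frame with no missing values.
clean = df.dropna()
clean_has_nan = bool(clean.isnull().values.any())

print(has_nan, clean_has_nan)
```

Wrapping the result in bool() is optional; it only converts the NumPy boolean into a plain Python one.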

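# Benchmark four ways of checking a DataFrame for NaN values with perfplot.
import numpy as np
import pandas as pd
import perfplot


def setup(n):
    # Build a single-column DataFrame of n standard-normal samples and
    # replace every value above 0.9 with NaN.
    df = pd.DataFrame(np.random.randn(n))
    df[df > 0.9] = np.nan
    return df


def isnull_any(df):
    # Column-wise check: returns a boolean Series, one entry per column.
    return df.isnull().any()


def isnull_values_sum(df):
    # Count NaNs in the underlying NumPy array and compare with zero.
    return df.isnull().values.sum() > 0


def isnull_sum(df):
    # Column-wise NaN counts compared with zero: returns a boolean Series.
    return df.isnull().sum() > 0


def isnull_values_any(df):
    # Single boolean: True if any element of the underlying array is NaN.
    return df.isnull().values.any()


# Time each kernel for n = 1, 2, 4, ..., 2**24 and save the plot to out.png.
perfplot.save(
    "out.png",
    setup=setup,
    kernels=[isnull_any, isnull_values_sum, isnull_sum, isnull_values_any],
    n_range=[2 ** k for k in range(25)],
)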

df.isnull().sum().sum() is a bit slower, but of course, has additional information -- the number of NaNs.
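When the count matters and not just a yes/no answer, the double sum can be split to show what each step produces. This is a small sketch on made-up data; the names per_column and total are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: one NaN in column "a", two in column "b".
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

# First sum: number of NaNs in each column, as a Series indexed by column name.
per_column = df.isnull().sum()

# Second sum: total number of NaNs in the whole DataFrame.
total = int(per_column.sum())

print(per_column.to_dict(), total)
```

A total greater than zero answers the original question as well, at the cost of the extra counting work.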
