How to count nan values in a pandas DataFrame?


Question

What is the best way to account for NaN (not a number) values in a pandas DataFrame?

The following code:

import numpy as np
import pandas as pd
dfd = pd.DataFrame([1, np.nan, 3, 3, 3, np.nan], columns=['a'])
dfv = dfd.a.value_counts().sort_index()
print("nan: %d" % dfv[np.nan].sum())
print("1: %d" % dfv[1].sum())
print("3: %d" % dfv[3].sum())
print("total: %d" % dfv[:].sum())

outputs:

nan: 0
1: 1
3: 3
total: 4

The desired output is:

nan: 2
1: 1
3: 3
total: 6

I am using pandas 0.17 with Python 3.5.0 and Anaconda 2.4.0.

Recommended answer

If you want to count only the NaN values in column 'a' of a DataFrame df, use:

len(df) - df['a'].count()

Here count() returns the number of non-NaN values, which is subtracted from the total number of rows (given by len(df)).
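As a minimal sketch of the subtraction described above, the snippet below applies it to the data from the question. The variable names total_rows, non_nan and nan_count are introduced here for illustration only, and the isnull() cross-check at the end is an alternative that is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same data as in the question; the name df follows the answer's wording.
df = pd.DataFrame([1, np.nan, 3, 3, 3, np.nan], columns=['a'])

total_rows = len(df)        # counts every row, NaN or not
non_nan = df['a'].count()   # count() skips NaN values
nan_count = total_rows - non_nan

print("nan: %d" % nan_count)  # prints "nan: 2"

# Cross-check (not from the original answer): summing the boolean
# mask returned by isnull() gives the same number.
assert nan_count == df['a'].isnull().sum()
```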

To count the NaN values in every column of df, use:

len(df) - df.count()
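To illustrate the per-column form, here is a small sketch on a two-column frame. The second column 'b' and its values are made up for this example and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame; column 'b' is invented for illustration.
df = pd.DataFrame({
    'a': [1, np.nan, 3, 3, 3, np.nan],
    'b': [np.nan, 'x', 'y', 'z', 'x', 'y'],
})

# df.count() returns a Series of non-NaN counts indexed by column name,
# so subtracting it from len(df) yields one NaN count per column.
nan_per_column = len(df) - df.count()
print(nan_per_column)
```

The result is a Series, so a single column's count can be read with nan_per_column['a'].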


If you want to use value_counts, tell it not to drop NaN values by setting dropna=False (added in 0.14.1):

dfv = dfd['a'].value_counts(dropna=False)

This allows the missing values in the column to be counted too:

 3     3
NaN    2
 1     1
Name: a, dtype: int64

The rest of your code should then work as you expect (note that it's not necessary to call sum; just print("nan: %d" % dfv[np.nan]) suffices).
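Putting the pieces together, the question's script might look like the sketch below once dropna=False is passed. It assumes a pandas version that supports the dropna parameter, and that looking up the NaN label with dfv[np.nan] behaves as the answer describes; the sum() calls and the sort_index() call from the question are dropped since they are not needed here.

```python
import numpy as np
import pandas as pd

dfd = pd.DataFrame([1, np.nan, 3, 3, 3, np.nan], columns=['a'])

# Keep NaN as its own entry in the result instead of dropping it.
dfv = dfd['a'].value_counts(dropna=False)

# Each lookup returns a single count, so no sum() is required.
print("nan: %d" % dfv[np.nan])
print("1: %d" % dfv[1])
print("3: %d" % dfv[3])

# The total now includes the NaN rows and equals len(dfd).
print("total: %d" % dfv.sum())
```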
