How to count nan values in a pandas DataFrame?
Question
What is the best way to account for NaN (not a number) values in a pandas DataFrame?
The following code:
import numpy as np
import pandas as pd
dfd = pd.DataFrame([1, np.nan, 3, 3, 3, np.nan], columns=['a'])
dfv = dfd.a.value_counts().sort_index()
print("nan: %d" % dfv[np.nan].sum())
print("1: %d" % dfv[1].sum())
print("3: %d" % dfv[3].sum())
print("total: %d" % dfv[:].sum())
Output:
nan: 0
1: 1
3: 3
total: 4
The desired output is:
nan: 2
1: 1
3: 3
total: 6
I am using pandas 0.17 with Python 3.5.0 and Anaconda 2.4.0.
Recommended answer
If you want to count only the NaN values in column 'a' of a DataFrame df, use:
len(df) - df['a'].count()
Here count() returns the number of non-NaN values in the column, which is subtracted from the total number of rows (given by len(df)).
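As a minimal sketch of the subtraction above, here it is applied to the data from the question. The frame is named df to match the answer. The cross-check against isnull().sum() is an addition for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same data as in the question; named df here to match the answer's snippet.
df = pd.DataFrame([1, np.nan, 3, 3, 3, np.nan], columns=['a'])

total_rows = len(df)           # counts every row, NaN included
non_nan = df['a'].count()      # count() skips NaN values
nan_count = total_rows - non_nan

# Cross-check (an addition for illustration): isnull() flags NaN entries
# as True, and summing booleans counts the True values.
assert nan_count == df['a'].isnull().sum()

print("nan: %d" % nan_count)
```

For this data the difference is 6 - 4, so the script reports two NaN values.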
To count the NaN values in every column of df, use:
len(df) - df.count()
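To illustrate the per-column form, the sketch below uses a hypothetical two-column frame. Column 'b' and its values are invented for this example and do not appear in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column 'a' is from the question, column 'b' is
# made up here purely to show the per-column behaviour.
df = pd.DataFrame({
    'a': [1, np.nan, 3, 3, 3, np.nan],
    'b': [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
})

# df.count() returns a Series of non-NaN counts indexed by column name,
# so subtracting it from len(df) yields one NaN count per column.
nan_per_column = len(df) - df.count()

print(nan_per_column)
```

The result is a Series indexed by column name, here with 2 for 'a' and 1 for 'b'.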
If you want to use value_counts, tell it not to drop NaN values by setting dropna=False (added in 0.14.1):
dfv = dfd['a'].value_counts(dropna=False)
This allows the missing values in the column to be counted too:
3 3
NaN 2
1 1
Name: a, dtype: int64
The rest of your code should then work as you expect. Note that it is not necessary to call sum on a single entry; print("nan: %d" % dfv[np.nan]) suffices.
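Putting the pieces together, the question's script might be adjusted as in the sketch below. It assumes label lookups such as dfv[np.nan] and dfv[1] resolve against the float index that value_counts produces, as the answer describes. The sort_index() call from the question is left out because it is not needed for the lookups.

```python
import numpy as np
import pandas as pd

dfd = pd.DataFrame([1, np.nan, 3, 3, 3, np.nan], columns=['a'])

# dropna=False keeps NaN as its own entry instead of discarding it.
dfv = dfd['a'].value_counts(dropna=False)

# Each lookup returns a single count, so no .sum() is needed per entry.
print("nan: %d" % dfv[np.nan])
print("1: %d" % dfv[1])
print("3: %d" % dfv[3])

# Summing all counts now includes the NaN rows as well.
print("total: %d" % dfv.sum())
```

With NaN retained, the counts should match the desired output from the question (2, 1, 3, and a total of 6).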