How do I get a summary count of missing/NaN data by column in 'pandas'?

Question

In R I can quickly see a count of missing data using the summary command, but the equivalent pandas DataFrame method, describe, does not report these values.

I gather I can do something like

len(mydata.index) - mydata.count()

to compute the number of missing values for each column, but I wonder if there's a better idiom (or if my approach is even right).

Recommended answer

Both describe and info report the count of non-missing values for each column, not the count of missing ones.

In [1]: df = DataFrame(np.random.randn(10,2))

In [2]: df.iloc[3:6,0] = np.nan

In [3]: df
Out[3]: 
          0         1
0 -0.560342  1.862640
1 -1.237742  0.596384
2  0.603539 -1.561594
3       NaN  3.018954
4       NaN -0.046759
5       NaN  0.480158
6  0.113200 -0.911159
7  0.990895  0.612990
8  0.668534 -0.701769
9 -0.607247 -0.489427

[10 rows x 2 columns]

In [4]: df.describe()
Out[4]: 
              0          1
count  7.000000  10.000000
mean  -0.004166   0.286042
std    0.818586   1.363422
min   -1.237742  -1.561594
25%   -0.583795  -0.648684
50%    0.113200   0.216699
75%    0.636036   0.608839
max    0.990895   3.018954

[8 rows x 2 columns]


In [5]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 2 columns):
0    7 non-null float64
1    10 non-null float64
dtypes: float64(2)

To get a count of missing values, your solution is correct:

In [20]: len(df.index)-df.count()
Out[20]: 
0    3
1    0
dtype: int64

You can also do this:

In [23]: df.isnull().sum()
Out[23]: 
0    3
1    0
dtype: int64
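
For readers who want to run the comparison outside an IPython session, here is a minimal self-contained sketch of the two approaches above. The seeded generator and the variable names (rng, missing_by_count, missing_by_mask) are illustrative choices, not part of the original answer. In current pandas, isna is an alias of isnull, so either spelling should behave the same.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the answer:
# 10 rows, 2 columns, with NaN in rows 3-5 of column 0.
rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df = pd.DataFrame(rng.standard_normal((10, 2)))
df.iloc[3:6, 0] = np.nan

# Approach from the question: total rows minus the non-missing count per column.
missing_by_count = len(df.index) - df.count()

# Approach from the answer: sum the boolean mask of missing cells per column.
missing_by_mask = df.isnull().sum()

# Check that both approaches agree column by column.
same = bool((missing_by_count == missing_by_mask).all())

print(missing_by_mask)
print("approaches agree:", same)
```

Both results are Series indexed by column label, so they can be filtered or sorted like any other Series, for example to list only the columns that have missing values.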
