pandas dataframe drop columns by number of nan
Question
I have a dataframe in which some columns contain NaN. I'd like to drop the columns that have a certain number of NaN. For example, in the following code, I'd like to drop any column with 2 or more NaN. In this case, column 'C' will be dropped and only 'A' and 'B' will be kept. How can I implement it?
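import pandas as pd
import numpy as np
# 10 rows of random values in columns A, B and C
dff = pd.DataFrame(np.random.randn(10,3), columns=list('ABC'))
# One NaN in A, one in B, three in C (rows 5, 6 and 7)
dff.iloc[3,0] = np.nan
dff.iloc[6,1] = np.nan
dff.iloc[5:8,2] = np.nan
print(dff)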
Recommended answer
dropna has a thresh param: the minimum number of non-NaN values a row or column must have to be kept. You just need to pass the length of your df minus the number of NaN values you are willing to tolerate:
In [13]:
dff.dropna(thresh=len(dff) - 2, axis=1)
Out[13]:
          A         B
0  0.517199 -0.806304
1 -0.643074  0.229602
2  0.656728  0.535155
3       NaN -0.162345
4 -0.309663 -0.783539
5  1.244725 -0.274514
6 -0.254232       NaN
7 -1.242430  0.228660
8 -0.311874 -0.448886
9 -0.984453 -0.755416
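So the above keeps a column only if it has at least len(dff) - 2 non-NaN values, i.e. at most 2 NaN. Column 'C' has 3 NaN and is dropped. Note that a column with exactly 2 NaN would still be kept with this threshold; to drop columns with 2 or more NaN, as stated in the question, pass thresh=len(dff) - 1 instead.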
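As a cross-check on the threshold arithmetic, here is a minimal sketch that counts NaN per column explicitly and compares the result with dropna. The helper name drop_columns_with_nans and its min_nans parameter are hypothetical (they are not part of pandas), and a deterministic frame stands in for the random one in the question.

```python
import numpy as np
import pandas as pd


def drop_columns_with_nans(df, min_nans):
    """Hypothetical helper: drop every column holding min_nans or more NaN."""
    nan_counts = df.isna().sum()  # number of NaN in each column
    return df.loc[:, nan_counts < min_nans]


# Deterministic stand-in for the question's random data (assumption)
dff = pd.DataFrame(np.arange(30, dtype=float).reshape(10, 3), columns=list("ABC"))
dff.iloc[3, 0] = np.nan
dff.iloc[6, 1] = np.nan
dff.iloc[5:8, 2] = np.nan

# Drop columns with 2 or more NaN
kept = drop_columns_with_nans(dff, min_nans=2)
print(list(kept.columns))

# Equivalent dropna call: require at least len(dff) - 1 non-NaN values,
# which allows at most 1 NaN per column
same = dff.dropna(axis=1, thresh=len(dff) - 1)
print(kept.equals(same))
```

Counting with isna().sum() makes the "2 or more" condition explicit, which avoids the off-by-one that is easy to introduce when converting a NaN count into a thresh value.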