Select rows where at least one value from the list of columns is not null
Problem description
I have a big dataframe with many columns (around 1000). I also have a list of columns (about 10, generated by a script). I would like to select all rows of the original dataframe where at least one of the columns in my list is not null.
If I knew the number of columns in advance, I could do something like this:
list_of_cols = ['col1', ...]
df[
    df[list_of_cols[0]].notnull() |
    df[list_of_cols[1]].notnull() |
    ...
    df[list_of_cols[6]].notnull()
]
I could also iterate over the list of columns and build a mask, which I would then apply to df, but this looks too tedious. Knowing how powerful pandas is at dealing with NaN, I would expect there to be a much easier way to achieve what I want.
Recommended answer
Use the thresh parameter of the dropna() method. Setting thresh=1 tells pandas to keep every row that has at least one non-null value.
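import numpy as np
import pandas as pd
# 1000 x 1000 frame in which roughly 70% of the values are NaN
df = pd.DataFrame(np.random.choice((1., np.nan), (1000, 1000), p=(.3, .7)))
list_of_cols = list(range(10))
df[list_of_cols].dropna(thresh=1).head()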
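Note that the snippet above returns only the columns in list_of_cols, whereas the question asks for rows of the original dataframe. The following is a minimal sketch of two ways to keep all columns. It is not part of the original answer, and the small frame and its column names (col1, col2, other) are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; the column names are assumptions for this sketch.
df = pd.DataFrame({
    "col1": [1.0, np.nan, np.nan, 4.0],
    "col2": [np.nan, np.nan, 3.0, 4.0],
    "other": ["a", "b", "c", "d"],
})
list_of_cols = ["col1", "col2"]

# Option 1: boolean mask that is True where any listed column is non-null,
# applied to the full dataframe so every column is kept.
mask = df[list_of_cols].notna().any(axis=1)
selected = df[mask]

# Option 2: dropna restricted to the listed columns through subset;
# thresh=1 keeps rows with at least one non-null value among them.
selected_dropna = df.dropna(subset=list_of_cols, thresh=1)

print(selected)
```

Both options should select the same rows here. The mask version may be handier when the condition needs to be combined with other filters, while the dropna version stays closest to the answer above.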