Select rows where at least one value from a list of columns is not null


Question

I have a large dataframe with many columns (around 1000). I also have a list of columns (generated by a script, about 10 of them). I would like to select all rows of the original dataframe where at least one of the columns in my list is not null.

If I knew the number of columns in advance, I could do something like this:

list_of_cols = ['col1', ...]
df[
  df[list_of_cols[0]].notnull() |
  df[list_of_cols[1]].notnull() |
  ...
  df[list_of_cols[6]].notnull()
]

I could also iterate over the list of columns and build a mask to apply to df, but this looks too tedious. Given how powerful pandas is at dealing with NaN, I would expect there to be a much easier way to achieve what I want.
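For reference, the loop-based mask mentioned above might look roughly like the sketch below. The small dataframe and its column names (`col1`, `col2`, `other`) are made up for illustration and are not from the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame; the real one would have ~1000 columns.
df = pd.DataFrame({
    "col1": [1.0, np.nan, np.nan],
    "col2": [np.nan, np.nan, 2.0],
    "other": ["a", "b", "c"],
})
list_of_cols = ["col1", "col2"]

# Start with an all-False mask and OR in each column's not-null flags.
mask = pd.Series(False, index=df.index)
for col in list_of_cols:
    mask |= df[col].notnull()

# Rows where at least one listed column is non-null.
selected = df[mask]
```

This works for any number of columns, but it is the boilerplate the question hopes to avoid.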

Recommended answer

Use the thresh parameter of the dropna() method. Setting thresh=1 keeps every row that has at least one non-null value and drops the rest.

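import numpy as np
import pandas as pd

# 1000 x 1000 frame, about 70% NaN
df = pd.DataFrame(np.random.choice((1., np.nan), (1000, 1000), p=(.3, .7)))
list_of_cols = list(range(10))

df[list_of_cols].dropna(thresh=1).head()  # rows with at least one non-null value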
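Note that df[list_of_cols].dropna(thresh=1) returns only the listed columns. If the goal is to keep all columns of the original dataframe, two possible variants are sketched below. The small dataframe and its column names are made up for illustration; `subset` and `how` are standard `DataFrame.dropna` parameters.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame standing in for the large one.
df = pd.DataFrame({
    "col1": [1.0, np.nan, np.nan],
    "col2": [np.nan, np.nan, 2.0],
    "other": ["a", "b", "c"],
})
list_of_cols = ["col1", "col2"]

# Option 1: boolean mask that is True where any listed column is non-null.
kept_mask = df[df[list_of_cols].notna().any(axis=1)]

# Option 2: drop rows that are null in all of the listed columns.
kept_dropna = df.dropna(subset=list_of_cols, how="all")
```

Both variants select the same rows and leave the unlisted columns (here `other`) in the result.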
