Get first non-null value per row


Question


I have a sample DataFrame, shown below. For each row, I want to check c1 first; if it is null, check c2, and so on. In this way, find the first non-null column and store its value in the column result.

ID  c1  c2  c3  c4  result
1   a   b           a
2       cc  dd      cc
3           ee  ff  ee
4               gg  gg
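To try the answers below, the sample can be rebuilt as a DataFrame. This is a minimal sketch that assumes the blank cells in the table are missing values (NaN), which is how the question's null checks treat them:

```python
import numpy as np
import pandas as pd

# Sample from the question; blank cells are assumed to be missing values (NaN).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "c1": ["a", np.nan, np.nan, np.nan],
    "c2": ["b", "cc", np.nan, np.nan],
    "c3": [np.nan, "dd", "ee", np.nan],
    "c4": [np.nan, np.nan, "ff", "gg"],
})
print(df)
```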


This is how I am doing it for now, but I would like to know if there is a better method. (The column names do not follow any pattern; these are just samples.)

df["result"] = np.where(df["c1"].notnull(), df["c1"], None)
df["result"] = np.where(df["result"].notnull(), df["result"], df["c2"])
df["result"] = np.where(df["result"].notnull(), df["result"], df["c3"])
df["result"] = np.where(df["result"].notnull(), df["result"], df["c4"])
df["result"] = np.where(df["result"].notnull(), df["result"], "unknown)


When there are many columns, this approach becomes long and repetitive.
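Even without changing technique, the repetition can be removed by looping over an explicit list of columns. A sketch, assuming the order of `cols` is the priority order and that blanks are NaN; `first_non_null` is a hypothetical helper name, and it uses `Series.fillna` with a Series argument (aligned on the index) in place of the chained `np.where` calls:

```python
import numpy as np
import pandas as pd

def first_non_null(df, cols, default="unknown"):
    """Return, per row, the first non-null value among cols, checked in order."""
    # Start from an all-missing object Series aligned with df's index.
    result = pd.Series(np.nan, index=df.index, dtype=object)
    for col in cols:
        # Fill only the rows that are still missing with this column's values.
        result = result.fillna(df[col])
    # Rows where every listed column was null get the default.
    return result.fillna(default)

# Question's sample plus one hypothetical all-null row (ID 5).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "c1": ["a", np.nan, np.nan, np.nan, np.nan],
    "c2": ["b", "cc", np.nan, np.nan, np.nan],
    "c3": [np.nan, "dd", "ee", np.nan, np.nan],
    "c4": [np.nan, np.nan, "ff", "gg", np.nan],
})
df["result"] = first_non_null(df, ["c1", "c2", "c3", "c4"])
print(df["result"].tolist())  # ['a', 'cc', 'ee', 'gg', 'unknown']
```

Because the column list is passed in explicitly, the column names do not need to follow any pattern.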

Recommended answer


Back-fill the NaNs along each row first, then select the first column with iloc:

df['result'] = df[['c1','c2','c3','c4']].bfill(axis=1).iloc[:, 0].fillna('unknown')

Or, selecting every column after ID by position:

df['result'] = df.iloc[:, 1:].bfill(axis=1).iloc[:, 0].fillna('unknown')


print (df)
   ID   c1   c2  c3   c4 result
0   1    a    b   a  NaN      a
1   2  NaN   cc  dd   cc     cc
2   3  NaN   ee  ff   ee     ee
3   4  NaN  NaN  gg   gg     gg
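To see why this works, it can help to look at the intermediate frame. `bfill(axis=1)` fills each missing cell with the next non-null value to its right in the same row, so the first column ends up holding the row's first non-null value (or NaN if the row is entirely null, which `fillna('unknown')` then replaces). A self-contained sketch on the question's sample, assuming blanks are NaN and adding one hypothetical all-null row:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "c1": ["a", np.nan, np.nan, np.nan, np.nan],
    "c2": ["b", "cc", np.nan, np.nan, np.nan],
    "c3": [np.nan, "dd", "ee", np.nan, np.nan],
    "c4": [np.nan, np.nan, "ff", "gg", np.nan],
})
cols = ["c1", "c2", "c3", "c4"]  # priority order; names need not follow a pattern

# Step 1: within each row, fill every NaN with the next non-null value to its right.
filled = df[cols].bfill(axis=1)
print(filled)

# Step 2: the first column now holds each row's first non-null value (NaN if none).
df["result"] = filled.iloc[:, 0].fillna("unknown")
print(df["result"].tolist())  # ['a', 'cc', 'ee', 'gg', 'unknown']
```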

Performance:

df = pd.concat([df] * 1000, ignore_index=True)


In [220]: %timeit df['result'] = df[['c1','c2','c3','c4']].bfill(axis=1).iloc[:, 0].fillna('unknown')
100 loops, best of 3: 2.78 ms per loop

In [221]: %timeit df['result'] = df.iloc[:, 1:].bfill(axis=1).iloc[:, 0].fillna('unknown')
100 loops, best of 3: 2.7 ms per loop

#jpp solution
In [222]: %%timeit
     ...: cols = df.iloc[:, 1:].T.apply(pd.Series.first_valid_index)
     ...: 
     ...: df['result'] = [df.loc[i, cols[i]] for i in range(len(df.index))]
     ...: 
1 loop, best of 3: 180 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ's solution
In [223]: %timeit df['result'] = df.stack().groupby(level=0).first()
1 loop, best of 3: 606 ms per loop
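The timings above use IPython's `%timeit`. A rough plain-Python equivalent with the standard `timeit` module is sketched below; it assumes the question's sample with blanks as NaN, and `chained_where` / `bfill_first` are hypothetical names for the question's approach (written as a loop) and the answer's approach. Absolute numbers will differ from those above depending on the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Question's sample (blanks assumed NaN), repeated 1000 times as in the benchmark above.
base = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "c1": ["a", np.nan, np.nan, np.nan],
    "c2": ["b", "cc", np.nan, np.nan],
    "c3": [np.nan, "dd", "ee", np.nan],
    "c4": [np.nan, np.nan, "ff", "gg"],
})
df = pd.concat([base] * 1000, ignore_index=True)
cols = ["c1", "c2", "c3", "c4"]

def chained_where(frame):
    # The question's approach: repeated np.where, falling back column by column.
    result = np.where(frame[cols[0]].notnull(), frame[cols[0]], None)
    for col in cols[1:]:
        result = np.where(pd.notnull(result), result, frame[col])
    return pd.Series(result, index=frame.index).fillna("unknown")

def bfill_first(frame):
    # The answer's approach: back-fill along rows, then take the first column.
    return frame[cols].bfill(axis=1).iloc[:, 0].fillna("unknown")

# Both approaches should agree before comparing their speed.
assert chained_where(df).tolist() == bfill_first(df).tolist()

for fn in (chained_where, bfill_first):
    # Best of 3 repeats, 20 calls each, reported per call.
    best = min(timeit.repeat(lambda: fn(df), number=20, repeat=3)) / 20
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```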

