Create a new column using the non-empty value from each row


Question

I have a Pandas DataFrame with 4 columns:

     col1    col2   col3    col4
0  orange     NaN    NaN     NaN
1     NaN  tomato    NaN     NaN
2     NaN     NaN  apple     NaN
3     NaN     NaN    NaN  carrot
4     NaN  potato    NaN     NaN

Each row contains exactly one string value, which may appear in any column. The other columns in that row are NaN. I want to create a single column that contains the string values:

      col5 
0   orange
1   tomato
2    apple
3   carrot
4   potato

The most obvious approach would be:

data['col5'] = data.col1.astype(str) + data.col2.astype(str)...

and then remove the "nan" text from the resulting strings, but that is messy and error-prone.

Does Pandas offer a simple way of doing this?

Recommended answer

Here's one way, using apply and first_valid_index:

In [11]: df.apply(lambda x: x[x.first_valid_index()], axis=1)
Out[11]:
0    orange
1    tomato
2     apple
3    carrot
4    potato
dtype: object
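
For reference, here is a self-contained sketch of the same idea that builds the example frame from the question and stores the result as col5. The construction of df is an assumption made for illustration; only the apply call comes from the answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (assumed construction).
df = pd.DataFrame({
    "col1": ["orange", np.nan, np.nan, np.nan, np.nan],
    "col2": [np.nan, "tomato", np.nan, np.nan, "potato"],
    "col3": [np.nan, np.nan, "apple", np.nan, np.nan],
    "col4": [np.nan, np.nan, np.nan, "carrot", np.nan],
})

# With axis=1 each row is passed as a Series indexed by column name;
# first_valid_index() returns the label of its first non-NaN entry.
df["col5"] = df.apply(lambda row: row[row.first_valid_index()], axis=1)
```

Since apply with axis=1 calls the lambda once per row in Python, this is readable but can be slow on large frames.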

To get these more efficiently, you could drop down to numpy:

In [21]: df.values.ravel()[np.arange(0, len(df.index) * len(df.columns), len(df.columns)) + np.argmax(df.notnull().values, axis=1)]
Out[21]: array(['orange', 'tomato', 'apple', 'carrot', 'potato'], dtype=object)
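
The one-liner is dense, so the following sketch breaks it into named steps. The variable names and the frame construction are illustrative assumptions, and the last line shows an equivalent formulation using row/column fancy indexing instead of flat offsets.

```python
import numpy as np
import pandas as pd

# Example frame from the question (assumed construction).
df = pd.DataFrame({
    "col1": ["orange", np.nan, np.nan, np.nan, np.nan],
    "col2": [np.nan, "tomato", np.nan, np.nan, "potato"],
    "col3": [np.nan, np.nan, "apple", np.nan, np.nan],
    "col4": [np.nan, np.nan, np.nan, "carrot", np.nan],
})

values = df.to_numpy()
n_rows, n_cols = values.shape

# argmax over a boolean mask gives the position of the first True in each row,
# i.e. the column position of the first non-NaN value.
first_valid = np.argmax(df.notna().to_numpy(), axis=1)

# Flat offset of each row's start in the raveled array, plus the column position.
flat_positions = np.arange(n_rows) * n_cols + first_valid
picked = values.ravel()[flat_positions]

# Equivalent selection with (row, column) integer indexing.
picked_2d = values[np.arange(n_rows), first_valid]
```

Both picked and picked_2d are plain NumPy arrays, so they can be assigned directly with df["col5"] = picked.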

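Note: neither approach handles rows that are entirely NaN. For such a row, first_valid_index returns None and argmax falls back to position 0, which holds NaN. Filter these rows out first, e.g. with dropna(how='all'); the default dropna() would drop every row that contains any NaN.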
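
A minimal sketch of that filtering step, assuming a small hypothetical frame in which row 1 is entirely NaN. The result is assigned back by index, so the filtered-out row ends up with NaN in col5.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 has no valid value at all.
df = pd.DataFrame({
    "col1": ["orange", np.nan, np.nan],
    "col2": [np.nan, np.nan, "potato"],
})

# Keep only rows that have at least one non-NaN value.
non_empty = df.dropna(how="all")

# The resulting Series is indexed by the surviving row labels (0 and 2);
# assignment aligns on the index, leaving row 1 as NaN.
df["col5"] = non_empty.apply(lambda row: row[row.first_valid_index()], axis=1)
```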
