Create a new column using the non-empty value from each row
Question
There is a Pandas DataFrame with 4 columns:
     col1    col2   col3    col4
0  orange     NaN    NaN     NaN
1     NaN  tomato    NaN     NaN
2     NaN     NaN  apple     NaN
3     NaN     NaN    NaN  carrot
4     NaN  potato    NaN     NaN
Each row contains exactly one string value, which may be in any column; the other columns in that row are NaN. I want to create a single column that contains those string values:
     col5
0  orange
1  tomato
2   apple
3  carrot
4  potato
The most obvious approach is:
data['col5'] = data.col1.astype(str) + data.col2.astype(str)...
and then remove the "nan" text from the output strings, but that's messy and will certainly result in errors.
Does Pandas offer a simple way of doing this?
Recommended answer
Here's one way, with apply and first_valid_index:
In [11]: df.apply(lambda x: x[x.first_valid_index()], axis=1)
Out[11]:
0 orange
1 tomato
2 apple
3 carrot
4 potato
dtype: object
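For anyone who wants to try this outside an interactive session, here is a minimal self-contained sketch of the same apply / first_valid_index idea. The frame is rebuilt from the question's sample data; the variable name `df` and the lambda argument name `row` are just illustrative choices.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "col1": ["orange", np.nan, np.nan, np.nan, np.nan],
    "col2": [np.nan, "tomato", np.nan, np.nan, "potato"],
    "col3": [np.nan, np.nan, "apple", np.nan, np.nan],
    "col4": [np.nan, np.nan, np.nan, "carrot", np.nan],
})

# With axis=1 each row is passed as a Series indexed by column name.
# first_valid_index() returns the label of the first non-NaN entry,
# which is then used to select that entry.
df["col5"] = df.apply(lambda row: row[row.first_valid_index()], axis=1)

print(df["col5"].tolist())
```

Because apply calls the lambda once per row in Python, this is easy to read but can be slow on large frames.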
To get these efficiently you could drop to numpy:
In [21]: df.values.ravel()[np.arange(0, len(df.index) * len(df.columns), len(df.columns)) + np.argmax(df.notnull().values, axis=1)]
Out[21]: array(['orange', 'tomato', 'apple', 'carrot', 'potato'], dtype=object)
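The one-liner above is dense, so here is a sketch of the same argmax idea written with two-dimensional indexing instead of ravel and offset arithmetic. It is a restatement for readability, not the answer's exact code; the names `values`, `first_valid` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "col1": ["orange", np.nan, np.nan, np.nan, np.nan],
    "col2": [np.nan, "tomato", np.nan, np.nan, "potato"],
    "col3": [np.nan, np.nan, "apple", np.nan, np.nan],
    "col4": [np.nan, np.nan, np.nan, "carrot", np.nan],
})

values = df.to_numpy()

# Boolean mask of non-NaN cells; argmax along each row gives the
# position of the first True, i.e. the first non-NaN column.
first_valid = np.argmax(pd.notna(values), axis=1)

# Pair every row number with its chosen column position.
result = values[np.arange(len(values)), first_valid]

print(result.tolist())
```

One thing to keep in mind: argmax returns 0 when a row has no True at all, so a row that is entirely NaN would quietly yield the NaN from its first column rather than raise.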
Note: both will fail if you have rows that are all NaN; you should filter these out first (e.g. with dropna(how='all'), since the default dropna() drops any row containing a NaN, which here is every row).
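As a rough sketch of that filtering step, assuming a small hypothetical frame with one row that is entirely NaN: dropping such rows with how="all" before applying, then assigning the result back, leaves NaN in the new column for the rows that were dropped, because the assignment aligns on the index.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the middle row has no valid value at all.
df = pd.DataFrame({
    "col1": ["orange", np.nan, np.nan],
    "col2": [np.nan, np.nan, "potato"],
})

# how="all" drops only rows where every column is NaN.
non_empty = df.dropna(how="all")

col5 = non_empty.apply(lambda row: row[row.first_valid_index()], axis=1)

# Assignment aligns on the index, so the dropped row gets NaN in col5.
df["col5"] = col5

print(df)
```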