Iteratively concatenate columns in pandas with NaN values
Question
I have a pandas.DataFrame:
import numpy as np  # needed for np.nan below
import pandas as pd
df = pd.DataFrame({"x": ["hello there you can go home now", "why should she care", "please sort me appropriately"],
                   "y": [np.nan, "finally we were able to go home", "but what about meeeeeeeeeee"],
                   "z": ["", "alright we are going home now", "ok fine shut up already"]})
cols = ["x", "y", "z"]
I want to concatenate these columns iteratively, as opposed to writing something like:
df["concat"] = df["x"].str.cat(df["y"], sep = " ").str.cat(df["z"], sep = " ")
I know that three columns seem trivial to put together, but I actually have 30. So I would like to do something like:
df["concat"] = df[cols[0]]
for i in range(1, len(cols)):
    df["concat"] = df["concat"].str.cat(df[cols[i]], sep = " ")
Right now, the initial line df["concat"] = df[cols[0]] works fine, but the NaN value at df.loc[0, "y"] messes up the concatenation: because of that one null value, the entire first row ends up as NaN in df["concat"]. How can I get around this? Is there some option of pd.Series.str.cat that I need to specify?
Recommended answer
Option 1
pd.Series(df.fillna('').values.tolist()).str.join(' ')
0 hello there you can go home now
1 why should she care finally we were able to go...
2 please sort me appropriately but what about me...
dtype: object
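As a self-contained sketch of what Option 1 does step by step: the restriction to cols and the final whitespace cleanup are additions that are not in the original answer. They are assumed here so that a previously created "concat" column and the empty cells do not affect the result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": ["hello there you can go home now", "why should she care", "please sort me appropriately"],
    "y": [np.nan, "finally we were able to go home", "but what about meeeeeeeeeee"],
    "z": ["", "alright we are going home now", "ok fine shut up already"],
})
cols = ["x", "y", "z"]

# Replace NaN with empty strings so that every cell is a str.
filled = df[cols].fillna("")

# Build one Python list of strings per row and keep df's index.
rows = pd.Series(filled.values.tolist(), index=df.index)

# Join each row's list with a space. Empty cells leave extra spaces behind,
# so collapse whitespace runs and trim the ends (an assumed cleanup step).
concat = rows.str.join(" ").str.replace(r"\s+", " ", regex=True).str.strip()
```

Assigning the result back with df["concat"] = concat aligns on the index, which is why the sketch passes index=df.index explicitly.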
Option 2
df.fillna('').add(' ').sum(axis=1).str.strip()
0 hello there you can go home now
1 why should she care finally we were able to go...
2 please sort me appropriately but what about me...
dtype: object
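On the question of whether pd.Series.str.cat itself has a relevant option: it accepts an na_rep argument, and its others argument may be a DataFrame, so the explicit loop may not be needed. The following is a minimal sketch using the df and cols from the question. It is not part of the original answer, and the trailing str.strip() is an assumed cleanup for the spaces left by empty cells.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": ["hello there you can go home now", "why should she care", "please sort me appropriately"],
    "y": [np.nan, "finally we were able to go home", "but what about meeeeeeeeeee"],
    "z": ["", "alright we are going home now", "ok fine shut up already"],
})
cols = ["x", "y", "z"]

# na_rep="" substitutes an empty string for every missing value,
# so a single NaN no longer turns the whole row into NaN.
concat_cat = df[cols[0]].str.cat(df[cols[1:]], sep=" ", na_rep="").str.strip()
```

The same na_rep="" argument can also be passed to each str.cat call inside the original loop if the iterative form is preferred.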