Combine pandas string columns with missing values
Question
I need to concatenate the strings in two or more columns of a pandas DataFrame.
I found this answer, which works fine if there are no missing values. Unfortunately, my data has some, and that leads to results like "ValueA; None", which is not really clean.
Sample data:
col_A | col_B
------ | ------
val_A | val_B
None | val_B
val_A | None
None | None
I need this result:
col_merge
---------
val_A;val_B
val_B
val_A
None
Recommended answer
You can use apply with an if-else expression:
# store the result in a new Series so that df stays available for the code below
s = df.apply(lambda x: None if x.isnull().all() else ';'.join(x.dropna()), axis=1)
print(s)
0 val_A;val_B
1 val_B
2 val_A
3 None
dtype: object
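To try the apply approach end to end, here is a minimal self-contained sketch. It builds the sample data from the question and moves the if-else into a named helper. The names join_row and col_merge are made up for illustration, and the sketch assumes every non-missing value is a string.

```python
import pandas as pd

# Sample data from the question; None marks a missing value.
df = pd.DataFrame({
    "col_A": ["val_A", None, "val_A", None],
    "col_B": ["val_B", "val_B", None, None],
})


def join_row(row, sep=";"):
    """Join the non-missing strings of one row; return None if all are missing."""
    present = row.dropna()
    return sep.join(present) if len(present) else None


# axis=1 passes each row to join_row as a Series
col_merge = df.apply(join_row, axis=1)
print(col_merge)
```

Because apply calls the helper once per row in Python, this version is easy to read but slow on large frames.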
For a faster solution, you can use:
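import numpy as np
import pandas as pd

# append the separator to every value, replace NaN with an empty string,
# and convert the rows to lists (';' without a space matches the requested output)
arr = df.add(';').fillna('').values.tolist()
# join each row, strip the trailing separator, and turn empty strings into NaN
s = pd.Series([''.join(x).strip(';') for x in arr]).replace('^$', np.nan, regex=True)
# replace NaN with None
s = s.where(s.notnull(), None)
print(s)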
0 val_A;val_B
1 val_B
2 val_A
3 None
dtype: object
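One caveat with the strip-based approach is that strip also removes separator characters that belong to the first or last value itself. The sketch below is a variation on the same list-comprehension idea, not the method from the answer: it filters out missing values before joining, so nothing needs to be stripped. The helper name combine_columns and its parameters are hypothetical, and it assumes all non-missing values are strings.

```python
import pandas as pd


def combine_columns(df, columns, sep=";"):
    """Join the non-missing strings of the given columns, row by row.

    Rows in which every value is missing (or the joined result is empty)
    become None.
    """
    rows = df[columns].values.tolist()
    merged = [sep.join(v for v in row if pd.notna(v)) or None for row in rows]
    # dtype=object keeps None as None instead of converting it to NaN
    return pd.Series(merged, index=df.index, dtype=object)


# Sample data from the question; None marks a missing value.
df = pd.DataFrame({
    "col_A": ["val_A", None, "val_A", None],
    "col_B": ["val_B", "val_B", None, None],
})

col_merge = combine_columns(df, ["col_A", "col_B"])
print(col_merge)
```

Passing the column list explicitly makes it straightforward to merge more than two columns or only a subset of the frame.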
#40000 rows
df = pd.concat([df]*10000).reset_index(drop=True)
In [70]: %%timeit
...: arr = df.add('; ').fillna('').values.tolist()
...: s = pd.Series([''.join(x).strip('; ') for x in arr]).replace('^$', np.nan, regex=True)
...: s.where(s.notnull(), None)
...:
10 loops, best of 3: 74 ms per loop
In [71]: %%timeit
...: df.apply(lambda x: None if x.isnull().all() else ';'.join(x.dropna()), axis=1)
...:
1 loop, best of 3: 12.7 s per loop
# another solution, but a bit slower
In [72]: %%timeit
...: arr = df.add('; ').fillna('').values
...: s = [''.join(x).strip('; ') for x in arr]
...: pd.Series([y if y != '' else None for y in s])
...:
...:
10 loops, best of 3: 119 ms per loop
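The %%timeit magic only works inside IPython or Jupyter. As a rough sketch of how the comparison could be repeated in a plain Python script, the standard timeit module can be used instead. The function names below are made up for illustration, the frame is limited to 1000 rows so the script finishes quickly, and the numbers it prints depend on the machine and pandas version, so they will not match the timings above.

```python
import timeit

import pandas as pd

# Sample data from the question; None marks a missing value.
base = pd.DataFrame({
    "col_A": ["val_A", None, "val_A", None],
    "col_B": ["val_B", "val_B", None, None],
})
# 1000 rows (assumption: far fewer than the 40000 used above, to keep the run short)
big = pd.concat([base] * 250).reset_index(drop=True)


def with_apply(frame):
    """Row-wise apply, as in the first solution."""
    return frame.apply(
        lambda x: None if x.isnull().all() else ";".join(x.dropna()), axis=1
    )


def with_list_comprehension(frame):
    """Add the separator, fill NaN, then join and strip in a list comprehension."""
    arr = frame.add(";").fillna("").values.tolist()
    return pd.Series(["".join(x).strip(";") or None for x in arr], dtype=object)


for func in (with_apply, with_list_comprehension):
    # total time for 3 calls, reported as an average per call
    seconds = timeit.timeit(lambda: func(big), number=3)
    print(f"{func.__name__}: {seconds / 3:.4f} s per call")
```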