How to merge pandas DataFrames on string contains?
Question
I have two DataFrames that I would like to merge on a common column. However, the values in that column are not identical strings; rather, the string from one frame is contained in the string from the other, like so:
import pandas as pd
df1 = pd.DataFrame({'column_a':['John','Michael','Dan','George', 'Adam'], 'column_common':['code','other','ome','no match','word']})
df2 = pd.DataFrame({'column_b':['Smith','Cohen','Moore','K', 'Faber'], 'column_common':['some string','other string','some code','this code','word']})
The outcome I would like from df1.merge(df2, ...) is the following:
column_a | column_b
----------------------
John | Moore <- merged on 'code' contained in 'some code'
Michael | Cohen <- merged on 'other' contained in 'other string'
Dan | Smith <- merged on 'ome' contained in 'some string'
George | n/a
Adam | Faber <- merged on 'word' contained in 'word'
Recommended answer
New Answer
Here is one approach based on pandas/numpy.
rhs = (df1.column_common
.apply(lambda x: df2[df2.column_common.str.find(x).ge(0)]['column_b'])
.bfill(axis=1)
.iloc[:, 0])
(pd.concat([df1.column_a, rhs], axis=1, ignore_index=True)
.rename(columns={0: 'column_a', 1: 'column_b'}))
column_a column_b
0 John Moore
1 Michael Cohen
2 Dan Smith
3 George NaN
4 Adam Faber
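For comparison, the same result can be reached with a cross join followed by a substring filter. This is a minimal sketch, not part of the original answer: it assumes pandas 1.2 or later (for how='cross'), reuses df1 and df2 from the question, and the names pairs, mask, and matches are chosen here for illustration only. It keeps the first match in df2's row order, which is an assumption about the desired tie-breaking.

```python
import pandas as pd

df1 = pd.DataFrame({'column_a': ['John', 'Michael', 'Dan', 'George', 'Adam'],
                    'column_common': ['code', 'other', 'ome', 'no match', 'word']})
df2 = pd.DataFrame({'column_b': ['Smith', 'Cohen', 'Moore', 'K', 'Faber'],
                    'column_common': ['some string', 'other string', 'some code',
                                      'this code', 'word']})

# Pair every row of df1 with every row of df2 (assumes pandas >= 1.2).
# The shared column name gets the suffixes '_1' (df1) and '_2' (df2).
pairs = df1.merge(df2, how='cross', suffixes=('_1', '_2'))

# Keep only pairs where df1's string is a substring of df2's string.
mask = [a in b for a, b in zip(pairs['column_common_1'], pairs['column_common_2'])]

# If several df2 rows match, keep the first one in df2's row order.
matches = (pairs.loc[mask, ['column_a', 'column_b']]
                .drop_duplicates(subset='column_a'))

# Left-merge back onto df1 so names without a match (George) are kept as NaN.
result = df1[['column_a']].merge(matches, on='column_a', how='left')
print(result)
```

The cross join materialises len(df1) * len(df2) rows, so this is only practical when both frames are reasonably small.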
Old Answer
Here's a solution with inner-join behaviour, in that it doesn't keep column_a values that have no matching column_b value. This is slower than the above numpy/pandas solution because it uses two nested iterrows loops to build a Python list.
tups = [(a1, a2) for i, (a1, b1) in df1.iterrows()
for j, (a2, b2) in df2.iterrows()
if b1 in b2]
(pd.DataFrame(tups, columns=['column_a', 'column_b'])
.drop_duplicates('column_a')
.reset_index(drop=True))
column_a column_b
0 John Moore
1 Michael Cohen
2 Dan Smith
3 Adam Faber
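If the unmatched rows should be kept, as in the table requested in the question, the same first-match rule can be written as a plain Python loop. This is a rough sketch rather than part of the original answer: it reuses df1 and df2 from the question, and first_match is a hypothetical helper defined here, not a pandas function.

```python
import pandas as pd

df1 = pd.DataFrame({'column_a': ['John', 'Michael', 'Dan', 'George', 'Adam'],
                    'column_common': ['code', 'other', 'ome', 'no match', 'word']})
df2 = pd.DataFrame({'column_b': ['Smith', 'Cohen', 'Moore', 'K', 'Faber'],
                    'column_common': ['some string', 'other string', 'some code',
                                      'this code', 'word']})


def first_match(needle, haystacks, values):
    """Return the value paired with the first haystack that contains needle.

    Hypothetical helper for illustration; returns None when nothing matches.
    """
    return next((v for h, v in zip(haystacks, values) if needle in h), None)


# Build column_b row by row; rows of df1 without a match get a missing value.
result = df1[['column_a']].copy()
result['column_b'] = [first_match(s, df2['column_common'], df2['column_b'])
                      for s in df1['column_common']]
print(result)
```

Like the nested iterrows version, this scans df2 once per row of df1, but it stops at the first hit and keeps George with a missing column_b.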