Match strings between two dataframes and create column

Problem description

I am trying to match parts of the strings in bad_boy against good_boy and create a column called Right Address in the original df (bad_boy), but I am having a hard time getting this done. I have looked at the links below:

Replace whole string if it contains substring in pandas

Return DataFrame item using partial string match on rows pandas python

import pandas as pd
bad_boy = pd.read_excel('C:/Users/Programming/.xlsx')
df = pd.DataFrame(bad_boy)

print (df['Address'].head(3))

0  1234 Stack Overflow
1  7458 Python
2  8745 Pandas

good_boy = pd.read_excel('C:/Users/Programming/.xlsx')

df2 = pd.DataFrame(good_boy)

print (df2['Address'].head(10))

0 5896 Java Road
1 1234 Stack Overflow Way
2 7459 Ruby Drive
3 4517 Numpy Creek Way
4 1642 Scipy Trail
5 7458 Python Avenue
6 8745 Pandas Lane
7 9658 Excel Road
8 7255 Html Drive
9 7459 Selenium Creek Way

I have tried:

df['Right Address'] = df.loc[df['Address'].str.contains('Address', case = False, na = False, regex = False), df2['Address']]

But this raises an error:

'None of [0.....all addresses\nName: Address, dtype: object] are in the [columns]'

Desired result:

print (df['Right Address'].head(3))

0  1234 Stack Overflow Way
1  7458 Python Avenue
2  8745 Pandas Lane

Recommended answer

You can use merge combined with str.extract for a partial match. The leading number of each address is extracted from both frames and used as the join key:

# df is the bad_boy frame and df2 the good_boy frame from the question
df = df.merge(
    df2,
    left_on=df.Address.str.extract(r'(\d+)', expand=False),
    right_on=df2.Address.str.extract(r'(\d+)', expand=False),
    how='inner',
).rename(columns={'Address_y': 'Right_Address'})

You get

    Address_x           Right_Address
0   1234 Stack Overflow 1234 Stack Overflow Way
1   7458 Python         7458 Python Avenue
2   8745 Pandas         8745 Pandas Lane
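
For readers who want to try this without the Excel files, here is a minimal, self-contained sketch. It rebuilds the two frames from the sample rows printed in the question and, instead of merge, maps the extracted number onto the original df so that the new column is named Right Address as requested. The names key_bad, key_good and lookup are illustrative and not part of the original answer, and the sketch assumes, as the answer does, that the leading number is enough to identify an address.

```python
import pandas as pd

# Sample rows copied from the question; the real frames come from Excel files.
df = pd.DataFrame({"Address": ["1234 Stack Overflow", "7458 Python", "8745 Pandas"]})
df2 = pd.DataFrame({"Address": [
    "5896 Java Road", "1234 Stack Overflow Way", "7459 Ruby Drive",
    "4517 Numpy Creek Way", "1642 Scipy Trail", "7458 Python Avenue",
    "8745 Pandas Lane", "9658 Excel Road", "7255 Html Drive",
    "7459 Selenium Creek Way",
]})

# First run of digits in each address, used as the matching key.
key_bad = df["Address"].str.extract(r"(\d+)", expand=False)
key_good = df2["Address"].str.extract(r"(\d+)", expand=False)

# Lookup from number to full address. Numbers that occur more than once
# (7459 in the sample) keep only their first address: an assumption of
# this sketch, not something the original answer specifies.
lookup = pd.Series(df2["Address"].to_numpy(), index=key_good)
lookup = lookup[~lookup.index.duplicated(keep="first")]

# Rows of df without a matching number get NaN instead of being dropped.
df["Right Address"] = key_bad.map(lookup)
print(df)
```

Unlike the inner merge, this keeps every row of df and leaves the Address column name unchanged. With merge, duplicate numbers in df2 would instead produce one output row per match, and merging on computed keys may add a helper key column to the result, depending on the pandas version.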
