How to replace a string using a dictionary containing multiple values for a key in Python


Problem description

I have a dictionary that maps each word to its closest related words.

I want to replace the related words in a string with the original word. Currently I can replace words when a key has only one value, but not when a key has multiple values. How can this be done?

Sample input

North Indian Restaurant
South India  Hotel
Mexican Restrant
Italian  Hotpot
Cafe Bar
Irish Pub
Maggiee Baar
Jacky Craft Beer
Bristo 1889
Bristo 188
Bristo 188.

How the dictionary is built

y= list(word)
words = y
similar = [[item[0] for item in model.wv.most_similar(word) if item[1] > 0.7] for word in words]
similarity_matrix = pd.DataFrame({'Orginal_Word': words, 'Related_Words': similar})
similarity_matrix = similarity_matrix[['Orginal_Word', 'Related_Words']] 

The result is a DataFrame with two columns, each holding lists:

Orginal_Word    Related_Words
[Indian]        [India,Ind,ind.]    
[Restaurant]    [Hotel,Restrant,Hotpot]   
[Pub]           [Bar,Baar, Beer]     
[1888]          [188, 188., 18] 

Dictionary

similarity_matrix.set_index('Orginal_Word')['Related_Words'].to_dict()

{'Indian ': 'India, Ind, ind.',
 'Restaurant': 'Hotel, Restrant, Hotpot',
 'Pub': 'Bar, Baar, Beer',
 '1888': '188, 188., 18'}

Expected output

North Indian Restaurant
South Indian  Restaurant
Mexican Restaurant
Italian  Restaurant
Cafe Pub
Irish Pub
Maggiee Pub
Jacky Craft Pub
Bristo 1888
Bristo 1888
Bristo 1888

Thanks for your help.

Recommended answer

I think you can replace using a new dictionary of regex patterns, built as in this answer:

d = {'Indian': 'India, Ind, ind.',
 'Restaurant': 'Hotel, Restrant, Hotpot',
 'Pub': 'Bar, Baar, Beer',
 '1888': '188, 188., 18'}

d1 = {r'(?<!\S)'+ k.strip() + r'(?!\S)':k1 for k1, v1 in d.items() for k in v1.split(',')}

df['col'] = df['col'].replace(d1, regex=True)
print (df)
                        col
0   North Indian Restaurant
1   South Indian Restaurant
2        Mexican Restaurant
3       Italian  Restaurant
4                  Cafe Pub
5                 Irish Pub
6               Maggiee Pub
7           Jacky Craft Pub
8               Bristo 1888
9               Bristo 1888
10              Bristo 1888
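
To see what the d1 dictionary does without a DataFrame, here is a minimal pandas-free sketch using only the standard re module. The helper names build_patterns and replace_related are made up for illustration and are not part of the answer. Like the code above, it inserts each related word into the pattern unescaped, so the "." in "188." acts as a wildcard, which appears to be why "Bristo 1889" also becomes "Bristo 1888".

```python
import re

d = {'Indian': 'India, Ind, ind.',
     'Restaurant': 'Hotel, Restrant, Hotpot',
     'Pub': 'Bar, Baar, Beer',
     '1888': '188, 188., 18'}


def build_patterns(mapping):
    """Return (compiled_pattern, original_word) pairs, one per related word.

    Hypothetical helper mirroring the d1 comprehension in the answer.
    """
    patterns = []
    for original, related in mapping.items():
        for word in related.split(','):
            # (?<!\S) and (?!\S) only allow whitespace or the string edge
            # around the match, so "India" does not match inside "Indian".
            # The word is NOT escaped here, exactly as in the answer.
            regex = r'(?<!\S)' + word.strip() + r'(?!\S)'
            patterns.append((re.compile(regex), original))
    return patterns


def replace_related(text, patterns):
    """Apply every pattern in turn, as a dict-based regex replace would."""
    for pattern, original in patterns:
        text = pattern.sub(original, text)
    return text


patterns = build_patterns(d)
samples = ['North Indian Restaurant', 'South India  Hotel', 'Mexican Restrant',
           'Italian  Hotpot', 'Cafe Bar', 'Bristo 1889', 'Bristo 188.']
results = [replace_related(s, patterns) for s in samples]
for before, after in zip(samples, results):
    print(f'{before!r} -> {after!r}')
```

The lookarounds are used instead of \b because \b treats the "." in "188." as a boundary, which would let the pattern "188" match inside "188." and leave a stray dot behind.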

EDIT (a function for the above code):

def replace_words(d, col):
    d1={r'(?<!\S)'+ k.strip() + r'(?!\S)':k1 for k1, v1 in d.items() for k in v1.split(',')}
    df[col] = df[col].replace(d1, regex=True)
    return df[col]

df['col'] = replace_words(d, 'col')

If you get the following error:

regex error- missing ), unterminated subpattern at position 7

then the related words used to build the regex keys need to be escaped:

import re

def replace_words(d, col):
    d1={r'(?<!\S)'+ re.escape(k.strip()) + r'(?!\S)':k1 for k1, v1 in d.items() for k in v1.split(',')}
    df[col] = df[col].replace(d1, regex=True)
    return df[col]

df['col'] = replace_words(d, 'col')
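
The following sketch illustrates why escaping matters. It assumes a hypothetical related word "(new" that is not in the original data; a word starting with "(" lands right after the seven-character (?<!\S) prefix, which would be consistent with the "position 7" in the error message. It also shows a side effect of escaping: "188." no longer matches "1889", so that sample row would stay unchanged with the escaped version.

```python
import re

PREFIX, SUFFIX = r'(?<!\S)', r'(?!\S)'

# Hypothetical related word containing a regex metacharacter.
word = '(new'

# Unescaped: the "(" opens a group that is never closed.
try:
    re.compile(PREFIX + word + SUFFIX)
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)

# Escaped: the "(" is matched literally.
escaped = PREFIX + re.escape(word) + SUFFIX
fixed = re.sub(escaped, 'New', 'Cafe (new')
print(fixed)

# Side effect of escaping: "." stops being a wildcard.
raw_hit = re.search(PREFIX + '188.' + SUFFIX, 'Bristo 1889')
escaped_hit = re.search(PREFIX + re.escape('188.') + SUFFIX, 'Bristo 1889')
print(raw_hit is not None, escaped_hit is not None)
```

If "1889" should still map to "1888" after escaping, one option is to add it to the related words for "1888" explicitly.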
