How to replace a string using a dictionary containing multiple values for a key in Python
Question
I have a dictionary that maps each word to its closest related words.
I want to replace the related words in a string with the original word. Currently I can replace words in a string when a key has only one value, but not when a key has multiple values. How can this be done?
Sample input
North Indian Restaurant
South India Hotel
Mexican Restrant
Italian Hotpot
Cafe Bar
Irish Pub
Maggiee Baar
Jacky Craft Beer
Bristo 1889
Bristo 188
Bristo 188.
How the dictionary is built
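import pandas as pd

# `word` is assumed to be an iterable of vocabulary words and `model` a trained gensim Word2Vec model
words = list(word)
# for each word, keep only the neighbours whose similarity score is above 0.7
similar = [[item[0] for item in model.wv.most_similar(w) if item[1] > 0.7] for w in words]
similarity_matrix = pd.DataFrame({'Orginal_Word': words, 'Related_Words': similar})
similarity_matrix = similarity_matrix[['Orginal_Word', 'Related_Words']]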
This gives a DataFrame with two columns whose cells hold lists:
Orginal_Word  Related_Words
[Indian]      [India, Ind, ind.]
[Restaurant]  [Hotel, Restrant, Hotpot]
[Pub]         [Bar, Baar, Beer]
[1888]        [188, 188., 18]
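The threshold filtering in the list comprehension can be tried without a trained model. This is a minimal sketch that assumes, as the gensim documentation describes, that most_similar returns a sequence of (word, similarity) pairs; fake_most_similar, FAKE_NEIGHBOURS and related_words are hypothetical names, and the scores are made up purely for illustration.

```python
# Hypothetical stand-in for model.wv.most_similar(word): the neighbours and
# scores below are invented only to illustrate the filtering step.
FAKE_NEIGHBOURS = {
    'Indian': [('India', 0.91), ('Ind', 0.78), ('Chinese', 0.55)],
    'Pub': [('Bar', 0.88), ('Baar', 0.74), ('Cafe', 0.62)],
}


def fake_most_similar(word):
    """Return (word, similarity) pairs, shaped like gensim's most_similar."""
    return FAKE_NEIGHBOURS.get(word, [])


def related_words(words, most_similar, threshold=0.7):
    """Map each word to the neighbours whose similarity exceeds threshold."""
    return {w: [item[0] for item in most_similar(w) if item[1] > threshold]
            for w in words}


similar = related_words(['Indian', 'Pub'], fake_most_similar)
print(similar)  # {'Indian': ['India', 'Ind'], 'Pub': ['Bar', 'Baar']}
```

Returning a dict keeps each word next to its own neighbours, which is the shape the replacement step needs anyway.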
The dictionary:
similarity_matrix.set_index('Orginal_Word')['Related_Words'].to_dict()
{'Indian ': 'India, Ind, ind.',
'Restaurant': 'Hotel, Restrant, Hotpot',
'Pub': 'Bar, Baar, Beer',
'1888': '188, 188., 18'}
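Replacement is easier to reason about once this dictionary is turned inside out, so that each related word points at its original word. A minimal sketch, assuming the values are either comma-separated strings (as printed above) or lists (as the Related_Words column holds); invert_related is a hypothetical helper name, and it also strips stray whitespace such as the trailing space in 'Indian '.

```python
def invert_related(d):
    """Turn {original: related words} into {related word: original}.

    Values may be a comma-separated string like 'India, Ind, ind.'
    or a list of words; surrounding whitespace is stripped everywhere.
    """
    lookup = {}
    for original, related in d.items():
        if isinstance(related, str):
            related = related.split(',')
        for word in related:
            word = word.strip()
            if word:  # skip empty pieces, e.g. from a trailing comma
                lookup[word] = original.strip()
    return lookup


d = {'Indian ': 'India, Ind, ind.',
     'Restaurant': 'Hotel, Restrant, Hotpot',
     'Pub': ['Bar', 'Baar', 'Beer'],
     '1888': '188, 188., 18'}
lookup = invert_related(d)
print(lookup['Ind'], lookup['Baar'], lookup['188.'])  # Indian Pub 1888
```

If the same related word appears under two original words, the one processed last wins, because later assignments overwrite earlier ones.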
Expected output
North Indian Restaurant
South India Restaurant
Mexican Restaurant
Italian Restaurant
Cafe Pub
Irish Pub
Maggiee Pub
Jacky Craft Pub
Bristo 1888
Bristo 1888
Bristo 1888
Thanks for your help.
Recommended answer
I think you can call replace with a new dictionary of regexes, built as in this answer:
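d = {'Indian': 'India, Ind, ind.',
     'Restaurant': 'Hotel, Restrant, Hotpot',
     'Pub': 'Bar, Baar, Beer',
     '1888': '188, 188., 18'}
# one regex per related word; (?<!\S) and (?!\S) require whitespace or the start/end
# of the string on either side, so only whole words are replaced
d1 = {r'(?<!\S)' + k.strip() + r'(?!\S)': k1 for k1, v1 in d.items() for k in v1.split(',')}
# df is the question's DataFrame, with the strings in column 'col'
df['col'] = df['col'].replace(d1, regex=True)
print(df)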
col
0 North Indian Restaurant
1 South Indian Restaurant
2 Mexican Restaurant
3 Italian Restaurant
4 Cafe Pub
5 Irish Pub
6 Maggiee Pub
7 Jacky Craft Pub
8 Bristo 1888
9 Bristo 1888
10 Bristo 1888
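Series.replace with a regex dictionary applies the patterns one after another, so in principle a later pattern can act on text an earlier one produced. A single-pass alternative, swapped in here and not part of the answer above, joins every escaped related word into one alternation and maps each match back with a callback, using only the standard re module. replace_related is a hypothetical name; this is a sketch, not a drop-in replacement for the pandas call.

```python
import re


def replace_related(texts, d):
    """Replace related words with their original word, scanning each text once."""
    # Flatten {original: 'a, b, c'} into {related word: original}.
    lookup = {}
    for original, related in d.items():
        for word in related.split(','):
            if word.strip():
                lookup[word.strip()] = original.strip()
    if not lookup:
        return list(texts)
    # re.escape makes the '.' in 'ind.' or '188.' literal; the lookarounds give
    # whole-word matching, and the matched text is always one of the escaped
    # words, so it is a valid lookup key.
    alternation = '|'.join(re.escape(w) for w in lookup)
    pattern = re.compile(r'(?<!\S)(?:' + alternation + r')(?!\S)')
    return [pattern.sub(lambda m: lookup[m.group(0)], text) for text in texts]


d = {'Indian': 'India, Ind, ind.',
     'Restaurant': 'Hotel, Restrant, Hotpot',
     'Pub': 'Bar, Baar, Beer',
     '1888': '188, 188., 18'}
texts = ['South India Hotel', 'Mexican Restrant', 'Maggiee Baar',
         'Jacky Craft Beer', 'Bristo 188', 'Bristo 188.', 'Bristo 1889']
for line in replace_related(texts, d):
    print(line)
```

This prints South Indian Restaurant, Mexican Restaurant, Maggiee Pub, Jacky Craft Pub, Bristo 1888, Bristo 1888 and Bristo 1889. The last line is left alone because 1889 is not among the related words and the escaped dot only matches a literal '.'.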
EDIT (the above code as a function):
def replace_words(df, d, col):
    # one whole-word regex per related word, mapped to its original word
    d1 = {r'(?<!\S)' + k.strip() + r'(?!\S)': k1 for k1, v1 in d.items() for k in v1.split(',')}
    return df[col].replace(d1, regex=True)

df['col'] = replace_words(df, d, 'col')
If you get the following error:
regex error - missing ), unterminated subpattern at position 7
it is necessary to escape the regex special characters in the related words with re.escape:
import re

def replace_words(df, d, col):
    # re.escape makes characters such as '.' or '(' in the related words match literally
    d1 = {r'(?<!\S)' + re.escape(k.strip()) + r'(?!\S)': k1 for k1, v1 in d.items() for k in v1.split(',')}
    return df[col].replace(d1, regex=True)

df['col'] = replace_words(df, d, 'col')
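To see why escaping matters, the snippet below builds the same whole-word pattern with and without re.escape. The related word '(Ind' is a hypothetical example with an unbalanced parenthesis, not from the question's data; word_pattern is likewise a hypothetical helper. The prefix (?<!\S) is seven characters long, which is why the error points at position 7.

```python
import re

PREFIX, SUFFIX = r'(?<!\S)', r'(?!\S)'


def word_pattern(word, escape=True):
    """Build the answer's whole-word pattern, optionally escaping the word."""
    return PREFIX + (re.escape(word) if escape else word) + SUFFIX


# An unbalanced '(' makes the unescaped pattern invalid.
error_pos = None
try:
    re.compile(word_pattern('(Ind', escape=False))
except re.error as e:
    print(e)  # missing ), unterminated subpattern at position 7
    error_pos = e.pos

# Escaped, the same word compiles fine.
re.compile(word_pattern('(Ind'))

# Unescaped, '.' matches any character, so '188.' also matches '1889'.
print(bool(re.search(word_pattern('188.', escape=False), 'Bristo 1889')))  # True
print(bool(re.search(word_pattern('188.'), 'Bristo 1889')))                # False
print(bool(re.search(word_pattern('188.'), 'Bristo 188.')))                # True
```

The second point explains a detail of the first output above: 'Bristo 1889' only becomes 'Bristo 1888' there because the unescaped '188.' pattern matches it; with re.escape that row is left unchanged.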