How to replace multiple substrings in a Pandas series using a dictionary?

Problem description

I have a Pandas series of strings. I want to make multiple replacements to multiple substrings per row, see:

import pandas as pd

testdf = pd.Series([
    'Mary went to school today',
    'John went to hospital today'
])
to_sub = {
    'Mary': 'Alice',
    'school': 'hospital',
    'today': 'yesterday',
    'tal': 'zzz',
}
testdf = testdf.replace(to_sub, regex=True)  # does not work (only replaces one instance per row)
print(testdf)

In the above case, the desired output is:

Alice went to hospital yesterday
John went to hospizzz yesterday

Note that the first row received three substitutions from the dictionary.

How can I perform this efficiently apart from doing this row by row (in a for loop)?

I tried df.replace(...), as suggested in many answers to other questions, but that only replaces a single substring. The result looks like: Alice went to school today, where school and today weren't substituted.

Another thing to note is that the substitutions should all happen at once for any single row. For example, the hospital in the first row (produced by replacing school) must not be substituted a second time into hospizzz, which would be wrong.

Recommended answer

You can use:

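# Borrowed from an external website
def multipleReplace(text, wordDict):
    # Apply each replacement in turn. The replacements are sequential,
    # so a later key can also match text produced by an earlier one.
    for key in wordDict:
        text = text.replace(key, wordDict[key])
    return text

print(testdf.apply(lambda x: multipleReplace(x, to_sub)))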

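When the dictionary has no 'tal' entry, this prints:

0    Alice went to hospital yesterday
1     John went to hospital yesterday

Because the replacements run one after another, adding the 'tal' entry turns hospital into hospizzz in both rows, including the hospital produced from school in the first row.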

Edit

Using the dictionary mentioned in the comments below:

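to_sub = {
    'Mary': 'Alice',
    'school': 'hospital',
    'today': 'yesterday',
    'tal': 'zzz'
}

# Look up each whitespace-separated word in the dictionary in a single pass.
# Only whole words are replaced, so 'tal' inside 'hospital' is left unchanged.
testdf.apply(lambda x: ' '.join([to_sub.get(i, i) for i in x.split()]))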

Output:

0    Alice went to hospital yesterday
1     John went to hospital yesterday
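
If substrings inside words (such as tal in hospital) also need to be replaced, while still making all substitutions in one pass, one possible approach is a single regular expression built from the dictionary keys. The following is a minimal sketch, not part of the original answer. The names pattern and result are illustrative, and sorting the keys by length is an assumption made so that a longer key is preferred over a shorter key starting at the same position.

```python
import re

import pandas as pd

testdf = pd.Series([
    'Mary went to school today',
    'John went to hospital today',
])
to_sub = {
    'Mary': 'Alice',
    'school': 'hospital',
    'today': 'yesterday',
    'tal': 'zzz',
}

# Build one alternation pattern from the escaped keys, longest key first.
keys = sorted(to_sub, key=len, reverse=True)
pattern = re.compile('|'.join(re.escape(k) for k in keys))

# Each match is looked up in the dictionary. The original text is scanned
# only once, so replacement text is never matched again.
result = testdf.str.replace(pattern, lambda m: to_sub[m.group(0)], regex=True)
print(result)
```

Because the scan runs over the original string only, the hospital produced from school in the first row is not rewritten, while the hospital already present in the second row becomes hospizzz.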
