Remove all occurrences of words in a string from a python list

Problem description

I'm trying to match and remove all words in a list from a string using a compiled regex but I'm struggling to avoid occurrences within words.

Current:

 REMOVE_LIST = ["a", "an", "as", "at", ...]

 remove = '|'.join(REMOVE_LIST)
 regex = re.compile(r'('+remove+')', flags=re.IGNORECASE)
 out = regex.sub("", text)

In: "The quick brown fox jumped over an ant"

Out: "quick brown fox jumped over t"

Expected: "quick brown fox jumped over"

I've tried changing the string to compile to the following but to no avail:

 regex = re.compile(r'\b('+remove+')\b', flags=re.IGNORECASE)

Any suggestions or am I missing something garishly obvious?

Recommended answer

One problem is that only the first \b is inside a raw string. The second gets interpreted as the backspace character (ASCII 8) rather than as a word boundary.
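The difference can be checked directly. This is a small sketch; the names `broken` and `fixed` and the single word 'an' are only illustrative and do not come from the question.

```python
import re

# In an ordinary string literal, \b is one character: backspace (code 8).
assert '\b' == '\x08' and len('\b') == 1

# In a raw string literal, \b stays as backslash + 'b' (two characters),
# which the regex engine then reads as a word boundary.
assert len(r'\b') == 2

# Each literal is evaluated on its own before concatenation, so the r prefix
# on the first piece does not carry over to the last piece.
broken = r'\b(' + 'an' + ')\b'    # ends with a backspace character
fixed = r'\b(' + 'an' + r')\b'    # ends with a real word-boundary token

print(repr(broken))
print(repr(fixed))
```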

To fix, change

regex = re.compile(r'\b('+remove+')\b', flags=re.IGNORECASE)

to

regex = re.compile(r'\b('+remove+r')\b', flags=re.IGNORECASE)
                                 ^ THIS
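Putting the fix together, a minimal end-to-end sketch might look like the following. Several parts are assumptions rather than anything stated in the question or answer: the shortened `REMOVE_LIST` (the original list is elided), the sample sentence, the helper name `remove_words`, the use of `re.escape` to protect words that contain regex metacharacters, the non-capturing group `(?:...)`, and the whitespace cleanup after substitution.

```python
import re

# Hypothetical short list for illustration; the question's full list is elided.
REMOVE_LIST = ["a", "an", "as", "at", "the"]

# re.escape (an addition, not part of the answer) keeps any regex
# metacharacters inside a word from being interpreted as pattern syntax.
remove = '|'.join(re.escape(word) for word in REMOVE_LIST)

# Both pieces containing \b are raw strings, so each \b is a word boundary.
regex = re.compile(r'\b(?:' + remove + r')\b', flags=re.IGNORECASE)


def remove_words(text):
    """Delete whole-word matches, then collapse the leftover whitespace."""
    stripped = regex.sub("", text)
    return " ".join(stripped.split())


# "ant", "sat" and "table" survive because the boundaries only allow whole-word matches.
print(remove_words("An ant sat at the table as a guest"))  # ant sat table guest
```

Only whole words on the list are removed, so a word such as "ant" is kept unless it is itself added to the list.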
