Python regex: find words and emoticons
Question
I want to find matches between a tweet and a list of strings containing words, phrases, and emoticons. Here is my code:
import re

words = [':)','and i','sleeping','... :)','! <3','facebook']
regex = re.compile(r'\b%s\b|(:\(|:\))+' % '\\b|\\b'.join(words), flags=re.IGNORECASE)
I keep getting this error:
error: unbalanced parenthesis
Apparently there is something wrong with the code and it cannot match emoticons. Any idea how to fix it?
Recommended answer
The re module has a function, re.escape, that takes care of escaping the words correctly, so you could just use:
words = map(re.escape, [':)','and i','sleeping','... :)','! <3','facebook'])
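Plugged back into the pattern from the question, the escaped list might be used as in the sketch below. The `tweet` string is a made-up example, and the variable names `escaped` and `matches` are introduced here only for illustration.

```python
import re

words = [':)', 'and i', 'sleeping', '... :)', '! <3', 'facebook']

# re.escape backslash-escapes regex metacharacters such as ( ) and .
# so that each entry is matched literally instead of being parsed as syntax.
escaped = [re.escape(w) for w in words]

# Same pattern shape as in the question, now built from the escaped entries.
regex = re.compile(r'\b%s\b|(:\(|:\))+' % r'\b|\b'.join(escaped),
                   flags=re.IGNORECASE)

tweet = "Sleeping all day and I love Facebook :)"  # hypothetical tweet
matches = [m.group(0) for m in regex.finditer(tweet)]
```

The pattern now compiles without the "unbalanced parenthesis" error. Note that the trailing ":)" in this example is picked up by the `(:\(|:\))+` alternative, not by the `\b:\)\b` entry built from the list.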
Note that word boundaries might not work as you expect when used with words that don't start or end with actual word characters.
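To illustrate the caveat: `\b` matches only between a word character and a non-word character, so it finds no boundary between a space and ":". The sketch below shows the failure and one possible workaround that is not part of the original answer: replacing `\b` with the lookarounds `(?<!\S)` and `(?!\S)`, which require whitespace or the string edge on each side. The `tweet` string is a made-up example.

```python
import re

# \b needs a word character on exactly one side, so it fails around ':)'
# when the emoticon is surrounded by spaces.
with_boundaries = re.compile(r'\b' + re.escape(':)') + r'\b')
no_match = with_boundaries.search('good night :) see you')  # no match found

# Assumed alternative: require whitespace (or the string edge) on both
# sides of each entry instead of a word boundary.
words = [':)', 'and i', 'sleeping', '... :)', '! <3', 'facebook']
alternation = '|'.join(re.escape(w) for w in words)
regex = re.compile(r'(?<!\S)(?:%s)(?!\S)' % alternation, flags=re.IGNORECASE)

tweet = 'So tired ... :) and I am sleeping ! <3'  # hypothetical tweet
found = regex.findall(tweet)
```

This variant trades flexibility for predictability: an entry directly followed by punctuation, such as "sleeping," with a comma, would no longer match, so whether it fits depends on how the tweets are tokenized.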