Python regex: find words and emoticons

Question

I want to find matches between a tweet and a list of strings containing words, phrases, and emoticons. Here is my code:

```python
import re
words = [':)','and i','sleeping','... :)','! <3','facebook']
regex = re.compile(r'\b%s\b|(:\(|:\))+' % '\\b|\\b'.join(words), flags=re.IGNORECASE)
```

I keep getting this error:

error: unbalanced parenthesis

Apparently there is something wrong with the code and it cannot match emoticons. Any idea how to fix it?
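
A quick way to see where the error comes from is to compile the pieces separately. This is a minimal sketch, not part of the original question, and compile_error is a hypothetical helper introduced only for illustration. It suggests that the unescaped ')' in ':)' is enough on its own to trigger the error, because it closes a group that was never opened.

```python
import re

words = [':)', 'and i', 'sleeping', '... :)', '! <3', 'facebook']

# The question's pattern: the words are inserted into the regex without escaping.
pattern = r'\b%s\b|(:\(|:\))+' % '\\b|\\b'.join(words)

def compile_error(pat):
    """Hypothetical helper: return re.error's message if pat fails to compile, else None."""
    try:
        re.compile(pat, flags=re.IGNORECASE)
    except re.error as exc:
        return exc.msg  # the bare message, without the "at position N" suffix
    return None

print(compile_error(pattern))          # unbalanced parenthesis
print(compile_error(':)'))             # unbalanced parenthesis
print(compile_error(re.escape(':)')))  # None
```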

Answer

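The re module has a function, re.escape, that escapes the characters with special meaning in a regular expression (such as the parentheses and dots in ':)' and '... :)'), so each word is matched literally. You could just use: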

```python
import re
words = list(map(re.escape, [':)','and i','sleeping','... :)','! <3','facebook']))
```
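
Putting the escaped words back into the question's expression might look like the sketch below. It assumes Python 3, and the sample tweet is made up for illustration. Because the pattern still contains the capturing group (:\(|:\))+, re.findall would return only that group's text for each match, so the sketch uses finditer and group(0) to get the whole match instead.

```python
import re

raw_words = [':)', 'and i', 'sleeping', '... :)', '! <3', 'facebook']

# Escape each word so characters like ')' and '.' are matched literally.
words = [re.escape(w) for w in raw_words]

# Same expression as in the question, now built from the escaped words.
regex = re.compile(r'\b%s\b|(:\(|:\))+' % '\\b|\\b'.join(words), flags=re.IGNORECASE)

tweet = 'And I was sleeping... :) then facebook! <3'  # made-up example input

# group(0) is the full match, regardless of which alternative matched.
matches = [m.group(0) for m in regex.finditer(tweet)]
print(matches)  # ['And I', 'sleeping', ':)', 'facebook', '! <3']
```

In this run '... :)' is not matched as a phrase: the \b after ')' fails because ')' is followed by a space, so only the bare ':)' is picked up by the trailing emoticon group.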

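Note that word boundaries might not work as you expect with words that don't start or end with a word character. \b only matches between a word character and a non-word character (or the start or end of the string), so a pattern like \b:\)\b needs a word character immediately before the ':' and immediately after the ')', and will not match a freestanding ':)'.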
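
One common way around the word-boundary problem, not taken from the answer above, is to add \b only on a side of a phrase that actually begins or ends with a word character. The sketch below assumes non-empty phrases; phrase_to_regex and build_matcher are hypothetical helper names, and the question's extra (:\(|:\))+ group is omitted (add ':(' to the list if you need it).

```python
import re

def phrase_to_regex(phrase):
    """Hypothetical helper: escape phrase and add \\b only on a side that
    begins/ends with a word character. Assumes phrase is non-empty."""
    body = re.escape(phrase)
    start = r'\b' if re.match(r'\w', phrase[0]) else ''
    end = r'\b' if re.match(r'\w', phrase[-1]) else ''
    return start + body + end

def build_matcher(phrases):
    """Hypothetical helper: compile all phrases into one case-insensitive regex."""
    # Longer phrases first, so that when two alternatives can match at the
    # same position, the longer one is tried first.
    ordered = sorted(phrases, key=len, reverse=True)
    return re.compile('|'.join(phrase_to_regex(p) for p in ordered), flags=re.IGNORECASE)

phrases = [':)', 'and i', 'sleeping', '... :)', '! <3', 'facebook']
matcher = build_matcher(phrases)

# The pattern has no capturing groups, so findall returns whole matches.
print(matcher.findall('And I was sleeping... :) then facebook! <3'))
# ['And I', 'sleeping', '... :)', 'facebook', '! <3']
print(matcher.findall('good :)'))  # [':)']
```

With the same made-up tweet, '... :)' is now matched whole. The trade-off is that a phrase made only of symbols, such as ':)', also matches when attached to a word, for example inside 'hi:)'.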
