如何在python中建立一个常规的表情符号词汇? [英] How to build a regular vocabulary of emoticons in python?

查看:55
本文介绍了如何在python中建立一个常规的表情符号词汇?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个纯文本文件 UTF32.red.codes 中的表情符号代码列表.文件的纯内容是

I have a list of codes of emoticons inside a file UTF32.red.codes in plain text. The plain content of the file is

\U0001F600
\U0001F601
\U0001F602
\U0001F603 
\U0001F604
\U0001F605
\U0001F606
\U0001F609
\U0001F60A
\U0001F60B

基于问题,我的想法是从文件的内容创建正则表达式以捕获表情符号.这是我最小的工作示例

Based on question, my idea is to create regular expression from the content of the file in order to catch emoticons. This is my minimal working example

import re

with open('UTF32.red.codes','r') as emof:
   codes = [emo.strip() for emo in emof]
   emojis = re.compile(u"(%s)" % "|".join(codes))

string = u'string to check \U0001F601'
found = emojis.findall(string)

print found

found 始终为空.我错在哪里?我正在使用 python 2.7

found is always empty. Where I am wrong? I am using python 2.7

推荐答案

此代码适用于 python 2.7

This code works with python 2.7

import re
with open('UTF32.red.codes','rb') as emof:
    codes = [emo.decode('unicode-escape').strip() for emo in emof]
    emojis = re.compile(u"(%s)" % "|".join(map(re.escape,codes)))

search = ur'string to check \U0001F601'
found = emojis.findall(search)

print found

这篇关于如何在python中建立一个常规的表情符号词汇?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆