Removing emojis from a string in Python


Question

I found this Python code for removing emojis, but it does not work. Can you suggest other code or a fix for this one?

I have observed that all my emojis start with \xf, but when I try to search with str.startswith("\xf") I get an invalid character error.

emoji_pattern = r'/[x{1F601}-x{1F64F}]/u'
re.sub(emoji_pattern, '', word)

Here is the error:

Traceback (most recent call last):
  File "test.py", line 52, in <module>
    re.sub(emoji_pattern,'',word)
  File "/usr/lib/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/lib/python2.7/re.py", line 244, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character range

Each item in the list can be a word, for example ['This', 'dog', '\xf0\x9f\x98\x82', 'https://t.co/5N86jYipOI']

UPDATE: I used this other code:

emoji_pattern=re.compile(ur" " " [\U0001F600-\U0001F64F] # emoticons \
                                 |\
                                 [\U0001F300-\U0001F5FF] # symbols & pictographs\
                                 |\
                                 [\U0001F680-\U0001F6FF] # transport & map symbols\
                                 |\
                                 [\U0001F1E0-\U0001F1FF] # flags (iOS)\
                          " " ", re.VERBOSE)

emoji_pattern.sub('', word)

But this still doesn't remove the emojis; they are still shown. Any clue why that is?

Recommended answer

I am updating my answer to follow the one by @jfs, because my previous answer failed to account for other non-ASCII Unicode text such as Latin, Greek etc. StackOverflow doesn't allow me to delete my previous answer, so I am updating it to match the most accepted answer to the question.

#!/usr/bin/env python
import re

text = u'This is a smiley face \U0001f602'
print(text) # with emoji

def deEmojify(text):
    # Match one or more consecutive characters from the listed emoji ranges.
    emoji_pattern = re.compile(
        u"["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
        u"]+", flags=re.UNICODE)
    return emoji_pattern.sub(u'', text)

print(deEmojify(text))
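
The list in the question holds UTF-8 encoded byte strings ('\xf0\x9f\x98\x82' is the four-byte encoding of U+1F602), whereas the pattern above matches Unicode code points. A minimal sketch of decoding first and then stripping, assuming Python 3 and UTF-8 input; the names EMOJI_PATTERN and strip_emojis_from_bytes are hypothetical and not part of the original answer:

```python
import re

# Same four emoji ranges as in deEmojify above (Python 3 str literals).
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # flags
    "]+"
)


def strip_emojis_from_bytes(raw_words, encoding="utf-8"):
    """Decode each byte string, remove emojis, and drop words that end up empty.

    Hypothetical helper; assumes every item is valid text in `encoding`.
    """
    cleaned = []
    for raw in raw_words:
        word = raw.decode(encoding)          # bytes -> str, so re sees code points
        word = EMOJI_PATTERN.sub("", word)   # remove runs of matching emojis
        if word:
            cleaned.append(word)
    return cleaned


words = [b"This", b"dog", b"\xf0\x9f\x98\x82", b"https://t.co/5N86jYipOI"]
print(strip_emojis_from_bytes(words))  # ['This', 'dog', 'https://t.co/5N86jYipOI']
```

Dropping words that become empty is a design choice of this sketch; keep them if the list positions matter. Emojis outside the four listed ranges are left untouched.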

This was my previous answer; do not use it.

def deEmojify(inputString):
    return inputString.encode('ascii', 'ignore').decode('ascii')
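
To illustrate why this older approach is discouraged, here is a small sketch, assuming Python 3; the function name ascii_only is hypothetical. It applies the same encode/decode technique and shows that every non-ASCII character is dropped, not only emojis:

```python
def ascii_only(input_string):
    # Same technique as the previous answer: silently drop every non-ASCII character.
    return input_string.encode("ascii", "ignore").decode("ascii")


# The emoji is removed as intended.
print(repr(ascii_only("smile \U0001F602")))           # 'smile '

# But the accented Latin letter and the Greek letters are removed as well.
print(repr(ascii_only("caf\u00e9 \u03b1\u03b2\u03b3")))  # 'caf '
```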
