Removing emojis from a string in Python
Question
I found this Python code for removing emojis, but it is not working. Can you suggest other code or a fix for this one?
I have observed that all my emojis start with \xf, but when I try to search for str.startswith("\xf") I get an invalid character error.
emoji_pattern = r'/[x{1F601}-x{1F64F}]/u'
re.sub(emoji_pattern, '', word)
Here is the error:
Traceback (most recent call last):
File "test.py", line 52, in <module>
re.sub(emoji_pattern,'',word)
File "/usr/lib/python2.7/re.py", line 151, in sub
return _compile(pattern, flags).sub(repl, string, count)
File "/usr/lib/python2.7/re.py", line 244, in _compile
raise error, v # invalid expression
sre_constants.error: bad character range
Each item in the list can be a word: ['This', 'dog', '\xf0\x9f\x98\x82', 'https://t.co/5N86jYipOI']
UPDATE: I used this other code:
emoji_pattern = re.compile(ur"""[\U0001F600-\U0001F64F] # emoticons
|
[\U0001F300-\U0001F5FF] # symbols & pictographs
|
[\U0001F680-\U0001F6FF] # transport & map symbols
|
[\U0001F1E0-\U0001F1FF] # flags (iOS)
""", re.VERBOSE)
emoji_pattern.sub('', word)
But this still doesn't remove the emojis; they are still shown. Any clue why that is?
Recommended answer
I am updating this answer to follow the one by @jfs, because my previous answer failed to account for other non-ASCII text such as accented Latin, Greek, etc. Stack Overflow doesn't allow me to delete my previous answer, so I am updating it to match the most accepted answer to the question.
#!/usr/bin/env python
import re

text = u'This is a smiley face \U0001f602'
print(text)  # with emoji

def deEmojify(text):
    # One character class covering the main emoji blocks
    regrex_pattern = re.compile(pattern="["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
        "]+", flags=re.UNICODE)
    return regrex_pattern.sub(r'', text)

print(deEmojify(text))  # without emoji
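The list in the question holds UTF-8 byte sequences such as '\xf0\x9f\x98\x82', and a Unicode pattern can only match them once they have been decoded to text. Here is a minimal Python 3 sketch of applying the same ranges to such a list. It assumes the byte items are UTF-8 encoded; the helper name strip_emojis_from_words and the sample list are hypothetical, chosen only for illustration.

```python
import re

# Same code point ranges as in the answer's pattern.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "]+",
    flags=re.UNICODE,
)


def strip_emojis_from_words(words):
    """Strip emojis from each item and drop items that become empty.

    Assumption: any bytes item is UTF-8 encoded text.
    """
    cleaned = []
    for word in words:
        if isinstance(word, bytes):
            word = word.decode("utf-8")  # bytes -> str so the pattern can match
        word = EMOJI_PATTERN.sub("", word)
        if word:
            cleaned.append(word)
    return cleaned


words = ["This", "dog", b"\xf0\x9f\x98\x82", "https://t.co/5N86jYipOI"]
print(strip_emojis_from_words(words))
# ['This', 'dog', 'https://t.co/5N86jYipOI']
```

These four ranges do not cover every emoji; newer blocks would need extra ranges added to the character class.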
This was my previous answer, do not use this.
def deEmojify(inputString):
    return inputString.encode('ascii', 'ignore').decode('ascii')
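To see why the ASCII round trip is discouraged, here is a small Python 3 sketch comparing it with a range-based pattern on text that mixes accented Latin, Greek, and an emoji. The sample string and the function names ascii_only and remove_emojis are made up for illustration.

```python
import re


def ascii_only(text):
    # The earlier approach: silently drops every non-ASCII character.
    return text.encode("ascii", "ignore").decode("ascii")


# Range-based pattern using the same blocks as the updated answer.
emoji_pattern = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "]+"
)


def remove_emojis(text):
    return emoji_pattern.sub("", text)


sample = "caf\u00e9 \u03b1\u03b2\u03b3 \U0001F602"  # accented Latin, Greek, emoji

print(repr(ascii_only(sample)))     # accented and Greek letters are lost too
print(repr(remove_emojis(sample)))  # only the emoji is removed
```

The ASCII version removes the "é" and the Greek letters along with the emoji, while the range-based version leaves them intact.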