Removing all Emojis from Text
Question
This question has been asked before in "Python: How to remove all emojis", but without a solution. I have taken a step towards one, but need help finishing it off.
I went and got all the emoji hex code points from the Unicode emoji site: https://www.unicode.org/emoji/charts/emoji-ordering.txt
I then read in the file like so:
file = open('emoji-ordering.txt')
temp = file.readline()
final_list = []
while temp != '':
    #print(temp)
    if not temp[0] == '#' :
        utf_8_values = ((temp.split(';')[0]).rstrip()).split(' ')
        values = ["u\\"+(word[0]+((8 - len(word[2:]))*'0' + word[2:]).rstrip()) for word in utf_8_values]
        #print(values[0])
        final_list = final_list + values
    temp = file.readline()
print(final_list)
I hoped this would give me Unicode literals. It does not. My goal is to get Unicode literals so that I can use part of the solution from the previous question and exclude all emojis. Any ideas on what is needed to get a solution?
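The list built above ends up holding ordinary strings that start with a backslash (for example the ten characters `u\U0001F600`), because escape sequences are only interpreted inside source-code literals. One possible way to get the real characters, sketched here under assumptions, is to parse each hex code point with `int(..., 16)` and pass the result to `chr`. The helper name `parse_emoji_line` and the sample lines are hypothetical, and the line layout (`U+XXXX ... ; version # comment`) is assumed from the file linked above.

```python
def parse_emoji_line(line):
    """Return the emoji encoded on one line of emoji-ordering.txt, or None."""
    line = line.strip()
    # Skip blank lines and comment lines.
    if not line or line.startswith('#'):
        return None
    # Assumed layout: "U+1F600 ; 6.1 # grinning face"
    code_points = line.split(';')[0].split()
    # "U+1F600" -> "1F600" -> 0x1F600 -> the character itself
    return ''.join(chr(int(cp[2:], 16)) for cp in code_points)


# Illustrative lines in the assumed format (not read from the real file).
sample_lines = [
    "# comment line",
    "U+1F600 ; 6.1 # grinning face",
    "U+1F1E8 U+1F1E6 ; 2.0 # flag: Canada",
]
emojis = [e for e in map(parse_emoji_line, sample_lines) if e is not None]
```

Each entry in `emojis` is then a real string of one or more code points, so it can be compared against text or fed into a regular expression directly.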
Recommended answer
First install the emoji package:
pip install emoji
or
pip3 install emoji
Then do this:
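import emoji

def give_emoji_free_text(text):
    # Collect the emoji characters that occur in the text.
    # The original answer tested `c in emoji.UNICODE_EMOJI`; newer releases of
    # the package no longer provide that name, so emoji.is_emoji() is used here.
    emoji_list = [c for c in text if emoji.is_emoji(c)]
    # Drop every whitespace-separated word that contains one of those characters.
    clean_text = ' '.join([word for word in text.split() if not any(e in word for e in emoji_list)])
    return clean_text

text = give_emoji_free_text(text)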
Works for me!
Or you can try:
import re

emoji_pattern = re.compile("["
    u"\U0001F600-\U0001F64F"  # emoticons
    u"\U0001F300-\U0001F5FF"  # symbols & pictographs
    u"\U0001F680-\U0001F6FF"  # transport & map symbols
    u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
    u"\U0001F1F2-\U0001F1F4"  # Macau flag
    u"\U0001F1E6-\U0001F1FF"  # flags
    u"\U0001F600-\U0001F64F"
    u"\U00002702-\U000027B0"
    u"\U000024C2-\U0001F251"  # caution: very wide range, also matches many non-emoji characters (e.g. CJK ideographs)
    u"\U0001f926-\U0001f937"
    u"\U0001F1F2"
    u"\U0001F1F4"
    u"\U0001F620"
    u"\u200d"                 # zero-width joiner
    u"\u2640-\u2642"
    "]+", flags=re.UNICODE)
text = emoji_pattern.sub(r'', text)
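As an alternative to hand-written ranges, a pattern could also be built from an explicit list of emoji strings (for example, one parsed from emoji-ordering.txt). The following is only a sketch of that idea, not the answer's method; `build_emoji_pattern` and the two sample emoji are hypothetical names and values chosen for illustration.

```python
import re


def build_emoji_pattern(emojis):
    """Compile a regex that matches any of the given emoji strings."""
    # Longest sequences first, so multi-code-point emoji match as a whole
    # before any shorter emoji they happen to contain.
    ordered = sorted(set(emojis), key=len, reverse=True)
    return re.compile('|'.join(re.escape(e) for e in ordered))


# Illustrative input: grinning face and the Canada flag sequence.
emoji_strings = ['\U0001F600', '\U0001F1E8\U0001F1E6']
pattern = build_emoji_pattern(emoji_strings)
cleaned = pattern.sub('', 'hi \U0001F600 there \U0001F1E8\U0001F1E6!')
```

Because this removes only the listed sequences, ordinary text such as CJK characters is left alone; the trade-off is that the list has to be kept up to date. Note that an empty list would yield an empty pattern, so a real implementation would probably want to guard against that.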