从文本中删除所有表情符号 [英] Removing all Emojis from Text

查看:784
本文介绍了从文本中删除所有表情符号的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在此 Python:如何删除所有表情符号没有解决方案,我必须迈向解决方案。但需要帮助才能完成。

This question has been asked here Python : How to remove all emojis Without a solution, I have as step towards the solution. But need help finishing it off.

我去了emoji站点并从以下所有emoji十六进制代码点中获取: https://www.unicode.org/emoji/charts/emoji-ordering.txt

I went and got all the emoji hex code points from the emoji site: https://www.unicode.org/emoji/charts/emoji-ordering.txt

然后我像这样读取文件:

I then read in the file like so:

file = open('emoji-ordering.txt')
temp = file.readline()

final_list = []

while temp != '':
    #print(temp)
    if not temp[0] == '#' :
            utf_8_values = ((temp.split(';')[0]).rstrip()).split(' ')
            values = ["u\\"+(word[0]+((8 - len(word[2:]))*'0' + word[2:]).rstrip()) for word in utf_8_values]
            #print(values[0])
            final_list = final_list + values
    temp = file.readline()

print(final_list)

我希望这能给我unicode文字。并非如此,我的目标是获取Unicode文字,因此我可以使用上一个问题的部分解决方案,并能够排除所有表情符号。任何想法,我们需要什么解决方案?

I hoped this would give me unicode literals. It does not, my goal is to get unicode literals so I can use part of the solution from the last question and be able to exclude all emojis. Any ideas what we need to get a solution?

推荐答案

首先安装表情符号:

pip install emoji

pip3 install emoji

这样做:

import emoji

def give_emoji_free_text(self, text):
    allchars = [str for str in text]
    emoji_list = [c for c in allchars if c in emoji.UNICODE_EMOJI]
    clean_text = ' '.join([str for str in text.split() if not any(i in str for i in emoji_list)])

    return clean_text

text = give_emoji_free_text(text)

为我工作!

或者您可以尝试:

emoji_pattern = re.compile("["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
        u"\U0001F1F2-\U0001F1F4"  # Macau flag
        u"\U0001F1E6-\U0001F1FF"  # flags
        u"\U0001F600-\U0001F64F"
        u"\U00002702-\U000027B0"
        u"\U000024C2-\U0001F251"
        u"\U0001f926-\U0001f937"
        u"\U0001F1F2"
        u"\U0001F1F4"
        u"\U0001F620"
        u"\u200d"
        u"\u2640-\u2642"
        "]+", flags=re.UNICODE)

text = emoji_pattern.sub(r'', text)

这篇关于从文本中删除所有表情符号的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆