R tweets with emojis

Question

I scraped tweets from the Twitter API with the rtweet package, but I don't know how to work with text that contains emojis. They appear in the form '\U0001f600', and every regex I have tried so far has failed. I can't get anything out of it.

For example,

 text = 'text text. \U0001f600'
 grepl('U',text)

returns FALSE, and

 grepl('000',text)

also returns FALSE.

Another problem is that the emojis are often stuck to the preceding word (for example, i am here\U0001f600).

So how can I make R recognize emojis in that format? What pattern can I pass to grepl so that it returns TRUE for any emoji in that format?

Recommended answer

In R there tends to be a package for most things. In this case it is textclean, which brings along the lexicon package and its many dictionaries. textclean gives you two functions you can use: replace_emoji and replace_emoji_identifier.

text = c("text text. \U0001f600", "i am here\U0001f600")

# replace emoji with identifier:
textclean::replace_emoji_identifier(text)
#> [1] "text text. lexiconvygwtlyrpywfarytvfis " "i am here lexiconvygwtlyrpywfarytvfis "

# replace emoji with text representation
textclean::replace_emoji(text)
#> [1] "text text. grinning face " "i am here grinning face "
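
As background on why the patterns in the question return FALSE: R's parser converts the escape '\U0001f600' into a single Unicode character when it reads the string literal, so the stored string contains no literal "U" or "000" to match. The following base R sketch illustrates this. The variable names are only illustrative, and it assumes the string is handled as UTF-8, which is the case for a literal written with a \U escape.

```r
text <- "text text. \U0001f600"

# The escape is stored as one character: 11 ASCII characters plus the emoji.
n_chars <- nchar(text)

# The last code point of the string is U+1F600.
code_points <- utf8ToInt(text)
last_code_point <- code_points[length(code_points)]

# There is no literal "U" in the string, so this is FALSE.
found_U <- grepl("U", text, fixed = TRUE)

# Searching for the character itself does succeed.
found_emoji <- grepl("\U0001f600", text, fixed = TRUE)
```

Here last_code_point is 128512, the decimal value of 0x1F600, and found_emoji is TRUE even though found_U is FALSE.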

Next, you could use sentimentr for sentiment scoring on the emojis, or quanteda for text analysis. If you just want to check for the presence of an emoji, as in your expected output:

grepl("lexicon[[:alpha:]]{20}", textclean::replace_emoji_identifier(text))
#> [1] TRUE TRUE
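
If installing textclean is not an option, a rough presence check can also be sketched in base R by looking at code points directly. This is a heuristic rather than what textclean does: has_emoji_codepoint is a hypothetical helper, and the default range U+1F300 to U+1FAFF is an assumption that covers many pictographic emojis but not all of them (some, such as U+2764, sit outside it).

```r
# Hypothetical helper, not part of textclean, lexicon or rtweet.
# Returns TRUE for each string holding at least one code point in
# [lower, upper]; the default range is an assumed approximation of
# "emoji" and misses symbols outside it.
has_emoji_codepoint <- function(x, lower = 0x1F300L, upper = 0x1FAFFL) {
  vapply(
    x,
    function(s) {
      cps <- utf8ToInt(s)  # integer code points of one string
      any(cps >= lower & cps <= upper)
    },
    logical(1),
    USE.NAMES = FALSE
  )
}

text <- c("text text. \U0001f600", "i am here\U0001f600", "no emoji here")
emoji_flags <- has_emoji_codepoint(text)
```

Because the check works per code point, it does not matter whether the emoji is attached to the preceding word, as in the second element.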
