R tweets with emojis
Question
I scraped tweets from the Twitter API with the rtweet package, but I don't know how to work with text that contains emojis. They appear in the form '\U0001f600', and every regex I have tried so far has failed to match anything.
For example,
text = 'text text. \U0001f600'
grepl('U',text)
returns FALSE, and
grepl('000',text)
also returns FALSE.
Another problem is that they are often stuck to the preceding word (for example, i am here\U0001f600).
So how can I make R recognize emojis in that format? What can I pass to grepl so that it returns TRUE for any emoji in that format?
Recommended answer
In R there tends to be a package for most things. In this case it is textclean, which brings along the lexicon package, a collection of many dictionaries. textclean gives you two functions you can use: replace_emoji and replace_emoji_identifier.
text = c("text text. \U0001f600", "i am here\U0001f600")
# replace emoji with identifier:
textclean::replace_emoji_identifier(text)
[1] "text text. lexiconvygwtlyrpywfarytvfis " "i am here lexiconvygwtlyrpywfarytvfis "
# replace emoji with text representation
textclean::replace_emoji(text)
[1] "text text. grinning face " "i am here grinning face "
Next you could use sentimentr for sentiment scoring on the emojis, or quanteda for text analysis. If you just want to check for the presence of an emoji, as in your expected output:
grepl("lexicon[[:alpha:]]{20}", textclean::replace_emoji_identifier(text))
[1] TRUE TRUE
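If you would rather stay in base R, the following is a minimal sketch rather than a complete solution. It also shows why the original grepl calls returned FALSE: '\U0001f600' is an escape sequence that R parses into a single character, so the string never contains a literal "U" or "000". The helper names is_emoji_cp, has_emoji and pad_emoji are hypothetical, and the code point range U+1F300 to U+1FAFF is an assumption that covers the common pictographs and emoticons but not every emoji (flags and the older dingbats, for instance, fall outside it).

```r
# "\U0001f600" is parsed into ONE character, not the text "U0001f600".
emoji <- "\U0001f600"
nchar(emoji)       # 1
utf8ToInt(emoji)   # 128512, i.e. 0x1F600

# Assumed emoji range (not exhaustive): U+1F300 to U+1FAFF.
is_emoji_cp <- function(cp) cp >= 0x1F300 & cp <= 0x1FAFF

# Hypothetical helper: TRUE for each string that contains a code point
# inside the assumed range.
has_emoji <- function(x) {
  vapply(x, function(s) any(is_emoji_cp(utf8ToInt(enc2utf8(s)))),
         logical(1), USE.NAMES = FALSE)
}

# Hypothetical helper: insert a space before every emoji that is not
# already preceded by one, so "here\U0001f600" becomes "here \U0001f600".
pad_emoji <- function(x) {
  vapply(x, function(s) {
    cp <- utf8ToInt(enc2utf8(s))
    out <- integer(0)
    for (p in cp) {
      stuck <- is_emoji_cp(p) && length(out) > 0 && out[length(out)] != 32L
      out <- c(out, if (stuck) 32L, p)  # 32L is the code point of a space
    }
    intToUtf8(out)
  }, character(1), USE.NAMES = FALSE)
}

text <- c("text text. \U0001f600", "i am here\U0001f600", "no emoji here")
has_emoji(text)   # TRUE TRUE FALSE
pad_emoji(text)
```

Working on code points sidesteps locale and regex-engine differences. The trade-off is that you maintain the range yourself, whereas textclean relies on the emoji table that ships with lexicon.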