Extract emotions calculation for every row of a dataframe

Problem description

I have a dataframe with rows of text. For each row of text I would like to extract a vector of specific emotions, where each entry is binary: 0 if the emotion is not present, 1 if it is.
There are 5 emotions in total, but I would like a 1 only for the emotion that seems to be the strongest.

An example of what I have tried:

library(tidytext)
text <- data.frame(id = c(11, 12, 13), text = c("bad movie", "good movie", "I think it would benefit religious people to see things like this, not just to learn about our home, the Universe, in a fun and easy way, but also to understand that non- religious explanations don't leave people hopeless and"))
nrc_lexicon <- get_sentiments("nrc")

Example of expected output:

id  text          sadness  anger  joy  love  neutral
11  "bad movie"   1        0      0    0     0
12  "good movie"  0        0      1    0     0

Any hints would be helpful.

What is the next step to do this for every row?
How can I run the nrc lexicon analysis on each line?

for (i in 1:nrow(text)) {
(text$text[i], nrc_lexicon)
}

Recommended answer

How about this:

library(tidytext)   # text-mining helpers (unnest_tokens, get_sentiments)
library(dplyr)      # pipes and joins

# your data
text <- data.frame(id = c(11, 12, 13),
                   text = c("bad movie", "good movie", "I think it would benefit religious people to see things like this, not just to learn about our home, the Universe, in a fun and easy way, but also to understand that non- religious explanations don't leave people hopeless and"),
                   stringsAsFactors = FALSE)  # keep the text as character, not factor

# the lexicon (recent tidytext versions fetch it through the textdata package)
nrc_lexicon <- get_sentiments("nrc")

# now the job
unnested <- text %>%
  unnest_tokens(word, text) %>%            # one row per word
  left_join(nrc_lexicon, by = "word") %>%  # attach the sentiments of each word
  left_join(text, by = "id")               # bring back the original text for each id

Here is the output keyed by id. You could key it by the text instead, but I did not because the third text is long; to do so, simply use unnested$text in place of unnested$id:

table_sentiment <- table(unnested$id, unnested$sentiment)
table_sentiment
     anger anticipation disgust fear joy negative positive sadness surprise trust
  11     1            0       1    1   0        1        0       1        0     0
  12     0            1       0    0   1        0        1       0        1     1
  13     0            1       0    1   1        2        3       2        1     0

If you want it as a data.frame:

 df_sentiment <- as.data.frame.matrix(table_sentiment)
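
If you want to try the tokenise, join and count steps without downloading the NRC lexicon, here is a minimal base R sketch. Note that toy_text, toy_lexicon and toy_counts are made-up names, and the three lexicon entries are invented for illustration; they are not the real NRC entries.

```r
# Toy data and a made-up lexicon (hypothetical, NOT the real NRC entries).
toy_text <- data.frame(id = c(11, 12),
                       text = c("bad movie", "good movie"),
                       stringsAsFactors = FALSE)
toy_lexicon <- data.frame(word = c("bad", "bad", "good"),
                          sentiment = c("sadness", "anger", "joy"),
                          stringsAsFactors = FALSE)

# One row per (id, word): lower-case, then split on anything that is not a letter.
words <- strsplit(tolower(toy_text$text), "[^a-z]+")
tokens <- data.frame(id = rep(toy_text$id, lengths(words)),
                     word = unlist(words),
                     stringsAsFactors = FALSE)

# Keep only the words found in the lexicon, then count sentiments per id.
matched <- merge(tokens, toy_lexicon, by = "word")
toy_counts <- table(matched$id, matched$sentiment)
toy_counts
```

Unlike left_join, merge drops words that have no lexicon entry, which makes no difference to the counts because table ignores missing sentiments by default.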

Now you can do whatever you want with it. For example, if I remember correctly, you wanted a binary output indicating whether each sentiment is present or not:

df_sentiment[df_sentiment>1]<-1
df_sentiment
   anger anticipation disgust fear joy negative positive sadness surprise trust
11     1            0       1    1   0        1        0       1        0     0
12     0            1       0    0   1        0        1       0        1     1
13     0            1       0    1   1        1        1       1        1     0
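
The question also asks for a 1 only on the strongest emotion. One possible way to get that, sketched here in base R, is to start from the counts before they are binarised and keep only the largest column in each row. The counts below are copied from the table_sentiment output above. The names counts and one_hot are hypothetical, and sending ties to the first column is an assumption, not something the question specifies.

```r
# Per-id sentiment counts, copied from the table_sentiment output above.
counts <- data.frame(
  anger        = c(1, 0, 0),
  anticipation = c(0, 1, 1),
  disgust      = c(1, 0, 0),
  fear         = c(1, 0, 1),
  joy          = c(0, 1, 1),
  negative     = c(1, 0, 2),
  positive     = c(0, 1, 3),
  sadness      = c(1, 0, 2),
  surprise     = c(0, 1, 1),
  trust        = c(0, 1, 0),
  row.names    = c("11", "12", "13")
)

m <- as.matrix(counts)

# Column index of the largest count in each row.
# Assumption: ties are resolved in favour of the first column.
top <- max.col(m, ties.method = "first")

# All zeros, then a single 1 in the winning column of each row.
one_hot <- matrix(0, nrow(m), ncol(m), dimnames = dimnames(m))
one_hot[cbind(seq_len(nrow(m)), top)] <- 1
one_hot <- as.data.frame(one_hot)
one_hot
```

With these counts, ids 11 and 12 are all ties, so the first tied column wins, while id 13 gets its 1 under positive. A row with no matches at all would still get a 1 in the first column, so you may want to handle that case separately. You may also want to drop the positive and negative columns first if you only care about emotions.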
