Error at creating word cloud in R (Error in simple_triplet_matrix: 'i, j, v' different lengths)


Problem description

I have the following code in R to get the recent tweets about the local mayor candidates and create a wordcloud:

library(twitteR)
library(ROAuth)
require(RCurl)
library(stringr)
library(tm)
library(ggmap)
library(plyr)
library(dplyr)
library(SnowballC)
library(wordcloud)
(...)
setup_twitter_oauth(...)
N = 10000 # Number of tweets
S = 200 # 200 km radius from Natal (covers the whole Natal area)
candidate = 'Carlos+Eduardo'

#Lists so I can add more cities in future codes
lats = c(-5.7792569)
lons = c(-35.200916)

# Gets the tweets from every city
result = do.call(
    rbind,
    lapply(
      1:length(lats),
      function(i) searchTwitter(
          candidate,
          lang="pt-br",
          n=N,
          resultType="recent",
          geocode=paste(lats[i], lons[i], paste0(S,"km"), sep=",")
      )
    )
  )

# Gets the latitude and longitude of each tweet,
# the tweet itself, how many times it was retweeted and favorited,
# the date and time it was tweeted, etc., and builds a data frame.

result_lat = sapply(result, function(x) as.numeric(x$getLatitude()))
result_lat = sapply(result_lat, function(z) ifelse(length(z) != 0, z, NA))

result_lon = sapply(result, function(x) as.numeric(x$getLongitude()))
result_lon = sapply(result_lon, function(z) ifelse(length(z) != 0, z, NA))

result_date = lapply(result, function(x) x$getCreated())
result_date = sapply(result_date,
    function(x) strftime(x, format="%d/%m/%Y %H:%M:%S", tz="UTC")
  )

result_text = sapply(result, function(x) x$getText())
result_text = unlist(result_text)

is_retweet = sapply(result, function(x) x$getIsRetweet())

retweeted = sapply(result, function(x) x$getRetweeted())

retweet_count = sapply(result, function(x) x$getRetweetCount())

favorite_count = sapply(result, function(x) x$getFavoriteCount())

favorited = sapply(result, function(x) x$getFavorited())

tweets = data.frame(
    cbind(
        tweet = result_text,
        date = result_date,
        lat = result_lat,
        lon = result_lon,
        is_retweet=is_retweet,
        retweeted = retweeted,
        retweet_count = retweet_count,
        favorite_count = favorite_count,
        favorited = favorited
      )
  )

# Word cloud

# Text stemming requires the package 'SnowballC'.
#https://cran.r-project.org/web/packages/SnowballC/index.html

#Create corpus
corpus = Corpus(VectorSource(tweets$tweet))

corpus = tm_map(corpus, removePunctuation)

corpus = tm_map(corpus, removeWords, stopwords('portuguese'))

corpus = tm_map(corpus, stemDocument)

wordcloud(corpus, max.words = 50, random.order = FALSE)

But I'm getting these errors:

Error in simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms), :

'i, j, v' different lengths

In addition: Warning messages:

1: In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit, :

10000 tweets were requested but the API can only return 518
# I understand this one: I cannot get more tweets than exist

2: In mclapply(unname(content(x)), termFreq, control) : all scheduled cores encountered errors in user code

3: In simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms), : NAs introduced by coercion

It's my first time building a wordcloud and I followed tutorials like this one.

Is there a way to fix it? Another thing: the class of tweets$tweet is "factor". Should I convert it? If so, how do I do that?

Solution

I followed this tutorial, which defines a function to "clean" the text and creates a TermDocumentMatrix, rather than calling stemDocument, before building the wordcloud. It's working properly now.
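
For readers without the tutorial at hand, here is a minimal sketch of that approach. It is not the tutorial's code: the clean_tweets helper, its cleaning rules, and the three sample tweets are all hypothetical stand-ins. It assumes the tm and wordcloud packages are installed. The as.character() call also covers the factor question, since it converts tweets$tweet to plain text before anything else touches it.

```r
library(tm)
library(wordcloud)

# Hypothetical stand-in for the `tweets` data frame from the question;
# the tweet column is deliberately a factor, as in the original post.
tweets <- data.frame(
  tweet = c("RT @user: Carlos Eduardo visita Natal! http://t.co/abc",
            "Debate com Carlos Eduardo hoje em Natal",
            "Natal decide: Carlos Eduardo lidera a pesquisa"),
  stringsAsFactors = TRUE
)

# Hypothetical cleaning helper (the rules are assumptions; adjust as needed).
clean_tweets <- function(x) {
  x <- as.character(x)                        # factor -> character
  x <- iconv(x, "UTF-8", "UTF-8", sub = "")   # drop bytes that are not valid UTF-8
  x <- gsub("http[^[:space:]]*", " ", x)      # URLs
  x <- gsub("@[[:alnum:]_]+", " ", x)         # @mentions
  x <- gsub("\\bRT\\b", " ", x)               # retweet marker
  x <- gsub("[[:punct:]]+", " ", x)           # punctuation
  x <- gsub("[[:digit:]]+", " ", x)           # digits
  x <- tolower(x)
  trimws(gsub("[[:space:]]+", " ", x))        # collapse repeated whitespace
}

corpus <- VCorpus(VectorSource(clean_tweets(tweets$tweet)))
corpus <- tm_map(corpus, removeWords, stopwords("portuguese"))

# Count terms explicitly, then pass words and counts to wordcloud().
tdm  <- TermDocumentMatrix(corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

wordcloud(names(freq), freq, min.freq = 1, max.words = 50,
          random.order = FALSE)
```

Cleaning the raw strings before the corpus is built keeps every tm_map step on well-formed text, and building the TermDocumentMatrix yourself means any failure shows up at that line rather than inside wordcloud(). Whether this removes the error for a given set of tweets depends on what those tweets contain, so treat it as a starting point.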
