R: Error in textrank_sentences(data = article_sentences, terminology = article_words) : nrow(data) > 1 is not TRUE

Problem description

I am using the R programming language. I am trying to learn how to summarize text articles by following this tutorial: https://www.hvitfeldt.me/blog/tidy-text-summarization-using-textrank/

Following the instructions, I copied the code from the website, but pointed it at a random PDF I found online:

library(tidyverse)
## Warning: package 'tibble' was built under R version 3.6.2
library(tidytext)
library(textrank)
library(rvest)
## Warning: package 'xml2' was built under R version 3.6.2

url <- "https://shakespeare.folger.edu/downloads/pdf/hamlet_PDF_FolgerShakespeare.pdf"


article <- read_html(url) %>%
  html_nodes('div[class="padded"]') %>%
  html_text()


article_sentences <- tibble(text = article) %>%
  unnest_tokens(sentence, text, token = "sentences") %>%
  mutate(sentence_id = row_number()) %>%
  select(sentence_id, sentence)


article_words <- article_sentences %>%
  unnest_tokens(word, sentence)


article_words <- article_words %>%
  anti_join(stop_words, by = "word")

Everything works fine up to this point.

The following part is where the problem occurs:

 article_summary <- textrank_sentences(data = article_sentences, 
                                      terminology = article_words)

Error in textrank_sentences(data = article_sentences, terminology = article_words) : 
  nrow(data) > 1 is not TRUE

Can someone please show me what I am doing wrong? Is the above procedure not intended for PDF files?

Is this a possible solution: what if I copy and paste the entire text from this PDF, assign it to the "article" object, and then carry on with the rest of the code?

For example: article <- "blah blah blah ..... blah blah blah"

Thanks

Recommended answer

The link you shared reads its data from a web page. The selector div[class="padded"] is specific to the page used in that tutorial. It will not match anything on other web pages, nor in the PDF you are trying to read. Because nothing matches, article most likely ends up empty, article_sentences has no rows, and textrank_sentences() stops at its nrow(data) > 1 check. To read text from a PDF, you can use the pdftools package:

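library(tidyverse)  # provides tibble(), %>%, mutate(), select()
library(pdftools)
library(tidytext)
library(textrank)

url <- "https://shakespeare.folger.edu/downloads/pdf/hamlet_PDF_FolgerShakespeare.pdf"

# pdf_text() returns a character vector with one element per page
article <- pdf_text(url)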
article_sentences <- tibble(text = article) %>%
  unnest_tokens(sentence, text, token = "sentences") %>%
  mutate(sentence_id = row_number()) %>%
  select(sentence_id, sentence)


article_words <- article_sentences %>%
  unnest_tokens(word, sentence)


article_words <- article_words %>%
  anti_join(stop_words, by = "word")

article_summary <- textrank_sentences(data = article_sentences, terminology = article_words)
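
As for the copy/paste idea in the question: it should work too, because textrank_sentences() only needs a data frame with more than one sentence row, wherever the text came from. Below is a minimal sketch of that route. The text assigned to article is a made-up placeholder, not the real PDF content, and the stopifnot() guard is an addition for illustration that mirrors the check which produced the error.

```r
library(dplyr)
library(tibble)
library(tidytext)
library(textrank)

# Hypothetical stand-in for text pasted from the PDF (not the real Hamlet text)
article <- paste(
  "The king holds a feast in the castle.",
  "The prince watches the king at the feast.",
  "A ghost walks the castle walls at night.",
  "The prince follows the ghost along the walls."
)

# One row per sentence, with the sentence id in the first column
article_sentences <- tibble(text = article) %>%
  unnest_tokens(sentence, text, token = "sentences") %>%
  mutate(sentence_id = row_number()) %>%
  select(sentence_id, sentence)

# Fail early if the input is too small; textrank_sentences() performs
# the same nrow(data) > 1 check internally
stopifnot(nrow(article_sentences) > 1)

# One row per word, with stop words removed
article_words <- article_sentences %>%
  unnest_tokens(word, sentence) %>%
  anti_join(stop_words, by = "word")

article_summary <- textrank_sentences(data = article_sentences,
                                      terminology = article_words)

# Pull out the two highest-ranked sentences, in their original order
top_sentences <- summary(article_summary, n = 2, keep.sentence.order = TRUE)
print(top_sentences)
```

Running nrow(article_sentences) right after the tokenizing step is also a quick way to confirm that the original read_html() approach returned no text.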
