How do I close unused connections after read_html in R


Problem description

I am quite new to R and am trying to access some information on the internet, but am having problems with connections that don't seem to be closing. I would really appreciate it if someone here could give me some advice...

Originally I wanted to use the WebChem package, which in theory delivers everything I want. However, when some of the output data is missing from a webpage, WebChem doesn't return any data from that page. To get around this, I took most of the code from the package and altered it slightly to fit my needs. This worked fine for roughly the first 150 uses. Now, although I have changed nothing, calling read_html gives me the warning message "closing unused connection 4 (http:.....". Although this is only a warning, read_html doesn't return anything once it has been generated.
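For context, here is a sketch of where this kind of warning comes from in base R. It illustrates the mechanism and does not diagnose what read_html does internally. If a connection is opened and the last reference to it is dropped without close(), the garbage collector later closes it and warns "closing unused connection N (description)". The helper name leak_connection is hypothetical. A temporary file stands in for an http connection so the sketch runs offline.

```r
# Hypothetical helper: opens a connection and returns WITHOUT closing it.
leak_connection <- function(path) {
  con <- file(path, open = "r")  # a local file stands in for url(...)
  readLines(con, n = 1)
  as.integer(con)                # return only the connection number, not the object
}

tmp <- tempfile()
writeLines("line one", tmp)

id <- leak_connection(tmp)       # nothing refers to the connection now, but it was never closed
invisible(gc())                  # the finalizer closes it; R warns "closing unused connection <id> (<path>)"
still_open <- id %in% getAllConnections()
```

The warning is only a symptom: it means some code opened a connection and never closed it.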

I have written some simplified code, given below, which has the same problem.

Closing R completely (or even rebooting my PC) doesn't seem to make a difference: the warning message now appears the second time I use the code. I can run the queries one at a time outside the loop with no problems, but as soon as I use the loop, the error occurs again on the 2nd iteration. I have tried to vectorise the code, and it returned the same error message. I tried showConnections(all=TRUE), but it only listed connections 0-2 for stdin, stdout and stderr. I have searched for ways to close the HTML connection, but I can't define the URL as a con, and close(qurl) and close(ttt) don't work either. They return the errors no applicable method for 'close' applied to an object of class "character" and no applicable method for 'close' applied to an object of class "c('xml_document', 'xml_node')", respectively.
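Base R's close() is an S3 generic whose methods cover connection objects (plus source-file references). It therefore has nothing to dispatch to for a URL string or a parsed xml_document. The usual pattern is to create the connection yourself and register close() with on.exit(), so it runs on both normal return and error. This is a minimal sketch: with_connection() is a hypothetical helper, and a temporary file stands in for url(qurl) so the example runs offline.

```r
# Hypothetical helper: open 'con', pass it to 'fun', and always close it.
with_connection <- function(con, fun, mode = "r") {
  stopifnot(inherits(con, "connection"))
  on.exit(close(con), add = TRUE)  # registered first, so it runs even if open() fails
  open(con, mode)
  fun(con)
}

tmp <- tempfile(fileext = ".html")
writeLines("<html><head><title>demo</title></head></html>", tmp)

before <- getAllConnections()

# Normal use: read the lines; the connection is closed on return.
page <- with_connection(file(tmp), readLines)

# Error inside 'fun': on.exit() still closes the connection.
failed <- try(with_connection(file(tmp), function(con) stop("parse failed")),
              silent = TRUE)

after <- getAllConnections()     # same set of connections as before
```

With a live URL, the same shape would be with_connection(url(qurl), xml2::read_html, mode = "rb"). That assumes your xml2 version accepts an already-open connection, so check its documentation before relying on it.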

Does anybody know a way to close these connections so that they don't break my routine? Any suggestions would be very welcome. Thanks!

PS: I am using R version 3.3.0 with RStudio Version 0.99.902.

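library(xml2)  # provides read_html(), xml_find_all() and xml_text()

CasNrs <- c("630-08-0","463-49-0","194-59-2","86-74-8","148-79-8")
baseurl <- 'http://chem.sis.nlm.nih.gov/chemidplus/rn/'
tit <- character()
for (i in seq_along(CasNrs)){
  CurrCasNr <- as.character(CasNrs[i])
  # build the query URL for one CAS number
  qurl <- paste0(baseurl, CurrCasNr, '?DT_START_ROW=0&DT_ROWS_PER_PAGE=50')
  # try() captures a failed request instead of stopping here
  ttt <- try(read_html(qurl), silent = TRUE)
  # extract the text of the page's <title> element
  tit[i] <- xml_text(xml_find_all(ttt, "//head/title"))
}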
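As written, the loop can still break in two ways. When a request fails, ttt is a "try-error" object rather than a document, so the next line is likely to fail. When a page has no <title>, assigning a zero-length result to tit[i] is an error in R. Below is a minimal sketch of a more defensive loop. The page fetch is passed in as a function, so the control flow can be shown, and run, without network access. fetch_all() and fake_fetch() are hypothetical. The pause argument mirrors the Sys.sleep() idea from the workaround in the solution.

```r
# Hypothetical helper: apply 'fetch' to each id, storing NA when a request
# errors or returns something other than a single string, and optionally
# pausing between requests.
fetch_all <- function(ids, fetch, pause = 0) {
  out <- rep(NA_character_, length(ids))
  for (i in seq_along(ids)) {
    res <- tryCatch(
      fetch(ids[i]),
      error = function(e) {
        message("request failed for ", ids[i], ": ", conditionMessage(e))
        NA_character_
      }
    )
    if (is.character(res) && length(res) == 1L) out[i] <- res
    if (pause > 0 && i < length(ids)) Sys.sleep(pause)
  }
  out
}

CasNrs <- c("630-08-0", "463-49-0", "194-59-2", "86-74-8", "148-79-8")

# Fake fetcher for illustration only: one id errors, one page has no title.
fake_fetch <- function(cas) {
  if (cas == "194-59-2") stop("HTTP error")
  if (cas == "86-74-8") return(character(0))
  paste("Title for", cas)
}

titles <- fetch_all(CasNrs, fake_fetch)
```

For the real pages, fetch could be a function that builds qurl as above and returns xml_text(xml_find_all(read_html(qurl), "//head/title")). This sketch cannot show whether a pause of a few seconds actually avoids the underlying connection problem.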

Solution

I haven't found a good answer to this problem. The best workaround I came up with is to include the function below, calling it with Secs = 3 or 4. I still don't know why the problem occurs, or how to stop it without building in a long delay.

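CatchupPause <- function(Secs){
 Sys.sleep(Secs)        # pause to let the connections finish their work
 closeAllConnections()  # closes every user connection, not only those opened by read_html
 gc()                   # garbage collection, which also runs pending finalizers
}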
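A fixed pause slows down every request, including the ones that would have succeeded. A different technique, not the one used in this answer, is to retry only when a call fails and to lengthen the wait after each failure (exponential backoff). This is a minimal sketch. with_retry() and its tries/delay defaults are hypothetical choices, not values from the source.

```r
# Hypothetical helper: call f(); on error, wait and retry, doubling the delay.
with_retry <- function(f, tries = 3, delay = 1) {
  stopifnot(tries >= 1)
  for (attempt in seq_len(tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)  # success: return the value
    if (attempt < tries) {
      Sys.sleep(delay)     # wait before the next attempt
      delay <- delay * 2   # exponential backoff
    }
  }
  stop("all ", tries, " attempts failed; last error: ", conditionMessage(result))
}
```

In the question's loop, this could wrap the request, for example ttt <- with_retry(function() read_html(qurl)). Whether retries help depends on why the requests fail, which the source leaves open.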
