How to refresh or retry a specific web page using httr GET command?


Problem description




I need to access the same web page with different "keys" to get specific content it provides.

I have a list of keys x, and I use the GET function from the httr package to access the web page and then retrieve the information I need, y.

library(httr)
library(stringr)
library(XML)

for (i in 1:20) {
    h1 = GET(paste0("http:....categories=&query=", x[i]), timeout(10))
    par = htmlParse(file = h1)

    y[i] = xpathSApply(doc = par, path = "//h3/a", fun = xmlValue)
}

The problem is that timeout is often reached, and it disrupts the loop.

So I would like to refresh the web page or retry the GET command if timeout is reached, because I suspect the problem is with the internet connection of the website I am trying to access.

The way my code works, timeout breaks the loop. I need to either ignore the error and go to next iteration or retry to access the website.

Solution

Look at purrr::safely(). You can wrap GET as such:

safe_GET <- purrr::safely(GET)

This removes the ugliness of tryCatch() by letting you do:

resp <- safe_GET("http://example.com") # you can use all legal `GET` params

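You can then test whether resp$result is NULL, which is what safely() returns when the wrapped call throws an error (the error itself ends up in resp$error). Put that check into your retry loop and you're good to go.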
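The retry loop mentioned above could look roughly like the sketch below. The helper name get_with_retry and its max_tries, pause and fetch arguments are hypothetical and not part of httr or purrr; fetch exists only so a stand-in can replace safe_GET when trying the logic without a network connection.

```r
library(httr)
library(purrr)

# Wrap GET so that errors (including timeouts) are captured instead of thrown.
safe_GET <- safely(GET)

# Hypothetical helper: call `fetch` up to `max_tries` times, sleeping `pause`
# seconds between failed attempts. `fetch` must return a list with `result`
# and `error` elements, as functions wrapped by purrr::safely() do.
get_with_retry <- function(url, ..., max_tries = 3, pause = 1, fetch = safe_GET) {
  for (attempt in seq_len(max_tries)) {
    resp <- fetch(url, ...)
    if (!is.null(resp$result)) {
      return(resp$result)  # success: hand back the httr response
    }
    if (attempt < max_tries) Sys.sleep(pause)  # wait before the next attempt
  }
  NULL  # every attempt failed
}
```

Inside the loop from the question, the GET call could then be replaced by get_with_retry(<url>, timeout(10)), and a NULL return value can be handled with `next` to skip that key instead of stopping the whole loop.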

You can see this in action by doing:

str(safe_GET("https://httpbin.org/delay/3", timeout(1)))

which will ask the httpbin service to wait 3s before responding but set an explicit timeout on the GET request to 1s. I wrapped it in str() to show the result:

List of 2
 $ result: NULL
 $ error :List of 2
  ..$ message: chr "Timeout was reached"
  ..$ call   : language curl::curl_fetch_memory(url, handle = handle)
  ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"

So, you can even check the message if you need to.
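As a minimal sketch of checking the message, the helper below reports whether a safely() result failed because of a timeout, so that only timeouts are retried. The name is_timeout is hypothetical, and the match assumes the message contains the text "Timeout was reached" as in the output above; other curl versions may word it differently.

```r
# Hypothetical helper: TRUE when a safely()-style result holds a timeout error.
# `resp` is expected to be a list with `result` and `error` elements.
is_timeout <- function(resp) {
  !is.null(resp$error) &&
    grepl("Timeout was reached", conditionMessage(resp$error), fixed = TRUE)
}
```

For example, is_timeout(safe_GET("https://httpbin.org/delay/3", timeout(1))) should be TRUE under that assumption, while a result with a non-NULL resp$result gives FALSE.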
