Package "rvest" for web scraping an HTTPS site with a proxy

Posted: 2021/7/14 18:33:13 · Tags: r, web-scraping, rvest


Question

I want to scrape an HTTPS website, but the request fails.

This is my code:

library(rvest)
url <- "https://www.sunnyplayer.com/de/"
content <- read_html(url)  # fails here behind the proxy

But I get this error in the console: "Error in open.connection(x, "rb") : Timeout was reached". How can I fix this problem?

Answer

The same thing happens to me behind a proxy. To work around it, use download.file to save the page to a location you choose, then parse the saved file with read_html.

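library(rvest)
tmp <- tempfile(fileext = ".html")  # any writable path works, e.g. "C:/whatever.html"
download.file(url, destfile = tmp)   # save the page to disk first
content <- read_html(tmp)            # then parse the local copy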
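If you would rather keep the page in memory instead of writing it to disk, the proxy can also be passed explicitly on each request. This is an alternative, not the answerer's method: a minimal sketch assuming the httr package is installed. The helper name read_html_via_proxy, the host proxy.example.com and port 8080 are hypothetical placeholders for your own proxy settings.

```r
library(httr)   # GET(), use_proxy(), timeout(), stop_for_status(), content()
library(rvest)  # read_html(), html_element()

# Hypothetical helper: fetch a page through an explicit HTTP proxy,
# then parse the body with rvest. proxy_host / proxy_port are placeholders.
read_html_via_proxy <- function(url, proxy_host, proxy_port,
                                timeout_sec = 30) {
  resp <- GET(
    url,
    use_proxy(url = proxy_host, port = proxy_port),  # route the request via the proxy
    timeout(timeout_sec)                             # give up after timeout_sec seconds
  )
  stop_for_status(resp)  # turn HTTP 4xx/5xx responses into R errors
  html_text <- content(resp, as = "text", encoding = "UTF-8")
  read_html(html_text)   # parse the HTML string into an xml_document
}

# Only hit the network when run interactively (not when the script is sourced).
if (interactive()) {
  page <- read_html_via_proxy(
    "https://www.sunnyplayer.com/de/",
    proxy_host = "proxy.example.com",  # hypothetical proxy address
    proxy_port = 8080                  # hypothetical proxy port
  )
  print(html_element(page, "title"))
}
```

If your proxy requires authentication, use_proxy() also accepts username and password arguments.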
