Webscraping in R, "... does not exist in current working directory" error


Problem description

I'm trying to use the xml2 package to scrape a few tables from ESPN.com. As an example, I'd like to scrape the week 7 fantasy quarterback rankings into R, at this URL:

http://www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-quarterback-rankings

I'm trying to use the read_html() function to do this because it's the one I'm most familiar with. Here is my syntax and the resulting error:

> wk.7.qb.rk = read_html("www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-rankings-quarterbacks", which = 1)
Error: 'www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-rankings-quarterbacks' does not exist in current working directory ('C:/Users/Brandon/Documents/Fantasy/Football/Daily').

I've also tried read_xml(), only to get the same error:

> wk.7.qb.rk = read_xml("www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-rankings-quarterbacks", which = 1)
Error: 'www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-rankings-quarterbacks' does not exist in current working directory ('C:/Users/Brandon/Documents/Fantasy/Football/Daily').

Why is R looking for this URL in the working directory? I've tried this function with other URLs and had some success. What is it about this specific URL that makes R look in a different location than it does for others? And how do I change that?
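The likely cause is that the URL string lacks a scheme: when the argument to read_html() does not start with "http://" or "https://", xml2 treats it as a path to a local file, hence the "does not exist in current working directory" message. A minimal sketch of the fix, assuming the rvest package is also installed for table extraction (note that read_html() itself does not document a `which` argument; select the table after parsing instead):

```r
library(xml2)
library(rvest)  # for html_table()

url <- "http://www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-quarterback-rankings"

# With the "http://" prefix, read_html() fetches the page over the
# network instead of looking for a local file in the working directory.
wk.7.qb.rk <- read_html(url)

# Extract all tables from the parsed page as a list of data frames,
# then pick the one you want by index.
tables <- html_table(wk.7.qb.rk, fill = TRUE)
first.table <- tables[[1]]
```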

Recommended answer

I got this error while running read_html() in a loop to navigate through 20 pages. After the 20th page the loop kept running with no URLs left, so it started calling read_html() with NA on the remaining iterations. Hope this helps!
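The loop situation described above can be guarded against by validating each URL before the call. A sketch with a hypothetical vector of page URLs (the names and URLs here are illustrative, not from the original question):

```r
library(xml2)

# Hypothetical URL vector: past the last real page the entries are NA,
# which is what triggers the misleading "working directory" error.
urls <- c("http://www.espn.com/fantasy/football/",
          NA)

pages <- vector("list", length(urls))
for (i in seq_along(urls)) {
  # Skip missing URLs instead of passing NA to read_html(), which
  # would be interpreted as a (nonexistent) local file path.
  if (is.na(urls[i])) next
  pages[[i]] <- tryCatch(read_html(urls[i]),
                         error = function(e) NULL)
}
```

Wrapping the call in tryCatch() keeps one bad page from stopping the whole loop.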

