R readHTMLTable() function error

Question

I'm running into a problem when trying to use the readHTMLTable() function from the R package XML. When I run

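library(XML)
library(RCurl)  # getURL() comes from RCurl, not from XML
baseurl <- "http://www.pro-football-reference.com/teams/"
team <- "nwe"
year <- 2011
theurl <- paste(baseurl, team, "/", year, ".htm", sep = "")

readurl <- getURL(theurl)            # download the page as one HTML string
readtable <- readHTMLTable(readurl)  # parse every table found in that HTML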

I get this error message:

Error in names(ans) = header : 
'names' attribute [27] must be the same length as the vector [21]

I'm running 64-bit R 2.15.1 through RStudio 0.96.330. There seem to be several other questions about the readHTMLTable() function, but none addresses this specific error. Does anyone know what's going on?

Answer

When readHTMLTable() complains about the 'names' attribute, it has most likely failed to match the data cells it parsed with the values it parsed as the header (here, 27 header names for a result of length 21). The simplest workaround is to turn off header parsing entirely:

table.list <- readHTMLTable(theurl, header = FALSE)
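
The message itself is base R's complaint about a names assignment whose length does not match the object it is assigned to. A minimal base-R sketch of the same failure, using placeholder objects sized to match the numbers in the question (the contents are hypothetical; only the lengths matter):

```r
# Placeholder for the 21 values readHTMLTable() parsed from a table body
# (hypothetical contents; only the length matters).
ans <- vector("list", 21)

# Placeholder for the 27 header cells it collected for that table.
header <- paste0("col", 1:27)

# The same kind of assignment named in the error; capture the message
# instead of stopping.
msg <- tryCatch({
  names(ans) <- header
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)
# [1] "'names' attribute [27] must be the same length as the vector [21]"
```

With header = FALSE, readHTMLTable() never attempts that assignment with the parsed header cells, which is why the error goes away.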

Note that I renamed the return value from "readtable" to "table.list". (I also skipped the getURL() call because (1) it didn't work for me and (2) readHTMLTable() knows how to handle URLs itself.) The reason for the rename is that, without further direction, readHTMLTable() will hunt down and parse every HTML table it can find on the given page, returning a list with one data.frame per table.
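
To see that list-of-data.frames return value without touching the network, here is a sketch that parses a small inline HTML string standing in for the real page (the markup is made up for illustration). It assumes the XML package is installed and uses htmlParse(..., asText = TRUE) so the string is treated as HTML content rather than as a file name or URL:

```r
library(XML)

# Hypothetical two-table page, used instead of the live URL so this runs offline.
html <- paste0(
  "<html><body>",
  "<table><tr><td>a</td><td>1</td></tr><tr><td>b</td><td>2</td></tr></table>",
  "<table><tr><td>x</td><td>y</td><td>z</td></tr></table>",
  "</body></html>"
)

# Parse the markup once; asText = TRUE means 'html' is content, not a path.
doc <- htmlParse(html, asText = TRUE)

# No 'which' argument, so every table is read; header = FALSE means no
# header row is matched against the data.
table.list <- readHTMLTable(doc, header = FALSE)

length(table.list)          # one element per table on the page
sapply(table.list, class)   # each element is a data.frame
```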

The page you pointed it at is fairly rich, with 8 separate tables:

> length(table.list)
[1] 8

If you are only interested in a single table on the page, you can use the "which" argument to select it and receive its contents directly as a data.frame.

This could also fix your original problem if readHTMLTable() had choked on a table you're not interested in. Many pages still use tables for navigation, search boxes, etc., so it's worth looking at the page first.
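
A sketch of selecting one table with "which", again on a made-up inline page rather than the real URL (it assumes the XML package is installed). With a single index, readHTMLTable() returns that table as a data.frame instead of a one-element list:

```r
library(XML)

# Hypothetical two-table page (offline stand-in for the real URL).
html <- paste0(
  "<html><body>",
  "<table><tr><td>a</td><td>1</td></tr></table>",
  "<table><tr><td>x</td><td>y</td><td>z</td></tr>",
  "<tr><td>p</td><td>q</td><td>r</td></tr></table>",
  "</body></html>"
)
doc <- htmlParse(html, asText = TRUE)

# Only the second table is read and returned as a data.frame.
second <- readHTMLTable(doc, which = 2, header = FALSE)

is.data.frame(second)  # TRUE
dim(second)            # 2 rows, 3 columns
```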

In your example, though, that is probably not the whole story, since header parsing actually failed on all but one of the tables. If, by luck, you were only interested in the one table that does parse successfully, the third table on the page (passing statistics), you could grab it like this, keeping header parsing on:

> passing.df = readHTMLTable(theurl, which=3)
> print(passing.df)
  No.             Age Pos  G GS  QBrec Cmp Att  Cmp%  Yds TD TD% Int Int% Lng  Y/A AY/A  Y/C   Y/G  Rate Sk Yds NY/A  ANY/A Sk% 4QC GWD
1  12  Tom Brady*  34  QB 16 16 13-3-0 401 611  65.6 5235 39 6.4  12  2.0  99  8.6  9.0 13.1 327.2 105.6 32 173  7.9   8.2 5.0   2   3
2   8 Brian Hoyer  26      3  0          1   1 100.0   22  0 0.0   0  0.0  22 22.0 22.0 22.0   7.3 118.7  0   0 22.0  22.0 0.0
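
If you don't know in advance which tables will parse with header parsing on, one option is to try them one at a time. The helper below, read_tables_safely(), is a hypothetical sketch (not part of the XML package): it counts the tables with header = FALSE, then reads each one with which = i inside tryCatch(), returning NULL for any table that errors. The inline HTML is a made-up stand-in for the real page:

```r
library(XML)

# Hypothetical helper: read each top-level table separately so that one
# table whose header does not match its data cannot abort the whole call.
read_tables_safely <- function(doc, ...) {
  # Count the tables with header parsing off, which avoids the names mismatch.
  n <- length(readHTMLTable(doc, header = FALSE))
  lapply(seq_len(n), function(i) {
    tryCatch(
      readHTMLTable(doc, which = i, ...),
      error = function(e) {
        message("table ", i, " could not be parsed: ", conditionMessage(e))
        NULL
      }
    )
  })
}

# Offline stand-in for the real page: two small tables with <th> header rows.
html <- paste0(
  "<html><body>",
  "<table><tr><th>Player</th><th>Yds</th></tr>",
  "<tr><td>A</td><td>10</td></tr></table>",
  "<table><tr><th>Week</th><th>Opp</th><th>Result</th></tr>",
  "<tr><td>1</td><td>MIA</td><td>W</td></tr></table>",
  "</body></html>"
)
doc <- htmlParse(html, asText = TRUE)

results <- read_tables_safely(doc)       # header parsing left at its default
ok <- !vapply(results, is.null, logical(1))
which(ok)                                # indices of tables that parsed
lapply(results[ok], dim)                 # shapes, to help pick the table you want
```

On a real page, parsing once with htmlParse() and passing the resulting document (rather than the URL) avoids downloading the page again for every table.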
