How to read a commented out HTML table using readHTMLTable in R
Question
In the past, I have been able to use readHTMLTable in R to pull some football stats. When trying to do so again this year, the tables aren't showing up, even though they are visible on the webpage. Here is an example: http://www.pro-football-reference.com/boxscores/201609080den.htm
When I view the source for the page, the tables are all commented out (which I suspect is why readHTMLTable didn't find them).
Example: search for "team_stats" in the page source:
<!--
<div class="table_outer_container">
<div class="overthrow table_container" id="div_team_stats">
<table class="stats_table" id="team_stats" data-cols-to-freeze=1><caption>Team Stats Table</caption>
Questions:
How can the table be commented out in the source yet display in the browser?
Is there a way to read the commented out tables using readHTMLTable (or some other method)?
Recommended answer
You can, in fact, grab it if you use the XPath comment() selector:
library(rvest)
url <- 'http://www.pro-football-reference.com/boxscores/201609080den.htm'
url %>% read_html() %>% # parse html
html_nodes('#all_team_stats') %>% # select node with comment
html_nodes(xpath = 'comment()') %>% # select comments within node
html_text() %>% # return contents as text
read_html() %>% # parse text as html
html_node('table') %>% # select table node
html_table() # parse table and return data.frame
##                                 CAR           DEN
##  1        First Downs            21            21
##  2       Rush-Yds-TDs      32-157-1      29-148-2
##  3  Cmp-Att-Yd-TD-INT 18-33-194-1-1 18-26-178-1-2
##  4       Sacked-Yards          3-18          2-19
##  5     Net Pass Yards           176           159
##  6        Total Yards           333           307
##  7       Fumbles-Lost           0-0           1-1
##  8          Turnovers             1             3
##  9    Penalties-Yards          8-85          4-22
## 10   Third Down Conv.          9-15          5-10
## 11  Fourth Down Conv.           0-0           1-1
## 12 Time of Possession         32:19         27:41
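The same pipeline can be tried offline on a small inline document, which makes it easier to see why a plain table lookup comes back empty. This is only a sketch: `page_src` is a made-up stand-in for the real page (two rows borrowed from the output above), and it assumes the wrapper element holds exactly one comment.

```r
library(rvest)

# Hypothetical stand-in for the real page: the table sits inside an HTML comment
page_src <- '
<html><body>
<div id="all_team_stats">
<!--
<div class="table_outer_container">
<table class="stats_table" id="team_stats">
<tr><th></th><th>CAR</th><th>DEN</th></tr>
<tr><td>First Downs</td><td>21</td><td>21</td></tr>
<tr><td>Total Yards</td><td>333</td><td>307</td></tr>
</table>
</div>
-->
</div>
</body></html>'

doc <- read_html(page_src)

# The parser treats the comment as opaque text, so no <table> node exists yet
n_tables_before <- length(html_nodes(doc, "table"))

# Pull the comment out of its wrapper and keep its contents as a string
comment_text <- doc %>%
  html_nodes("#all_team_stats") %>%
  html_nodes(xpath = "comment()") %>%
  html_text()

# Parse that string as HTML in its own right, then read the table
team_stats <- comment_text %>%
  read_html() %>%
  html_node("table") %>%
  html_table()
```

Here `n_tables_before` is 0, which matches what the question describes: the table only becomes reachable once the comment text is parsed a second time.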
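Since the question asks about readHTMLTable specifically, the same idea might be sketched with the XML package as well. This is not part of the answer above. It assumes the commented-out markup is well-formed enough for `htmlParse()` to handle, and `page_src` is again a hypothetical stand-in for the downloaded page.

```r
library(XML)

# Hypothetical stand-in for the real page: the table sits inside an HTML comment
page_src <- '
<html><body>
<div id="all_team_stats">
<!--
<table class="stats_table" id="team_stats">
<tr><th>Stat</th><th>CAR</th><th>DEN</th></tr>
<tr><td>First Downs</td><td>21</td><td>21</td></tr>
<tr><td>Total Yards</td><td>333</td><td>307</td></tr>
</table>
-->
</div>
</body></html>'

doc <- htmlParse(page_src, asText = TRUE)

# Collect the text of every comment node in the document
comments <- xpathSApply(doc, "//comment()", xmlValue)

# Keep only the comments that appear to contain a table
table_comments <- comments[grepl("<table", comments, fixed = TRUE)]

# Re-parse each comment as HTML and let readHTMLTable extract its tables
tables <- lapply(table_comments, function(txt) {
  readHTMLTable(htmlParse(txt, asText = TRUE), stringsAsFactors = FALSE)
})

# First table found in the first matching comment
team_stats <- tables[[1]][[1]]
```

Selecting `//comment()` across the whole document, rather than inside one wrapper, is a deliberate choice here: on a page where every table is commented out, it picks them all up in one pass, at the cost of having to filter out comments that hold no table.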