Scraping table within iframe using R rvest library
Problem description
I am decent with R's rvest library for scraping websites, but am struggling with something new. From this webpage - http://www.naia.org/ViewArticle.dbml?ATCLID=205323044 - I am trying to scrape the main table of colleges.
Here is what my code looks like currently:
library(rvest)
NAIA_url = "http://www.naia.org/ViewArticle.dbml?ATCLID=205323044"
NAIA_page = read_html(NAIA_url)
tables = html_table(html_nodes(NAIA_page, 'table'))
# tables is a length-2 list, but neither of these tables is the table I want.
# grab the correct iframe node
iframe = html_nodes(NAIA_page, "iframe")[3]
However, I'm stuck past this point. (1) For some reason, calling html_nodes isn't grabbing the table I want, and (2) I'm not sure whether I should instead grab the iframe and then try to grab the table from within it.
Any help is appreciated!
Recommended answer
If the embedded iframe is HTML, you can grab the iframe source and get your desired table from there.
library(rvest)
#> Loading required package: xml2
library(magrittr)
"http://www.naia.org/ViewArticle.dbml?ATCLID=205323044" %>%
  read_html() %>%
  html_nodes("iframe") %>%
  extract(3) %>%
  html_attr("src") %>%
  read_html() %>%
  html_node("#searchResultsTable") %>%
  html_table() %>%
  head()
#> College or University City, State
#> 1 Central Christian College ATHLETICS McPherson, KS
#> 2 + Crowley's Ridge College ATHLETICS Paragould, AR
#> 3 Edward Waters College ATHLETICS Jacksonville, Fl
#> 4 Fisher College ADMISSIONS | ATHLETICS Boston, MA
#> 5 Georgia Gwinnett College ADMISSIONS | ATHLETICS Lawrenceville, GA
#> 6 Lincoln Christian University ADMISSIONS | ATHLETICS Lincoln, IL
#> Conference Enrollment
#> 1 A.I.I. 259
#> 2 A.I.I. 0
#> 3 A.I.I. 805
#> 4 A.I.I. 600
#> 5 A.I.I. 9,720
#> 6 A.I.I. 1,060
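An iframe's content is a separate document fetched from its src, which is why html_nodes on the parent page never sees the table. The offline sketch below illustrates that, plus two optional refinements: selecting the iframe by a pattern in its src instead of by position, and resolving a relative src before fetching. The HTML strings, the example.com URL, and the 'search' pattern are all made-up stand-ins, not taken from the NAIA site.

```r
library(rvest)  # also loads xml2

# Hypothetical stand-in for the parent page: the wanted table is NOT in this markup.
parent_html <- '<html><body>
  <table><tr><th>Menu</th></tr><tr><td>Home</td></tr></table>
  <iframe src="/ads/banner.html"></iframe>
  <iframe src="/members/search.html"></iframe>
</body></html>'

# Hypothetical stand-in for the document that the iframe src points to.
iframe_html <- '<html><body>
  <table id="searchResultsTable">
    <tr><th>College or University</th><th>Enrollment</th></tr>
    <tr><td>Fisher College</td><td>600</td></tr>
    <tr><td>Georgia Gwinnett College</td><td>9,720</td></tr>
  </table>
</body></html>'

parent <- read_html(parent_html)

# The iframe body is not part of the parent document, so this finds nothing.
n_found <- length(html_nodes(parent, "#searchResultsTable"))

# Select the iframe by a substring of its src rather than by index.
src <- html_attr(html_nodes(parent, "iframe[src*='search']"), "src")

# A relative src has to be resolved against the parent page URL (assumed here).
iframe_url <- xml2::url_absolute(src, "http://www.example.com/page.html")

# Against a live site this would be read_html(iframe_url); here the local string is parsed.
colleges <- html_table(html_node(read_html(iframe_html), "#searchResultsTable"))

# Values such as "9,720" contain thousands separators, so strip them before converting.
colleges$Enrollment <- as.numeric(gsub(",", "", as.character(colleges$Enrollment)))
```

Selecting by src is only one option; if the target iframe has an id or class, that would be a more stable selector than either position or a src pattern.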