Trying to use rvest to loop a command to scrape tables from multiple pages
Question
I'm trying to scrape HTML tables for different football teams. Here is the table I want to scrape; however, I want to scrape that same table from every team to ultimately create a single CSV file containing the player names and their data.
http://www.pro-football-reference.com/teams/tam/2016_draft.htm
# teams
teams <- c("ATL", "TAM", "NOR", "CAR", "GNB", "DET", "MIN", "CHI", "SEA", "CRD", "RAM", "NWE", "MIA", "BUF", "NYJ", "KAN", "RAI", "DEN", "SDG", "PIT", "RAV", "SFO", "CIN", "CLE", "HTX", "OTI", "CLT", "JAX", "DAL", "NYG", "WAS", "PHI")
# loop
for(i in teams) {
  url <-paste0("http://www.pro-football-reference.com/teams/", i,"/2016-snap-counts.htm#snap_counts::none", sep="")
  webpage <- read_html(url)
  # grab table
  sb_table <- html_nodes(webpage, 'table')
  html_table(sb_table)
  head(sb_table)
  # bind to dataframe
  df <- rbind(df, sb_table)
}
I'm getting an error saying that I should use either CSS or XPath, not both, but I can't figure out exactly where the problem is (I suspect the html_nodes command). Can anyone help me fix this?
Recommended answer
I think your URLs are badly built and, in addition, the team names in them are case sensitive. Could you try something like this instead?
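library(rvest)
library(magrittr)
# teams
teams <- c("ATL", "TAM", "NOR", "CAR", "GNB", "DET", "MIN", "CHI", "SEA", "CRD", "RAM", "NWE", "MIA", "BUF", "NYJ", "KAN", "RAI", "DEN", "SDG", "PIT", "RAV", "SFO", "CIN", "CLE", "HTX", "OTI", "CLT", "JAX", "DAL", "NYG", "WAS", "PHI")
tables <- list()
index <- 1
for(i in teams){
  # try() lets the loop carry on if one team's page fails to load or parse
  try({
    # the site uses lower-case team codes in its URLs
    url <- paste0("http://www.pro-football-reference.com/teams/", tolower(i), "/2016_draft.htm")
    # html_table() on a whole page returns a list with one data frame per <table>
    page_tables <- url %>%
      read_html() %>%
      html_table(fill = TRUE)
    # keep the first table on the page; [[ ]] stores it as a single list element
    tables[[index]] <- page_tables[[1]]
    index <- index + 1
  })
}
# stack the per-team data frames into one
df <- do.call("rbind", tables)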
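To make the moving parts easier to check without hitting the website, here is a minimal offline sketch of the same pattern: build lower-case URLs, collect one data frame per team in a list, bind them, and write a single CSV. The function fake_scrape, the Player and Rnd columns, the added Team column, and the output file name are all made up for illustration; in real use the stand-in would be replaced by the read_html() / html_table() step.

```r
# Team codes from the question (a subset, for brevity).
teams <- c("ATL", "TAM", "NOR")

# paste0() is vectorised, so every URL can be built in one call.
# tolower() matters because the team codes in the URL path are lower case.
urls <- paste0("http://www.pro-football-reference.com/teams/",
               tolower(teams), "/2016_draft.htm")

# Hypothetical stand-in for the scraping step: it returns a small made-up
# data frame instead of downloading anything, so the sketch runs offline.
fake_scrape <- function(url) {
  data.frame(Player = c("Player A", "Player B"), Rnd = c(1L, 2L),
             stringsAsFactors = FALSE)
}

# Store one data frame per team; [[ ]] assigns a whole object to a list slot.
tables <- list()
for (k in seq_along(teams)) {
  result <- try(fake_scrape(urls[k]), silent = TRUE)
  if (inherits(result, "try-error")) next   # skip teams whose page failed
  result$Team <- teams[k]                   # record which team each row came from
  tables[[teams[k]]] <- result
}

# Stack the per-team tables and write a single CSV file.
df <- do.call(rbind, tables)
out_file <- file.path(tempdir(), "draft_2016.csv")
write.csv(df, out_file, row.names = FALSE)
```

Adding the Team column before binding is a design choice rather than something the original answer does; without it, the combined table no longer shows which page a player came from. Note that rbind only works here if every team's table has the same columns.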
PS: I don't understand why this question was downvoted. It seems well formulated.