Web scraping in R with a loop over a data.frame
Question
library(rvest)

df <- data.frame(Links = c("Qmobile_Noir-M6", "Qmobile_Noir-A1", "Qmobile_Noir-E8"))

for(i in 1:3) {
  webpage <- read_html(paste0("https://www.whatmobile.com.pk/", df$Links[i]))
  data <- webpage %>%
    html_nodes(".specs") %>%
    .[[1]] %>%
    html_table(fill = TRUE)
}
I want the loop to work for all three values in df$Links, but the code above only keeps the last one. The downloaded data must also be identifiable by model (for example, a new column holding the model name).
Recommended answer
The problem is in how you're structuring your for loop: each iteration assigns its result to data, overwriting the previous one, so only the last page survives. It's much easier not to use a loop in the first place, though, since R has great support for iterating over lists with functions like lapply and purrr::map. Here is one way to structure your data:
library(tidyverse)
library(rvest)

base_url <- "https://www.whatmobile.com.pk/"

# One row per model: build the URL and read each page into a list-column
models <- tibble(model = c("Qmobile_Noir-M6", "Qmobile_Noir-A1", "Qmobile_Noir-E8"),
                 link = paste0(base_url, model),
                 page = map(link, read_html))

# Take the first .specs table from each page, give every table the same
# column names, then unnest into one long data frame keyed by model
model_specs <- models %>%
  mutate(node = map(page, html_node, '.specs'),
         specs = map(node, html_table, header = TRUE, fill = TRUE),
         specs = map(specs, set_names, c('var1', 'var2', 'val1', 'val2'))) %>%
  select(model, specs) %>%
  unnest(cols = specs)
model_specs
#> # A tibble: 119 x 5
#> model var1 var2
#> <chr> <chr> <chr>
#> 1 Qmobile_Noir-M6 Build OS
#> 2 Qmobile_Noir-M6 Build Dimensions
#> 3 Qmobile_Noir-M6 Build Weight
#> 4 Qmobile_Noir-M6 Build SIM
#> 5 Qmobile_Noir-M6 Build Colors
#> 6 Qmobile_Noir-M6 Frequency 2G Band
#> 7 Qmobile_Noir-M6 Frequency 3G Band
#> 8 Qmobile_Noir-M6 Frequency 4G Band
#> 9 Qmobile_Noir-M6 Processor CPU
#> 10 Qmobile_Noir-M6 Processor Chipset
#> # ... with 109 more rows, and 2 more variables: val1 <chr>, val2 <chr>
The data is still pretty messy, but at least it's all there.
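If you would rather keep a for loop, the fix is to store each iteration's table in a list slot instead of overwriting a single variable, tag each table with its model, and stack the tables at the end. The sketch below is a minimal, base-R-only illustration under some assumptions: collect_specs, collect_specs_lapply and fake_fetch are hypothetical names, and fake_fetch stands in for the real scraper so the example runs offline. It also shows the same idea written with lapply(), as the answer suggests.

```r
# Minimal sketch (base R only): keep the loop, but store each iteration's
# table in a list slot instead of overwriting one variable.
# `fetch` is assumed to take one link and return a data.frame.
collect_specs <- function(links, fetch) {
  links <- as.character(links)              # guard against factor columns
  results <- vector("list", length(links))  # one slot per link
  for (i in seq_along(links)) {
    tbl <- fetch(links[i])
    tbl$model <- rep(links[i], nrow(tbl))   # record which model each row came from
    results[[i]] <- tbl
  }
  do.call(rbind, results)                   # stack all tables into one data.frame
}

# The same idea with lapply() instead of an explicit loop.
collect_specs_lapply <- function(links, fetch) {
  links <- as.character(links)
  results <- lapply(links, function(link) {
    tbl <- fetch(link)
    tbl$model <- rep(link, nrow(tbl))
    tbl
  })
  do.call(rbind, results)
}

# Hypothetical stand-in for the scraper so the sketch runs without network
# access. With rvest, `fetch` would instead read the page and call
# html_table() on the first ".specs" node, as in the question.
fake_fetch <- function(link) {
  data.frame(spec = c("OS", "Weight"),
             value = c(paste(link, "OS"), "unknown"),
             stringsAsFactors = FALSE)
}

df <- data.frame(Links = c("Qmobile_Noir-M6", "Qmobile_Noir-A1", "Qmobile_Noir-E8"))
specs <- collect_specs(df$Links, fake_fetch)
specs_lapply <- collect_specs_lapply(df$Links, fake_fetch)
```

Note that rbind() matches columns by name, so it assumes every page's table ends up with the same column names; that is why the answer renames the columns with set_names(). If the tables can differ, dplyr::bind_rows() is an alternative that fills missing columns with NA.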