Scraping webpage with react JS in R

Views: 127 Published: 2020/5/26 19:54:10 Tags: r web-scraping phantomjs rvest rselenium


Question

I'm trying to scrape the page below: https://metro.zakaz.ua/uk/?promotion=1
This page is rendered with React.
I can scrape the first page with this code:

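library(rvest)
library(jsonlite)

url <- "https://metro.zakaz.ua/uk/?promotion=1"

# The catalog is embedded as JSON in the 8th <script> tag of the page
read_html(url) %>%
  html_nodes("script") %>%
  .[[8]] %>%
  html_text() %>%
  fromJSON() %>%
  .$catalog %>% .$items %>%
  data.frame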
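Picking the script tag by position (`.[[8]]`) breaks as soon as the page adds or removes a script. A minimal sketch of a more tolerant approach is to try parsing every script body as JSON and keep the first one that has a `catalog` entry. The inline HTML below is a made-up stand-in for the real page, and the item fields (`name`, `price`) are assumptions for illustration only.

```r
library(rvest)
library(jsonlite)

# Hypothetical stand-in for the page: one unrelated script plus one JSON payload.
html <- '<html><body>
<script>console.log("not json")</script>
<script>{"catalog": {"items": [{"name": "Milk", "price": 25.5}, {"name": "Bread", "price": 12}]}}</script>
</body></html>'

scripts <- read_html(html) %>% html_nodes("script") %>% html_text()

# Try to parse each script body; scripts that are not valid JSON yield NULL.
parsed <- lapply(scripts, function(s) {
  tryCatch(fromJSON(s), error = function(e) NULL)
})

# Keep the first parsed payload that contains a "catalog" entry.
has_catalog <- vapply(
  parsed,
  function(p) is.list(p) && !is.null(p$catalog),
  logical(1)
)
items <- as.data.frame(parsed[[which(has_catalog)[1]]]$catalog$items)
```

On the real page, `html` would be replaced by the URL from the question; whether the payload there still lives under `catalog$items` has to be checked against the live site.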

As a result I have all the items from the first page, but I don't know how to scrape the other pages.
This JS code moves to another page, if that helps:

document.querySelectorAll('.catalog-pagination')[0].children[1].children[0].click()

Thanks for your help!

Recommended answer

You will need RSelenium to perform headless navigation.

For setup, see: How to set up rselenium for R?

library(RSelenium)
library(rvest)
library(tidyverse)

url="https://metro.zakaz.ua/uk/?promotion=1"

rD <- rsDriver(port=4444L, browser="chrome")
remDr <- rD[['client']]

remDr$navigate(url)

### adjust the items you want to scrape
src <- remDr$getPageSource()[[1]]

pg <- read_html(src)
tbl <- tibble(
  product_name = pg %>% html_nodes(".product-card-name") %>% html_text(),
  product_info = pg %>% html_nodes(".product-card-info") %>% html_text()
)

## handle pagination (tested with 5 pages); adjust accordingly
for (i in 2:5) {
  pages <- remDr$findElement(using = 'css selector', str_c(".page:nth-child(", i, ")"))

  pages$clickElement()

  ## wait 5 sec for the page to load
  Sys.sleep(5)

  src <- remDr$getPageSource()[[1]]

  pg <- read_html(src)
  data <- tibble(
    product_name = pg %>% html_nodes(".product-card-name") %>% html_text(),
    product_info = pg %>% html_nodes(".product-card-info") %>% html_text()
  )
  tbl <- tbl %>% bind_rows(data)
}

nrow(tbl)
head(tbl)
tail(tbl)
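The parsing step appears twice in the code above, and `tibble()` will fail if a card lacks one of the two fields, because the columns end up with different lengths. A minimal sketch of one way around both issues is a helper that iterates over each card and uses `html_node()` (singular), which returns `NA` for a missing child. The wrapper class `.product-card` and the sample HTML are assumptions for illustration; the real page may use a different container.

```r
library(rvest)
library(tibble)

# Parse one page source into a tibble with one row per product card.
# ".product-card" is an assumed wrapper class, not confirmed for the real site.
parse_products <- function(src) {
  cards <- read_html(src) %>% html_nodes(".product-card")
  tibble(
    # html_node() returns one result per card, NA when the child is absent
    product_name = cards %>% html_node(".product-card-name") %>% html_text(trim = TRUE),
    product_info = cards %>% html_node(".product-card-info") %>% html_text(trim = TRUE)
  )
}

# Made-up page source: the second card has no info element.
sample_src <- '<html><body>
<div class="product-card">
  <span class="product-card-name"> Milk </span>
  <span class="product-card-info">1 l</span>
</div>
<div class="product-card">
  <span class="product-card-name">Bread</span>
</div>
</body></html>'

sample_tbl <- parse_products(sample_src)
```

Inside the loop, `parse_products(remDr$getPageSource()[[1]])` could then replace the repeated `tibble(...)` call.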

Here is a quick output:

[Image: output]
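The fixed `Sys.sleep(5)` in the loop above either wastes time or is too short on a slow connection. A minimal sketch of an alternative is a small polling helper in base R that checks a condition until it holds or a timeout expires. The helper name `wait_until` and its parameters are hypothetical, and the condition used here is a stand-in rather than a real browser check.

```r
# Poll `condition` (a function returning TRUE/FALSE) until it is TRUE
# or `timeout` seconds have passed. Returns TRUE on success, FALSE on timeout.
wait_until <- function(condition, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(condition())) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Stand-in condition that becomes TRUE on the third check.
calls <- 0
ready <- wait_until(
  function() {
    calls <<- calls + 1
    calls >= 3
  },
  timeout = 5,
  interval = 0.1
)

# A condition that never holds runs into the timeout.
timed_out <- wait_until(function() FALSE, timeout = 0.3, interval = 0.1)
```

With RSelenium, the condition could be something like `function() !identical(remDr$getPageSource()[[1]], src)`, which waits until the page source differs from the previous page; whether that is a reliable signal for this particular site is an assumption to verify.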
