Scraping with rvest - complete with NAs when tag is not present

Views: 21 Published: 2021/12/17 13:27:31 r tags web-scraping rvest


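Problem description

I want to parse this HTML and get these elements from it:

a) the p tag with class "normal_encontrado".
b) the div with class "price".

Sometimes the p tag is not present in some products. When that happens, an NA should be added to the vector that collects the text from those nodes.

The idea is to end up with 2 vectors of the same length and then join them into a data.frame. Any ideas?

The HTML part: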


    <html>
      <head></head>
      <body>
        <div class="product_price" id="product_price_186251">
          <p class="normal_encontrado"> S/. 2,799.00 </p>
          <div id="WC_CatalogEntryDBThumbnailDisplayJSPF_10461_div_10" class="price"> S/. 2,299.00 </div>
        </div>
        <div class="product_price" id="product_price_232046">
          <div id="WC_CatalogEntryDBThumbnailDisplayJSPF_10461_div_10" class="price"> S/. 4,999.00 </div>
        </div>
      </body>
    </html>

    library(rvest)

    page_source <- read_html("r.html")

    r.precio.antes <- page_source %>%
      html_nodes(".normal_encontrado") %>%
      html_text()

    r.precio.actual <- page_source %>%
      html_nodes(".price") %>%
      html_text()
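To make the problem concrete, here is a minimal sketch that runs the same selectors on the sample HTML. The HTML is inlined as a string (an assumption made so the snippet is self-contained; the code above reads it from r.html), and the short names `antes` and `actual` are illustrative only.

```r
library(rvest)

# Sample HTML from the question, inlined as a string (assumption: the
# original code reads it from a file instead).
html <- '<html><head></head><body>
<div class="product_price" id="product_price_186251">
  <p class="normal_encontrado"> S/. 2,799.00 </p>
  <div class="price"> S/. 2,299.00 </div>
</div>
<div class="product_price" id="product_price_232046">
  <div class="price"> S/. 4,999.00 </div>
</div>
</body></html>'

page_source <- read_html(html)

# Selecting across the whole page simply skips products without the tag
antes  <- page_source %>% html_nodes(".normal_encontrado") %>% html_text()
actual <- page_source %>% html_nodes(".price") %>% html_text()

length(antes)   # 1
length(actual)  # 2
```

Because the two vectors differ in length, they cannot be combined row by row into a data.frame, and there is no way to tell which product the single old price belongs to.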

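    # Process each product node on its own, so a missing tag becomes NA
    pacman::p_load("rvest", "dplyr")

    get_prices <- function(node) {
      r.precio.antes  <- html_nodes(node, "p.normal_encontrado") %>% html_text()
      r.precio.actual <- html_nodes(node, "div.price") %>% html_text()

      # Zero matches means the tag is absent: store a character NA,
      # otherwise keep the first match
      data.frame(
        precio.antes  = ifelse(length(r.precio.antes) == 0, NA_character_, r.precio.antes),
        precio.actual = ifelse(length(r.precio.actual) == 0, NA_character_, r.precio.actual),
        stringsAsFactors = FALSE
      )
    }

    doc <- read_html("test.html") %>% html_nodes("div.product_price")

    # bind_rows() replaces the deprecated rbind_all()
    lapply(doc, get_prices) %>% bind_rows()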
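As an alternative to a per-node helper function, the sketch below relies on `html_element()` (singular), which returns exactly one result per input node and a missing node where nothing matches; `html_text()` then yields NA for those. This assumes rvest 1.0.0 or later, where `html_element()` and `html_elements()` are the current names for `html_node()` and `html_nodes()`. The inlined HTML string and the `precios` name are illustrative assumptions, not part of the original answer.

```r
library(rvest)

# Sample HTML from the question, inlined so the sketch is self-contained
html <- '<html><head></head><body>
<div class="product_price" id="product_price_186251">
  <p class="normal_encontrado"> S/. 2,799.00 </p>
  <div class="price"> S/. 2,299.00 </div>
</div>
<div class="product_price" id="product_price_232046">
  <div class="price"> S/. 4,999.00 </div>
</div>
</body></html>'

# One node per product
products <- read_html(html) %>% html_elements("div.product_price")

# html_element() keeps one entry per product, so both columns line up
precios <- data.frame(
  precio.antes  = products %>% html_element("p.normal_encontrado") %>% html_text(trim = TRUE),
  precio.actual = products %>% html_element("div.price") %>% html_text(trim = TRUE),
  stringsAsFactors = FALSE
)

precios
```

The second row of `precio.antes` comes out as NA because that product has no p.normal_encontrado tag, which gives the two equal-length columns the question asks for.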
