Rvest returning null values

Question

I am trying to piece together how rvest is used, and I thought I'd got it but all the results I receive are null.

I am using @RonakShah's example (Loop with rvest) as my base and thought I'd try to expand it to collect the name, telephone number and opening hours for each day instead:

site = "https://concreteplayground.com/auckland/bars/archie-brothers-cirque-electriq"

get_phone <- function(url) {
  webpage <- site %>% read_html()
  name <- webpage %>% html_nodes('p.name') %>% html_text() %>% trimws()
  telephone <- webpage %>% html_nodes('p.telephone') %>%html_text() %>% trimws()
  monday <- webpage %>% html_nodes('p.day a') %>%html_text() %>% trimws()
  tuesday <- webpage %>% html_nodes('p.day a') %>%html_text() %>% trimws()
  wednesday <- webpage %>% html_nodes('p.day a') %>%html_text() %>% trimws()
  thursday <- webpage %>% html_nodes('p.day a') %>%html_text() %>% trimws()
  friday <- webpage %>% html_nodes('p.day a') %>%html_text() %>% trimws()
  saturday <- webpage %>% html_nodes('p.day a') %>%html_text() %>% trimws()
  sunday <- webpage %>% html_nodes('p.day a') %>%html_text() %>% trimws()
  data.frame(telephone, monday, tuesday, wednesday, thursday, friday, saturday, sunday)
}

get_phone(site)

But I can't get any of these to work, even individually. I can't even get it to read in the day or the phone number. Could someone help point out why?

Recommended answer

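Right-click on the webpage, select Inspect and check the HTML of the page. Find the element that you want to extract and use a CSS selector that actually matches it. A selector that matches nothing returns an empty node set rather than an error, which is why every result came back empty.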
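
The difference between a selector that matches and one that does not can be checked offline. The following is a minimal sketch, assuming a small hand-written HTML fragment as a stand-in for the real page (the fragment is hypothetical; only the `span[itemprop="telephone"]` selector comes from the answer):

```r
library(rvest)

# Hypothetical stand-in for the relevant part of the page, so this runs offline.
page <- read_html('<div><span itemprop="telephone">+64 800 888 386</span></div>')

# A selector that matches nothing yields an empty node set, not an error.
missing <- page %>% html_nodes("p.telephone")
length(missing)           # 0 matched nodes
missing %>% html_text()   # character(0)

# A selector that matches the markup returns the text.
found <- page %>% html_nodes('span[itemprop="telephone"]') %>% html_text()
found
```

Checking `length()` on the node set before calling `html_text()` is a quick way to tell a wrong selector apart from an element that is present but empty.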

library(rvest)
site <- "https://concreteplayground.com/auckland/bars/archie-brothers-cirque-electriq"

get_phone <- function(url) {
  # Read the page passed in as `url` instead of relying on the global `site`
  webpage <- url %>% read_html()
  # The phone number sits in a <span> marked with itemprop="telephone"
  phone <- webpage %>% html_nodes('span[itemprop="telephone"]') %>% html_text()
  # The opening hours are a JSON string in the data-times attribute
  opening_hours <- webpage %>% 
                    html_nodes('div.open-hours') %>% 
                    html_attr('data-times') %>% jsonlite::fromJSON()
  list(phone_number = phone, opening_hours = opening_hours)
}

get_phone(site)


#$phone_number
#[1] "+64 800 888 386"

#$opening_hours
#  weekday time_from time_to
#1       1     12:00   00:00
#2       2     12:00   00:00
#3       3     12:00   00:00
#4       4     12:00   00:00
#5       5     12:00   00:00
#6       6     10:00   00:00
#7       0     10:00   00:00

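The opening hours are stored as JSON in the data-times attribute of the div.open-hours element. That is helpful: jsonlite::fromJSON() parses them into a data frame in one step, so we don't have to scrape each day individually and bind the results together.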
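
If day names are wanted instead of weekday numbers, the parsed data frame can be extended with a lookup. This is only a sketch: the HTML fragment and the two sample rows are hypothetical, and the mapping assumes that weekday 0 means Sunday and 1 to 6 mean Monday to Saturday, which the output above suggests but the page does not confirm.

```r
library(rvest)
library(jsonlite)

# Hypothetical JSON in the same shape as the data-times attribute.
hours_json <- '[{"weekday":1,"time_from":"12:00","time_to":"00:00"},{"weekday":0,"time_from":"10:00","time_to":"00:00"}]'

# Hypothetical stand-in for the page's div.open-hours element.
page <- read_html(sprintf("<div class='open-hours' data-times='%s'></div>", hours_json))

# Pull the attribute and parse the JSON into a data frame.
hours <- page %>%
  html_nodes("div.open-hours") %>%
  html_attr("data-times") %>%
  fromJSON()

# Assumption: 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
hours$day <- day_names[hours$weekday + 1]
hours
```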
