Scraping image titles with rvest

Question

I am trying to pull individual ratings from Glassdoor (the API only provides summary ratings) using the rvest package in R and SelectorGadget to identify my CSS selectors.

The problem is that Glassdoor conveys the ratings with images, and the numeric rating is only stored in the element's title attribute. Using SelectorGadget, I can scrape the "Comp & Benefits" text from the code snippet below (using "#EmployerReviews .undecorated li"), but I can't get to the "2.0" in the span's title= attribute, which is what I want.

<div id='EmployerReviews'> .... <ul class='undecorated'> <li> <div class='minor'>Comp & Benefits</div> <span class='notranslate notranslate_title gdBars gdRatings med ' title="2.0"> 

Anyone had success scraping image titles in the past, or know of another way to get these individual ratings?
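To reproduce the problem offline, here is a minimal sketch that parses a self-contained copy of the snippet above. I closed the unclosed tags and escaped the `&` so it parses on its own, and I assume the full Glassdoor page shares this structure. The label text is reachable with the question's selector, but the rating span has no text content, so `html_text()` comes back empty.

```r
library(rvest)  # read_html() and the %>% pipe are available via rvest

# Self-contained copy of the snippet from the question, with its tags closed
snippet <- '<div id="EmployerReviews"><ul class="undecorated"><li>
  <div class="minor">Comp &amp; Benefits</div>
  <span class="notranslate notranslate_title gdBars gdRatings med " title="2.0"></span>
</li></ul></div>'

doc <- read_html(snippet)

# The <li> selector returns the visible label text
labels <- doc %>%
  html_nodes("#EmployerReviews .undecorated li") %>%
  html_text(trim = TRUE)
labels
# [1] "Comp & Benefits"

# The rating span has no text content of its own, so html_text() is empty
span_text <- doc %>%
  html_nodes("#EmployerReviews .undecorated li span.gdRatings") %>%
  html_text()
span_text
# [1] ""
```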

Recommended answer

You need to select the span itself and then use html_attr() to extract its title attribute:

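library(rvest)  # also re-exports the %>% pipe

# read_html() replaces html(), which newer rvest versions deprecate
page <- read_html("...")
rating <- page %>% 
  html_nodes("#EmployerReviews .undecorated li span.gdRatings") %>%
  html_attr("title")  # read the title attribute rather than the (empty) text

rating
# [1] "2.0"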
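The selector above returns one value per matching span. If the page lists several categories, a sketch along these lines (an extension, not part of the original answer) keeps each label next to its rating and converts the titles to numbers. The second `<li>` and its 3.5 value are hypothetical, added only so the demo has more than one row. `html_node()` (singular) returns one result per `<li>`, with `NA` when a span is missing, so the two columns cannot drift out of alignment.

```r
library(rvest)

# Question's snippet plus a hypothetical second <li>, added only for this demo
snippet <- '<div id="EmployerReviews"><ul class="undecorated">
  <li><div class="minor">Comp &amp; Benefits</div>
      <span class="gdRatings" title="2.0"></span></li>
  <li><div class="minor">Work/Life Balance</div>
      <span class="gdRatings" title="3.5"></span></li>
</ul></div>'

# One node per rating category
items <- read_html(snippet) %>%
  html_nodes("#EmployerReviews .undecorated li")

# html_node() looks inside each <li> separately, so row i of both columns
# always comes from the same category (NA if that <li> has no match)
ratings <- data.frame(
  category = items %>% html_node("div.minor") %>% html_text(trim = TRUE),
  rating   = items %>% html_node("span.gdRatings") %>% html_attr("title") %>%
    as.numeric(),
  stringsAsFactors = FALSE
)
ratings
```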
