Scraping image titles with rvest

Published 2020/8/10 19:27:01. Tags: r, css-selectors, rvest

Question

I am trying to pull individual ratings from Glassdoor (the API only provides summary ratings) using the rvest package in R and SelectorGadget to identify my CSS selectors.

The problem is that Glassdoor uses images to convey the ratings, and the numeric rating is contained only in the image's title attribute. Using SelectorGadget, I can scrape the "Comp & Benefits" text from the code snippet below (with the selector "#EmployerReviews .undecorated li"), but I can't get to the "2.0" in the span's title= attribute, which is what I want.

```html
<div id='EmployerReviews'> ....
  <ul class='undecorated'>
    <li>
      <div class='minor'>Comp & Benefits</div>
      <span class='notranslate notranslate_title gdBars gdRatings med ' title="2.0">
```
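A minimal reproduction of the problem, sketched in R, assuming markup shaped like the snippet above. The closing tags and the empty span body are hypothetical fill-ins, since the real Glassdoor page is not shown in full. It shows that taking the text of the list item recovers the label but never the rating.

```r
library(rvest)

# Hypothetical markup modelled on the question's snippet (the real page may
# differ); the <span> carries the rating only in its title attribute.
snippet <- '
<div id="EmployerReviews">
  <ul class="undecorated">
    <li>
      <div class="minor">Comp &amp; Benefits</div>
      <span class="notranslate notranslate_title gdBars gdRatings med" title="2.0"></span>
    </li>
  </ul>
</div>'

items <- read_html(snippet) %>%
  html_nodes("#EmployerReviews .undecorated li")

# html_text() collects only text nodes, so the label comes back...
label <- html_text(items, trim = TRUE)
# ...but "2.0" never appears, because it is an attribute value, not text.
has_rating <- grepl("2.0", label, fixed = TRUE)
```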

Anyone had success scraping image titles in the past, or know of another way to get these individual ratings?

Answer

You need to select the span itself and use html_attr() to extract its attribute value:

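```r
library(rvest)

# read_html() replaces the deprecated html(); "..." is the page URL or HTML
page <- read_html("...")
rating <- page %>%
  html_nodes("#EmployerReviews .undecorated li span.gdRatings") %>%
  html_attr("title")   # read the title attribute, not the element text

rating
# [1] "2.0"
```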
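The same approach can collect every rating category at once. The sketch below assumes hypothetical markup: the first item follows the question's snippet, while the "Work/Life Balance" and "Culture & Values" items and their values are invented for illustration. It uses html_node() (singular), which returns exactly one result per list item (a missing node when nothing matches), so labels and ratings stay aligned even when a span is absent.

```r
library(rvest)

# Hypothetical markup: only the first <li> comes from the question;
# the other categories and ratings are invented for this example.
snippet <- '
<div id="EmployerReviews">
  <ul class="undecorated">
    <li><div class="minor">Comp &amp; Benefits</div>
        <span class="gdRatings" title="2.0"></span></li>
    <li><div class="minor">Work/Life Balance</div>
        <span class="gdRatings" title="3.5"></span></li>
    <li><div class="minor">Culture &amp; Values</div></li>
  </ul>
</div>'

items <- read_html(snippet) %>%
  html_nodes("#EmployerReviews .undecorated li")

# One row per <li>: the label from the div's text, the rating from the
# span's title attribute (NA when the span is missing).
ratings <- data.frame(
  category = html_text(html_node(items, "div.minor"), trim = TRUE),
  rating   = as.numeric(html_attr(html_node(items, "span.gdRatings"), "title")),
  stringsAsFactors = FALSE
)
ratings
```

On rvest 1.0.0 or later, html_nodes() and html_node() are superseded by html_elements() and html_element(); the older names still work.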