Extracting href attr or converting node to character list

Views: 29 | Published: 2021/7/14 18:40:06 | Tags: html, r, rvest


Problem description

I am trying to extract some information from a website:

library(rvest)
library(XML)
url <- "http://wiadomosci.onet.pl/wybory-prezydenckie/xcnpc"
html <- html(url)

nodes <- html_nodes(html, ".listItemSolr") 
nodes

I get a "list" of 30 chunks of HTML code. From each element of the "list" I want to extract the last href attribute, so for the 30th element it would be

<a href="http://wiadomosci.onet.pl/kraj/w-sobote-prezentacja-hasla-i-programu-wyborczego-komorowskiego/tvgcq" title="W sobotę prezentacja hasła i programu wyborczego Komorowskiego">

So I want to get the string

"http://wiadomosci.onet.pl/kraj/w-sobote-prezentacja-hasla-i-programu-wyborczego-komorowskiego/tvgcq"

The problem is that html_attr(nodes, "href") doesn't work (I get a vector of NAs). So I thought about using a regex, but the problem is that nodes isn't a character list.

class(nodes)
[1] "XMLNodeSet"

I tried

xmlToList(nodes)

but it doesn't work either.

So my question is: how can I extract this URL with a function designed for HTML? Or, if that is not possible, how can I convert an XMLNodeSet to a character list?
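On the second half of the question, here is a minimal sketch of turning nodes into character strings and pulling the last href out with a regex. It assumes a current rvest/xml2 installation, where html_nodes() returns an xml_nodeset rather than the XMLNodeSet shown above, and it uses a made-up inline HTML snippet with example.com links in place of the real page. Parsing HTML with a regex is fragile, so treat this as an illustration rather than a recommended approach.

```r
library(rvest)

# Hypothetical stand-in for the real page: one item containing two links.
page <- read_html('<div class="listItemSolr"><a href="http://example.com/a">A</a><a href="http://example.com/b">B</a></div>')
nodes <- html_nodes(page, ".listItemSolr")

# as.character() serialises each node back to an HTML string.
node_strings <- as.character(nodes)

# Find every href="..." occurrence in each string, then keep the last one.
matches <- regmatches(node_strings, gregexpr('href="[^"]*"', node_strings))
last_links <- vapply(matches, function(m) {
  if (length(m) == 0) return(NA_character_)
  sub('^href="([^"]*)"$', "\\1", m[length(m)])
}, character(1))
```

Here node_strings is an ordinary character vector, so any string function can be applied to it.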

Update

Hadley suggested using elegant pipes:

html %>%  
  html_nodes(".listItemSolr") %>% 
  html_nodes(xpath = "./a") %>% 
  html_attr("href")
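The pipeline above returns the href of every direct child link, flattened across all items. If only the last link of each item is wanted, something along these lines might work. It is a sketch under assumptions: the HTML is a made-up snippet with example.com links, last_href is a hypothetical helper, and read_html() is used because html() is no longer available in recent rvest releases.

```r
library(rvest)

# Made-up stand-in for the page: two items, each holding two links.
page <- read_html('
<div class="listItemSolr">
  <a href="http://example.com/img-1">image</a>
  <a href="http://example.com/story-1">Story 1</a>
</div>
<div class="listItemSolr">
  <a href="http://example.com/img-2">image</a>
  <a href="http://example.com/story-2">Story 2</a>
</div>')

items <- html_nodes(page, ".listItemSolr")

# The container nodes have no href attribute themselves, hence the NAs.
container_hrefs <- html_attr(items, "href")

# Hypothetical helper: collect the links inside one item, keep the last href.
last_href <- function(node) {
  hrefs <- html_attr(html_nodes(node, "a"), "href")
  if (length(hrefs) == 0) NA_character_ else hrefs[length(hrefs)]
}

# One result per item, NA where an item contains no link.
last_hrefs <- vapply(items, last_href, character(1))
```

Working item by item keeps the result aligned with the 30 list items, which the flattened pipeline does not guarantee when items contain different numbers of links.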
