Get links while web scraping Google in R


Problem description

I am trying to get the links that Google returns for a search, that is, all of these links:

I have done this kind of scraping before, but in this case I do not understand why it doesn't work. I ran the following lines:

library(rvest)
url <- "https://www.google.es/search?q=Ediciones+Peña+sl+telefono"
content_request <- read_html(url)
content_request %>%
    html_nodes(".r") %>%
    html_attr("href")

I have tried other nodes and I obtain similar results:

content_request %>%
    html_nodes(".LC20lb") %>%
    html_attr("href")

Finally I tried to get all the links on the web page, but there are some links that I cannot retrieve:

html_attr(html_nodes(content_request, "a"), "href")

Please, could you help me in this case? Thank you.

Answer

Here are two options for you to play around with.

#1) 

library(stringr)

url <- "https://www.google.es/search?q=Ediciones+Pe%C3%B1a+sl+telefono"
# Read the raw HTML of the results page as a single string
html <- paste(readLines(url), collapse="\n")
# Capture the href value of every anchor tag via a regex
matched <- str_match_all(html, "<a href=\"(.*?)\"")
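`str_match_all()` returns one matrix per input string: column 1 holds the full match and column 2 the first capture group, so the hrefs themselves live in `matched[[1]][, 2]`. A minimal self-contained sketch, using a toy HTML string in place of the downloaded page (the string and its links are hypothetical):

```r
library(stringr)

# Toy stand-in for the page source
html <- '<a href="/url?q=https://example.com/">x</a> <a href="/search?q=y">y</a>'
matched <- str_match_all(html, "<a href=\"(.*?)\"")

# Column 2 of the per-string matrix is the (.*?) capture group
links <- matched[[1]][, 2]
links
# "/url?q=https://example.com/" "/search?q=y"
```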


#2) 

library(xml2)
library(rvest)

URL <- "https://www.google.es/search?q=Ediciones+Pe%C3%B1a+sl+telefono"
# Parse the results page and pull the href attribute of every <a> node
pg <- read_html(URL)
head(html_attr(html_nodes(pg, "a"), "href"))
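In the no-JavaScript markup that Google often serves to scrapers, organic result links tend to arrive wrapped as `/url?q=<target>&sa=...`. A minimal sketch of cleaning such hrefs, assuming that wrapper format (the sample vector below is hypothetical and the real output depends on the markup Google serves):

```r
# Hypothetical sample of hrefs as returned by html_attr() above
hrefs <- c("/url?q=https://example.com/&sa=U&ved=abc",
           "/search?q=something",
           "/url?q=https://example.org/page&sa=U")

# Keep only the wrapped result links, then strip the wrapper and the
# trailing tracking parameters
result_links <- hrefs[grepl("^/url\\?q=", hrefs)]
result_links <- sub("^/url\\?q=", "", result_links)
result_links <- sub("&sa=.*$", "", result_links)
result_links
# "https://example.com/" "https://example.org/page"
```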

