How to use rvest for web crawling correctly?

Views: 30 | Published: 2021/7/14 18:41:42 | Tags: r, web-crawler, rvest


Question

I am trying to scrape the page http://www.funda.nl/en/koop/leiden/ to get the highest page number it shows, which is 29. I followed some online tutorials, located where 29 appears in the HTML, and wrote this R code:

library(rvest)

url <- read_html("http://www.funda.nl/en/koop/leiden/")

url %>%
  html_nodes("#pagination-number.pagination-last") %>%
  html_attr("data-pagination-page") %>%
  as.numeric()

However, this returns numeric(0). If I remove as.numeric(), I get character(0) instead.

How can I do this?
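
An empty result like this usually means the CSS selector matched nothing: html_nodes() returns an empty node set, html_attr() on it gives character(0), and as.numeric() turns that into numeric(0). The sketch below reproduces the effect on a small inline HTML fragment. The markup is invented for illustration and is an assumption, not the actual funda.nl page.

```r
library(rvest)

# Hypothetical markup, invented for illustration (not funda.nl's real HTML)
html <- '<div class="pagination-pages">
  <a data-pagination-page="1">1</a>
  <a data-pagination-page="29">29</a>
</div>'
page <- read_html(html)

# "#pagination-number.pagination-last" asks for a single element that has
# both id="pagination-number" and class="pagination-last"; nothing here matches.
nodes <- html_nodes(page, "#pagination-number.pagination-last")
length(nodes)                             # 0: the selector matched nothing
html_attr(nodes, "data-pagination-page")  # character(0)

# A selector that does match returns one attribute value per node.
pages <- page %>%
  html_nodes(".pagination-pages a") %>%
  html_attr("data-pagination-page") %>%
  as.numeric()
max(pages)
```

Checking length() on the node set before extracting attributes is a quick way to tell a selector problem apart from a conversion problem.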

Recommended answer

I've been dealing with the same issue, and this worked for me:

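library(rvest)    # read_html(), html_nodes(), html_children(), html_text(), %>%
library(stringr)  # str_extract()
# last() comes from neither package; the answer does not say which one
# provides it (dplyr and data.table both export a last()).

url <- "http://www.funda.nl/en/koop/leiden/"

# Last child of the pagination container, reduced to the number in its text
last_page <-
  last(read_html(url) %>%
         html_nodes(css = ".pagination-pages") %>%
         html_children()) %>%
  html_text(trim = TRUE) %>%
  str_extract("[0-9]+") %>%
  as.numeric()

last_page
# [1] 23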
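
The same idea can be sketched without last(), using only rvest, stringr, and base R indexing. The example runs against an invented HTML fragment so that it works offline. The fragment is an assumption: the real page's markup may differ and may have changed since the answer was written.

```r
library(rvest)
library(stringr)

# Hypothetical pagination markup, invented for illustration
html <- '<div class="pagination-pages">
  <a href="/p1/">Page 1</a>
  <a href="/p2/">Page 2</a>
  <span>...</span>
  <a href="/p23/">Page 23</a>
</div>'

# All direct children of the pagination container
children <- read_html(html) %>%
  html_nodes(css = ".pagination-pages") %>%
  html_children()

# Base R indexing picks the last child, so no extra package is needed
last_node <- children[[length(children)]]

# Keep only the digits from the link text and convert them to a number
last_page <- last_node %>%
  html_text(trim = TRUE) %>%
  str_extract("[0-9]+") %>%
  as.numeric()

last_page
```

Splitting the pipeline into named steps also makes it easy to inspect children and see whether the selector matched anything at all.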
