Empty nodes when scraping links with rvest in R

Views: 35  Posted: 2021/7/14 18:35:24  Tags: r web-scraping rvest


Problem description

My goal is to get the links to all Kaggle competitions together with their titles. I am using the rvest library, but I don't get very far: a few divs into the page, the nodes come back empty.

To start, I am trying it on the first competition only; I should then be able to apply the same approach to every following entry. The XPath of the first entry is:

/html/body/div[1]/div[2]/div/div/div[2]/div/div/div[2]/div[2]/div/div/div[2]/div/div/div[1]/a

My idea was to get the link via html_attr( , "href") once I am in the right tag.

My attempt:

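library(rvest)

url <- "https://www.kaggle.com/competitions"
kaggle_html <- read_html(url)           # download and parse the page's HTML
kaggle_text <- html_text(kaggle_html)   # full page text, kept for inspection
# absolute XPath of the first entry's link
kaggle_node <- html_nodes(kaggle_html, xpath = "/html/body/div[1]/div[2]/div/div/div[2]/div/div/div[2]/div[2]/div/div/div[2]/div/div/div[1]/a")
html_attr(kaggle_node, "href")          # empty, because kaggle_node matched nothing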
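For comparison, here is a minimal sketch of the same html_attr( , "href") step on a small static HTML string. The markup, the class name competition-list and the links are hypothetical stand-ins, not Kaggle's real page. It also uses a relative XPath (//) instead of a full absolute path, which keeps matching if an intermediate div is added or removed.

```r
library(rvest)

# Hypothetical static markup standing in for a competitions list;
# the class name and links are invented for illustration only.
page <- read_html('
<html><body>
  <div class="competition-list">
    <div><a href="/c/first-challenge">First challenge</a></div>
    <div><a href="/c/second-challenge">Second challenge</a></div>
  </div>
</body></html>')

# A relative XPath ("//") finds the links wherever they sit below the list,
# instead of depending on every intermediate <div> in an absolute path.
links <- html_nodes(page, xpath = "//div[@class='competition-list']//a")

# Pair each link's text (the title) with its href attribute.
competitions <- data.frame(
  title = html_text(links, trim = TRUE),
  href  = html_attr(links, "href")
)
print(competitions)
```

This only works when the links are actually present in the HTML that read_html() receives.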

I can't get past a certain div. The following snippet shows the last node I can access:

node <- html_nodes(kaggle_html, xpath="/html/body/div[1]/div[2]/div")
html_attrs(node)

As soon as I go one step further with html_nodes(kaggle_html, xpath = "/html/body/div[1]/div[2]/div/div"), the node set is empty.

I think the issue is that Kaggle uses a smart list that keeps expanding as I scroll down.
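The empty result is consistent with that explanation: read_html() only parses the HTML text the server sends and never executes JavaScript, so any markup a script would build in the browser is simply not there to select. A minimal sketch with a hypothetical page (the ids and the renderList() function are invented for illustration) whose list would only be filled in by a script:

```r
library(rvest)

# Hypothetical server response: an empty placeholder plus a script that a
# browser would run to build the list. read_html() parses this text as-is
# and does not run the <script>, so the placeholder stays empty.
raw_page <- read_html('
<html><body>
  <div id="root"><div></div></div>
  <script>
    // hypothetical: in a browser this would insert the competition list
    renderList(document.getElementById("root"));
  </script>
</body></html>')

# Walking down the tree works only as far as the static markup goes.
shallow <- html_nodes(raw_page, xpath = "/html/body/div[1]/div")
deep    <- html_nodes(raw_page, xpath = "/html/body/div[1]/div/div")

length(shallow)  # 1: the empty placeholder <div> exists in the raw HTML
length(deep)     # 0: nothing below it until JavaScript runs
```

If the live page behaves like this, no XPath applied to read_html()'s result can reach the list; the content would have to come from something that actually executes the page's JavaScript.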

(I know I could chain the steps with %>%. I am saving each intermediate result so that I can inspect it more easily and learn how everything works.)
