rvest returning empty list

Question

I am trying to import a table from a website by copying the table's XPath from the page's HTML and scraping it with the rvest package. I have done this successfully many times before, but now I only get an empty list. To diagnose the problem, I ran the following code (taken from https://www.r-bloggers.com/using-rvest-to-scrape-an-html-table/), but it also produces an empty list for me.

Thanks in advance for your help!

library(rvest)
url <- "http://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population"
population <- url %>%
  read_html() %>%
  html_nodes(xpath='//*[@id="mw-content-text"]/table[1]') %>%
  html_table()

Recommended answer

Your XPath query is wrong. The table is not a direct child of the node whose id is mw-content-text; it is a descendant, though. Try:

html_nodes(xpath='//*[@id="mw-content-text"]//table[1]') 
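
The difference between `/` (direct child) and `//` (any descendant) can be checked offline. The sketch below uses a made-up HTML fragment, not Wikipedia's real markup, in which two tables each sit inside an extra `<div>`. It also shows one caveat worth knowing: `//table[1]` matches every table that is the first table child of its own parent, so it can return more than one node; wrapping the path in parentheses, `(...//table)[1]`, keeps only the first match in document order.

```r
library(rvest)

# Hypothetical fragment for illustration only: both tables are nested one
# level below #mw-content-text, so neither is a direct child of it.
page <- read_html('
<div id="mw-content-text">
  <div><table><tr><th>State</th><th>Pop</th></tr><tr><td>A</td><td>1</td></tr></table></div>
  <div><table><tr><th>State</th><th>Pop</th></tr><tr><td>B</td><td>2</td></tr></table></div>
</div>')

# Direct-child step: nothing matches, so html_table() yields an empty list.
child_only <- html_nodes(page, xpath = '//*[@id="mw-content-text"]/table[1]')

# Descendant step: matches each table that is the first table of its parent.
descendants <- html_nodes(page, xpath = '//*[@id="mw-content-text"]//table[1]')

# Parenthesised form: only the first matching table in document order.
first_only <- html_nodes(page, xpath = '(//*[@id="mw-content-text"]//table)[1]')

n_child <- length(child_only)
n_desc  <- length(descendants)
n_first <- length(first_only)

empty_result <- html_table(child_only)
first_table  <- html_table(first_only)[[1]]
```

Here `n_child` is 0, which reproduces the empty list from the question, while `n_desc` is 2 and `n_first` is 1.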

Web scraping is a very fragile endeavor and can easily break when websites change their HTML.
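
One way to notice such breakage early is to fail loudly when a selector stops matching instead of silently passing an empty list downstream. This is a minimal sketch; `scrape_first_table` is a hypothetical helper name, not part of rvest, and the inline HTML is invented for the example.

```r
library(rvest)

# Hypothetical helper: return the first table matched by `xpath`,
# or stop with a clear message if the selector matches nothing.
scrape_first_table <- function(page, xpath) {
  nodes <- html_nodes(page, xpath = xpath)
  if (length(nodes) == 0) {
    stop("No nodes matched xpath: ", xpath, call. = FALSE)
  }
  html_table(nodes[[1]])
}

# Made-up page fragment used only to exercise the helper.
page <- read_html('
<div id="content">
  <div><table><tr><th>x</th></tr><tr><td>1</td></tr></table></div>
</div>')

# A selector that matches returns a data frame.
ok <- scrape_first_table(page, '(//*[@id="content"]//table)[1]')

# A stale selector raises an error, captured here as a message string.
err <- tryCatch(
  scrape_first_table(page, '//*[@id="content"]/table[1]'),
  error = function(e) conditionMessage(e)
)
```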
