rvest returning empty list
Problem description
I am trying to import a table from a website by copying the XPath of the HTML element and scraping it with the rvest package. I have done this successfully many times before, but now I only get an empty list. To diagnose the problem, I ran the following code (taken from https://www.r-bloggers.com/using-rvest-to-scrape-an-html-table/). However, this code also produces an empty list for me.
Thanks in advance for your help!
library(rvest)
url <- "http://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population"
population <- url %>%
  read_html() %>%
  html_nodes(xpath='//*[@id="mw-content-text"]/table[1]') %>%
  html_table()
Recommended answer
Your XPath query is wrong. The table is not a direct child of the node with the id mw-content-text; it is a descendant, though. Try
html_nodes(xpath='//*[@id="mw-content-text"]//table[1]')
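To see the difference between the two queries without depending on the live page, here is a minimal sketch run against a made-up HTML fragment. The wrapper div and the table contents are assumptions for illustration, not Wikipedia's actual markup. The single slash (/) only matches direct children, while the double slash (//) matches descendants at any depth.

```r
library(rvest)

# Hypothetical fragment: the table sits inside an extra <div>, so it is a
# descendant, not a direct child, of the node with id "mw-content-text".
page <- read_html('
<html><body>
  <div id="mw-content-text">
    <div class="wrapper">
      <table>
        <tr><th>State</th><th>Population</th></tr>
        <tr><td>A</td><td>1</td></tr>
      </table>
    </div>
  </div>
</body></html>')

# Child axis (/): looks for a <table> directly under #mw-content-text.
child_query <- html_nodes(page, xpath = '//*[@id="mw-content-text"]/table[1]')

# Descendant axis (//): looks for a <table> anywhere below #mw-content-text.
descendant_query <- html_nodes(page, xpath = '//*[@id="mw-content-text"]//table[1]')

length(child_query)       # number of nodes matched by the child query
length(descendant_query)  # number of nodes matched by the descendant query

# html_table() on a node set returns a list with one table per matched node.
tables <- html_table(descendant_query)
```

One thing to keep in mind: in XPath, //table[1] matches every table that is the first table child of its own parent, so on a real page it may return more than one node. If you want only the first table in document order, wrapping the path in parentheses, as in (//*[@id="mw-content-text"]//table)[1], should do that.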
Web scraping is a very fragile endeavor and can easily break when websites change their HTML.
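Because of that fragility, it can help to fail loudly when a query matches nothing instead of silently carrying an empty list forward. The following is only a sketch: scrape_first_table is a hypothetical helper name, and the inline HTML is a made-up stand-in for a real page.

```r
library(rvest)

# Hypothetical helper: return the first table matched by `xpath`,
# or stop with a clear message when nothing matches.
scrape_first_table <- function(page, xpath) {
  nodes <- html_nodes(page, xpath = xpath)
  if (length(nodes) == 0) {
    stop("No nodes matched xpath: ", xpath, call. = FALSE)
  }
  html_table(nodes[[1]])
}

# Made-up page used only to exercise the helper.
page <- read_html('
<html><body>
  <div id="content">
    <div><table><tr><th>x</th></tr><tr><td>1</td></tr></table></div>
  </div>
</body></html>')

# Matches: the table is a descendant of #content.
good <- scrape_first_table(page, '//*[@id="content"]//table')

# Does not match: the table is not a direct child of #content,
# so the helper raises an error, which is caught here.
bad <- tryCatch(
  scrape_first_table(page, '//*[@id="content"]/table'),
  error = function(e) conditionMessage(e)
)
```

With a check like this, a change in the site's HTML shows up as an explicit error naming the failing XPath rather than as an empty result further down the pipeline.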