How to save and read the output of read_html as an RDS file?

Question

Objects can be saved and read like so

# Save as file
saveRDS(iris, "mydata.RDS")

# Read back in 
readRDS("mydata.RDS")
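For ordinary R data like iris, the save/load cycle is lossless. That is the behaviour the question expects from read_html() objects as well. Here is a minimal check that writes to a temporary file instead of mydata.RDS in the working directory:

```r
# Plain R data holds no external pointers, so saving and loading
# reproduces the object exactly
path <- tempfile(fileext = ".RDS")
saveRDS(iris, path)
restored <- readRDS(path)
print(identical(iris, restored))
```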

But this doesn't seem to work for objects made with xml2::read_html()

library(rvest)
someobject <- read_html("https://stackoverflow.com/")
saveRDS(someobject, "someobject.RDS")

This creates a file, but reading it back does not work as expected:

readRDS("someobject.RDS")
Error in doc_is_html(x$doc) : external pointer is not valid
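The original thread does not explain the cause, but here is a sketch of the mechanism. The object returned by read_html() is a small R list. Its node and doc fields are external pointers to a document held in memory by the libxml2 C library. R's serialization, which saveRDS() and readRDS() rely on, stores external pointers without the memory they point to and restores them as null pointers. The print method then calls doc_is_html(x$doc) on one of them and fails. The snippet below reproduces this in memory with serialize()/unserialize(). It assumes a small literal HTML string stands in for the downloaded page, so no network access is needed.

```r
library(xml2)

# Illustrative stand-in for read_html("https://stackoverflow.com/")
doc <- read_html("<html><body><p>hello</p></body></html>")

# The R object is a list of handles into libxml2's C-level document
print(class(doc))       # "xml_document" "xml_node"
print(typeof(doc$doc))  # "externalptr"

# Serialize and restore in memory: the mechanism behind saveRDS()/readRDS()
restored <- unserialize(serialize(doc, NULL))

# The restored pointers are null, so printing fails just like the
# readRDS() call in the question
res <- try(print(restored), silent = TRUE)
print(inherits(res, "try-error"))
```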

What's going on and what's the simplest way of saving an html object so that it can be loaded back in with minimal code/fuss?

Recommended answer

We can use write_xml and read_html from the xml2 package:

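library(xml2)
# Parse the page, then write it out as a plain-text file
before <- read_html("https://stackoverflow.com/")
xml2::write_xml(before, "someobject1.xml")
# Re-parse the saved file into a new document
after <- xml2::read_html("someobject1.xml")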
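The question asks for an .RDS file specifically. A variant of the same idea, not part of the original answer, is to store the document's markup as a character string, which saveRDS() can serialize. After loading, read_html() re-parses it. Here is a minimal sketch. It assumes a small literal HTML string stands in for the downloaded page and a temporary file path replaces someobject.RDS. The names page, path and page_again are illustrative.

```r
library(xml2)

# Illustrative stand-in for read_html("https://stackoverflow.com/")
page <- read_html("<html><body><div>one</div><div>two</div></body></html>")

# Save the markup as a plain character string; unlike the external
# pointers inside 'page', a string serializes cleanly
path <- tempfile(fileext = ".RDS")
saveRDS(as.character(page), path)

# Read the string back and re-parse it into a fresh document
page_again <- read_html(readRDS(path))

# The re-parsed document yields the same node text
print(xml_text(xml_find_all(page_again, "//div")))
```

The trade-off is that the markup is re-parsed on every load. In return, the file stays a single .RDS, and restoring it needs only readRDS() plus read_html().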

However, identical returns FALSE:

identical(before, after)
#[1] FALSE

But queries on both of them seem to return the same results:

library(rvest)
before %>%  html_nodes("div")
after %>% html_nodes("div")
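The same behaviour can be checked offline. This sketch is illustrative only. It swaps the Stack Overflow page for a small literal HTML string and writes to a temporary file. It then compares the two documents with identical() and by the text of their div nodes. identical() compares the external pointers, which differ because each read_html() call creates a separate libxml2 document. The extracted content still matches.

```r
library(xml2)

# Illustrative document in place of the Stack Overflow page
before <- read_html("<html><body><div>a</div><div>b</div></body></html>")

# Round-trip through a temporary file, as in the answer
path <- tempfile(fileext = ".xml")
write_xml(before, path)
after <- read_html(path)

# Different C-level documents, so the R objects are not identical
same_object <- identical(before, after)

# Querying both documents gives the same text
same_text <- identical(
  xml_text(xml_find_all(before, "//div")),
  xml_text(xml_find_all(after, "//div"))
)
print(c(same_object = same_object, same_text = same_text))
```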
