R: serialize objects to text file and back again


Question

I have a process in R that creates a bunch of objects, serializes them, and writes them to plain text files. This seemed like a good way to handle things, since I am working with Hadoop and all output needs to stream through stdin and stdout.

The problem I am left with is how to read these objects out of the text file and back into R on my desktop machine. Here's a working example that illustrates the challenge:

Let's create a tmp file and write a single object into it. This object is just a vector:

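# open a text-mode connection for writing
outCon <- file("c:/tmp", "w")
# serialize to a raw vector in ASCII format, then convert it to a string
mychars <- rawToChar(serialize(1:10, NULL, ascii = TRUE))
cat(mychars, file = outCon)
close(outCon)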

The mychars object looks like this:

> mychars
[1] "A\n2\n133633\n131840\n13\n10\n1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n"

When written to the text file, it looks like this:

A
2
133633
131840
13
10
1
2
3
4
5
6
7
8
9
10

I'm probably overlooking something terribly obvious, but how do I read this file back into R and unserialize the object? When I try scan() or readLines(), both treat the newline characters as record delimiters, and I end up with a vector where each element is one line of the text file. What I really want is a single text string with the whole contents of the file. Then I can unserialize the string.

Perl will read line breaks back into a string, but I can't figure out how to override the way R treats line breaks.

Recommended answer

JD, we do that in the digest package via serialize() to/from raw. That is nice, as you can store serialized objects in SQL and other places. I would actually store this as RData as well, which is way quicker to load() (no parsing!) and save().
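
As a rough illustration of the two options mentioned above, here is a minimal sketch using only base R. The object `obj` and the temporary file name are made up for the example, and this is not code from the digest package itself.

```r
# Illustrative object; any R object would do.
obj <- list(ids = 1:10, label = "example")

# Option 1: serialize() with connection = NULL returns a raw vector,
# which unserialize() turns back into the object.
raw_bytes <- serialize(obj, NULL)
restored  <- unserialize(raw_bytes)
stopifnot(identical(restored, obj))

# Option 2: store the object as RData. save() writes it under its name,
# load() restores it under that same name.
rdata_file <- tempfile(fileext = ".RData")  # hypothetical location
save(obj, file = rdata_file)

env <- new.env()
loaded_names <- load(rdata_file, envir = env)  # names of restored objects
stopifnot(identical(env$obj, obj))
```

Loading into a separate environment is only a convenience here, so that the restored copy does not overwrite the original `obj` before the comparison.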

Or, if it has to be rawToChar() and ascii, then use something like this (taken straight from help(digest), where we compare serialization of the file COPYING):

 library(digest)  # provides digest()
 # test 'length' parameter and file input
 fname <- file.path(R.home(),"COPYING")
 x <- readChar(fname, file.info(fname)$size) # read file
 for (alg in c("sha1", "md5", "crc32")) {
   # partial file
   h1 <- digest(x    , length=18000, algo=alg, serialize=FALSE)
   h2 <- digest(fname, length=18000, algo=alg, serialize=FALSE, file=TRUE)
   h3 <- digest( substr(x,1,18000) , algo=alg, serialize=FALSE)
   stopifnot( identical(h1,h2), identical(h1,h3) )
   # whole file
   h1 <- digest(x    , algo=alg, serialize=FALSE)
   h2 <- digest(fname, algo=alg, serialize=FALSE, file=TRUE)
   stopifnot( identical(h1,h2) )
 }

So with that, your example becomes this:

R> outCon <- file("/tmp/jd.txt", "w")
R> mychars <- rawToChar(serialize(1:10, NULL, ascii=T))
R> cat(mychars, file=outCon); close(outCon)
R> fname <- "/tmp/jd.txt"
R> readChar(fname, file.info(fname)$size)
[1] "A\n2\n133633\n131840\n13\n10\n1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n"
R> unserialize(charToRaw(readChar(fname, file.info(fname)$size)))
[1]  1  2  3  4  5  6  7  8  9 10
R> 
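
If the file has already been read with readLines(), the lines can also be glued back into one string. The following is only a sketch of that alternative, assuming the file ends with a newline (as the output of cat() above does); the temporary file name is illustrative.

```r
# Write the ASCII serialization to a temporary file (illustrative path).
fname <- tempfile(fileext = ".txt")
cat(rawToChar(serialize(1:10, NULL, ascii = TRUE)), file = fname)

# readLines() returns one element per line of the file ...
lines <- readLines(fname)

# ... so join the elements with "\n" and restore the trailing newline
# that readLines() drops, giving back the string that was written.
whole <- paste0(paste(lines, collapse = "\n"), "\n")

# Convert the string to raw bytes and unserialize as before.
restored <- unserialize(charToRaw(whole))
stopifnot(identical(restored, 1:10))
```

The readChar() version above avoids the split-and-rejoin step, so this variant is mainly useful when the lines arrive from somewhere other than a file you can size with file.info().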
