R: serialize base64 encode/decode of text not exactly matching

Problem description

In my previous question about using serialize() to create a CSV of objects (http://stackoverflow.com/questions/3114043/r-creating-a-csv-out-of-serialized-objects), I got a great answer from jmoy, who recommended base64 encoding my serialized text. That was exactly what I was looking for. Oddly enough, when I put this into practice, I get results that look right but don't exactly match what I ran through the serialize/encode process.

The example below takes a list of 3 vectors and serializes each vector. Each serialized vector is then base64 encoded and written to a text file along with a key, which is simply the index number of the vector. I then reverse the process and read each line back from the file. At the very end you can see that some items don't exactly match. Is this a floating point issue? Something else?

require(caTools)

randList <- NULL
set.seed(2)

randList[[1]] <- rnorm(100)
randList[[2]] <- rnorm(200)
randList[[3]] <- rnorm(300)

#delete file contents
fileName <- "/tmp/tmp.txt"
cat("", file=fileName, append=F)

i <- 1
for (item in randList) {
  myLine <- paste(i, ",", base64encode(serialize(item, NULL, ascii=T)), "\n", sep="")
  cat(myLine, file=fileName, append=T) 
  i <- i+1
}

linesIn <- readLines(fileName, n=-1)

parsedThing <- NULL
i <- 1
for (line in linesIn) {
  parsedThing[[i]] <- unserialize(base64decode(strsplit(linesIn[[i]], split=",")[[1]][[2]], "raw"))
  i <- i+1
}

#floating point issue?
identical(randList, parsedThing)

for (i in 1:length(randList[[1]])) {
  print(randList[[1]][[i]] == parsedThing[[1]][[i]])
}

i<-3
randList[[1]][[i]] == parsedThing[[1]][[i]]

randList[[1]][[i]]
parsedThing[[1]][[i]]

Here's the abridged output:

> #floating point issue?
> identical(randList, parsedThing)
[1] FALSE
> 
> for (i in 1:length(randList[[1]])) {
+   print(randList[[1]][[i]] == parsedThing[[1]][[i]])
+ }
[1] TRUE
[1] TRUE
[1] FALSE
[1] FALSE
[1] TRUE
[1] FALSE
[1] TRUE
[1] TRUE
[1] FALSE
[1] FALSE
...
> 
> i<-3
> randList[[1]][[i]] == parsedThing[[1]][[i]]
[1] FALSE
> 
> randList[[1]][[i]]
[1] 1.587845
> parsedThing[[1]][[i]]
[1] 1.587845
> 

Solution

Passing ascii=T to serialize makes R perform imprecise binary-to-decimal-to-binary conversions when serializing and unserializing, which causes the values to differ. If you remove ascii=T you get exactly the same numbers back, because a binary representation is written out instead.
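The effect can be reproduced without base64 or a file at all. The following is a minimal sketch using only base R; the variable names (x, via_ascii, via_binary, mismatch) are illustrative and not from the question. It also prints the values with 17 significant digits, since R's default of 7 digits is why both numbers in the question display as 1.587845.

```r
# Base R only: compare an ASCII round trip with a binary one.
set.seed(2)
x <- rnorm(100)

via_ascii  <- unserialize(serialize(x, NULL, ascii = TRUE))
via_binary <- unserialize(serialize(x, NULL))  # binary is the default

# The binary format stores the bytes of each double, so this is exact.
print(identical(x, via_binary))

# The ASCII format goes through a decimal text form, so this may be FALSE.
print(identical(x, via_ascii))

# Show the first mismatching element, if any, with enough digits to
# expose a difference that the default 7-digit printing hides.
mismatch <- which(x != via_ascii)
if (length(mismatch) > 0) {
  k <- mismatch[1]
  print(c(x[k], via_ascii[k]), digits = 17)
}
```

If the ASCII round trip does differ on your R version, the two printed values should agree in their leading digits and diverge only near the end.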

base64encode can encode raw vectors so it doesn't need ascii=T.
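As a sketch of what the corrected round trip might look like, the code below reworks the example from the question with ascii=T dropped. It assumes the caTools package is installed; the use of tempfile(), vapply() and lapply() in place of the original loops is a stylistic choice here, not part of the original answer.

```r
library(caTools)  # provides base64encode() and base64decode(), as in the question

set.seed(2)
randList <- list(rnorm(100), rnorm(200), rnorm(300))

# Illustrative temporary path; the question wrote to /tmp/tmp.txt.
fileName <- tempfile(fileext = ".txt")

# One line per vector: "<index>,<base64 of the binary serialization>".
linesOut <- vapply(seq_along(randList), function(i) {
  paste0(i, ",", base64encode(serialize(randList[[i]], NULL)))
}, character(1))
writeLines(linesOut, fileName)

# Reverse the process: split off the key, decode to raw, unserialize.
linesIn <- readLines(fileName)
parsedThing <- lapply(strsplit(linesIn, ",", fixed = TRUE), function(parts) {
  unserialize(base64decode(parts[2], "raw"))
})

print(identical(randList, parsedThing))
```

Splitting on a comma is safe here because the base64 alphabet does not contain one. With the binary serialization, the final comparison should now report that the lists are identical.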

The binary representation used by serialize is architecture independent, so you can happily serialize on one machine and unserialize on another.

Reference: http://cran.r-project.org/doc/manuals/R-ints.html#Serialization-Formats
