R: Creating a CSV out of serialized objects

Problem description

I'm trying to take a list, serialize each item, and put it into a CSV file with a key, to create a text file of key/value pairs. Ultimately this is going to run through Hadoop streaming, so before you ask: I think it really does need to be in a text file (but I'm open to other ideas). This all seemed pretty straightforward at first, but I still can't quite get serialization to work the way I want.

If I do this:

> rawToChar(serialize("blah", NULL, ascii=T))
[1] "A\n2\n133888\n131840\n16\n1\n9\n4\nblah\n"

Then I have those pesky \n characters, which screw up my CSV parsing later. I could go in and replace each \n with some other string, which I'm not opposed to doing, but that seems a little messy.

The other option that came to mind is omitting the rawToChar() call and pumping the raw ascii into a text file:

> serialize("blah", NULL, ascii=T)
 [1] 41 0a 32 0a 31 33 33 38 38 38 0a 31 33 31 38 34 30 0a 31 36 0a 31 0a 39 0a
[26] 34 0a 62 6c 61 68 0a

Well, if I just dump that to a text file, I'll get a \n after each element in the list. So I tried doing a little paste/collapse:

> ser <- serialize("blah", NULL, ascii=T)
> ser2 <- paste(ser, collapse="")
> ser2
[1] "410a320a3133333838380a3133313834300a31360a310a390a340a626c61680a"

Now that's a value I can write to a CSV text file! Only... how do I turn that back into raw again later? Let's just take the first hex element: 41. I can't even figure out how to create a list of raw items and shove a hex value 41 into one of the elements. When I try to shove a hex value into a raw vector, I end up with something like this:

> r <- raw(1)
> r[1] <- 41
Error in r[1] <- 41 : 
  incompatible types (from double to raw) in subassignment type fix
> r[1] <- as.raw(41)
> r[1]
[1] 29 

Crap! 29!=41 (except for really large values of 29 and really small values of 41, of course)
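
The mismatch above has a simple cause: as.raw() reads its argument as a decimal number, while raw values print in hexadecimal, and decimal 41 is hex 29. A minimal sketch of two ways to get the byte 41 (hex) instead, using only base R; the variable names are illustrative:

```r
# as.raw() expects a number in 0..255; a bare 41 is decimal and prints as hex 29.
dec_41 <- as.raw(41)

# Either write the value as a hex literal...
hex_literal <- as.raw(0x41)

# ...or parse a two-character hex string with strtoi(), base 16.
hex_parsed <- as.raw(strtoi("41", base = 16L))

# Both give the byte that encodes the letter "A".
rawToChar(hex_parsed)
```

The strtoi() form is the one that matters for the pasted hex string, since each pair of characters in it has to be parsed back into one byte.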

Any ideas on how to crack this nut?

Solution

The package caTools has a Base64 encoder-decoder that you can use:

> library(caTools)
> s<-base64encode(serialize("blah",NULL))
> s
[1] "WAoAAAACAAIKAQACAwAAAAAQAAAAAQAAAAkAAAAEYmxhaA=="
> unserialize(base64decode(s,"raw"))
[1] "blah"
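
If adding a package is not an option, the hex string from the question can also be decoded with base R alone. The following is a rough sketch, not part of the original answer: to_hex() and from_hex() are hypothetical helper names, and the sample list and temporary file are made up for illustration. It writes one key/value row per list element and reads them back.

```r
# Hypothetical helpers; the names are illustrative and not from any package.
to_hex <- function(obj) {
  # serialize() to a raw vector, then join the two-digit hex codes into one string
  paste(as.character(serialize(obj, NULL)), collapse = "")
}

from_hex <- function(hex) {
  # Split the string into two-character chunks and parse each as base 16
  starts <- seq(1, nchar(hex), by = 2)
  bytes <- strtoi(substring(hex, starts, starts + 1), base = 16L)
  unserialize(as.raw(bytes))
}

# Sample data (assumed for this sketch)
items <- list(a = "blah", b = c(1.5, 2.5), c = list(x = TRUE))

# One row per list element: key, hex-encoded value
kv <- data.frame(key = names(items),
                 value = vapply(items, to_hex, character(1)),
                 stringsAsFactors = FALSE)

path <- tempfile(fileext = ".csv")
write.csv(kv, path, row.names = FALSE)

# Read back as text so no hex string is mistaken for a number, then decode
back <- read.csv(path, colClasses = "character")
restored <- setNames(lapply(back$value, from_hex), back$key)
identical(restored, items)
```

Hex uses two characters per byte, whereas Base64 uses four characters per three bytes, so the Base64 route above produces shorter values; the hex version mainly avoids the extra dependency.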
