Write and read raw byte arrays in Spark using SequenceFile

Published: 2016/5/22 16:14:23 · Tags: scala hadoop hdfs apache-spark sequencefile

Question

How do you write RDD[Array[Byte]] to a file using Apache Spark and read it back again?

Answer

One common problem is a confusing ClassCastException saying BytesWritable cannot be cast to NullWritable. Another is that BytesWritable.getBytes does not return just your bytes: it returns the whole backing buffer, which is often longer than the data. You get your bytes followed by padding (zeros, or bytes left over from an earlier record when Hadoop reuses the same object). Only the first getLength() bytes are valid, so use copyBytes(), which returns exactly those bytes.
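To see the padding for yourself, the sketch below reuses one BytesWritable for two "records", roughly the way a Hadoop record reader reuses its value object between records. It assumes only hadoop-common on the classpath; the object and method names are made up for illustration.

```scala
import org.apache.hadoop.io.BytesWritable

object GetBytesVsCopyBytes {
  /** Reuses one BytesWritable for two records and returns (getLength, getBytes, copyBytes). */
  def simulateReuse(): (Int, Array[Byte], Array[Byte]) = {
    val w = new BytesWritable()
    w.set(Array.tabulate[Byte](10)(_.toByte), 0, 10) // first record: bytes 0..9, buffer grows
    w.set(Array[Byte](42, 43, 44), 0, 3)             // second record: 3 bytes, buffer is not shrunk
    (w.getLength, w.getBytes, w.copyBytes())
  }

  def main(args: Array[String]): Unit = {
    val (len, raw, exact) = simulateReuse()
    println(s"getLength       = $len")              // 3
    println(s"getBytes.length = ${raw.length}")     // larger than getLength
    println(s"getBytes        = ${raw.mkString(",")}")
    println(s"copyBytes       = ${exact.mkString(",")}") // 42,43,44
  }
}
```

Here getBytes starts with 42,43,44 and then continues with 3,4,5,... left over from the first record, while copyBytes returns only 42,43,44. If your Hadoop version lacks copyBytes(), java.util.Arrays.copyOf(w.getBytes, w.getLength) produces the same result.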

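import org.apache.hadoop.io.{BytesWritable, NullWritable}
import org.apache.hadoop.io.compress.CompressionCodec
import org.apache.spark.rdd.RDD

// `sc` is an existing SparkContext (e.g. in spark-shell)
val rdd: RDD[Array[Byte]] = ???

// Optional compression, e.g. Some(classOf[org.apache.hadoop.io.compress.GzipCodec]); None = uncompressed
val codecOpt: Option[Class[_ <: CompressionCodec]] = None

// To write: the key carries no data, each value wraps one byte array
rdd.map(bytesArray => (NullWritable.get(), new BytesWritable(bytesArray)))
  .saveAsSequenceFile("/output/path", codecOpt)

// To read: use the same key/value types as written; copyBytes() keeps only getLength() bytes
val readBack: RDD[Array[Byte]] = sc.sequenceFile[NullWritable, BytesWritable]("/input/path")
  .map(_._2.copyBytes())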
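For an end-to-end check, the following sketch runs the same write/read pattern against a local SparkContext and a temporary directory. It assumes Spark core (which pulls in hadoop-common) is on the classpath; the object name, `roundTrip`, and the temp paths are illustrative, not part of any API.

```scala
import java.nio.file.Files

import org.apache.hadoop.io.{BytesWritable, NullWritable}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ByteArraySequenceFileRoundTrip {
  /** Writes `data` as a SequenceFile under a fresh temp directory and reads it back. */
  def roundTrip(sc: SparkContext, data: Seq[Array[Byte]]): Seq[Array[Byte]] = {
    // saveAsSequenceFile refuses to overwrite, so target a path that does not exist yet
    val path = Files.createTempDirectory("bytes-seq").resolve("out").toString

    val rdd: RDD[Array[Byte]] = sc.parallelize(data, numSlices = 2)
    rdd.map(bytes => (NullWritable.get(), new BytesWritable(bytes)))
      .saveAsSequenceFile(path) // codec defaults to None (uncompressed)

    // Read with the same key/value types; copy out of the reused Writable before collect()
    sc.sequenceFile[NullWritable, BytesWritable](path)
      .map { case (_, value) => value.copyBytes() }
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("bytes-seq"))
    try {
      val input = Seq(Array[Byte](1, 2, 3), Array.fill[Byte](1000)(7), Array.emptyByteArray)
      val output = roundTrip(sc, input)
      println(output.map(_.length).mkString(","))
    } finally sc.stop()
  }
}
```

Calling copyBytes() before collect() means only plain Array[Byte] values leave the tasks, which matters because BytesWritable is not java.io.Serializable and the record reader reuses it. Part files may not be read back in the order they were written, so compare contents rather than positions.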
