Reading binary file in Spark Scala


Problem description

I need to extract data from a binary file.

I used binaryRecords and got an RDD[Array[Byte]].
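For context, the call that produces such an RDD might look like this; a minimal sketch, assuming sc is an existing SparkContext, the path is a placeholder, and the 14-byte record length is inferred from the field sizes below (4 + 2 + 8):

// binaryRecords splits the file into fixed-size records; the record
// length here is an assumed 14 bytes: Int (4) + Short (2) + Long (8).
val records = sc.binaryRecords("hdfs:///path/to/data.bin", 14)
// records: org.apache.spark.rdd.RDD[Array[Byte]]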

From here I want to parse every record into a case class (Field1: Int, Field2: Short, Field3: Long).

How can I do that?

Recommended answer

Assuming you have no delimiter: an Int in Scala is 4 bytes, a Short is 2 bytes, and a Long is 8 bytes. Assume your binary data is structured (for each record) as Int Short Long, i.e. 14 bytes per record. You should be able to take the bytes and convert them to the types you want.

import java.nio.ByteBuffer

// Each 14-byte record: bytes 0-3 = Int, bytes 4-5 = Short, bytes 6-13 = Long.
val result = YourRDD.map(x => (ByteBuffer.wrap(x.take(4)).getInt,
             ByteBuffer.wrap(x.drop(4).take(2)).getShort,
             ByteBuffer.wrap(x.drop(6)).getLong))

This uses a Java library (java.nio.ByteBuffer) to convert the bytes to Int/Short/Long; you can use other libraries if you want.
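Since the question asks for a case class rather than a tuple, a minimal sketch of that final step follows; the case class name Record is an assumption (its fields follow the question, lower-cased per Scala convention), and it wraps each record in a single ByteBuffer, whose position advances after every get, instead of slicing with take/drop:

import java.nio.ByteBuffer

case class Record(field1: Int, field2: Short, field3: Long)

val parsed = YourRDD.map { x =>
  val buf = ByteBuffer.wrap(x)                   // big-endian by default
  Record(buf.getInt, buf.getShort, buf.getLong)  // reads 4 + 2 + 8 bytes in order
}

If the file was written little-endian, call buf.order(java.nio.ByteOrder.LITTLE_ENDIAN) before reading.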

