Parse a little-endian binary file, stuffing into a matrix


Problem description


I have a binary file that contains an X by X matrix. The file itself is a sequence of single-precision floats (little-endian). What I would like to do is parse it, and stuff it into some reasonable clojure matrix data type.

Thanks to this question, I see I can parse a binary file with gloss. I now have code that looks like this:

(ns foo.core
  (:require gloss.core)
  (:require gloss.io)
  (:use [clojure.java.io])
  (:use [clojure.math.numeric-tower]))

(gloss.core/defcodec mycodec
  (gloss.core/repeated :float32 :prefix :none))

(def buffer (byte-array (* 4 1200 1200))) ; 1200 x 1200 float32 values, 4 bytes each

(.read (input-stream "/path/to/binaryfile") buffer)

(gloss.io/decode mycodec buffer)

This takes a while to run, but eventually dumps out a big list of numbers. Unfortunately, the numbers are all wrong. Upon further investigation, the numbers were read as big-endian.
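To see why the values come out wrong, here is a minimal sketch that decodes the same four bytes under both byte orders using only java.nio. The names `one-le` and `read-float` are made up for this illustration and are not part of gloss or of the code above.

```clojure
(import '[java.nio ByteBuffer ByteOrder])

;; 1.0 as an IEEE 754 single-precision float is 0x3F800000.
;; Stored little-endian, the bytes on disk are 00 00 80 3F.
(def one-le (byte-array (map unchecked-byte [0x00 0x00 0x80 0x3F])))

(defn read-float
  "Reads the first float from bs, interpreting the bytes in the given order."
  [^bytes bs ^ByteOrder order]
  (.getFloat (.order (ByteBuffer/wrap bs) order) 0))

(read-float one-le ByteOrder/BIG_ENDIAN)    ; a denormal near 4.6E-41, not 1.0
(read-float one-le ByteOrder/LITTLE_ENDIAN) ; 1.0
```

A freshly created ByteBuffer is big-endian by default, so any reader that does not set the order explicitly will misread little-endian data in this way.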

Assuming there is some way to read these binary files as little-endian, I'd like to stuff the results into a matrix. This question seems to have settled on using Incanter with its Parallel Colt representation, however, that question was from '09, and I'm hoping to stick to clojure 1.4 and lein 2. Somewhere in my frenzy of googling, I saw other recommendations to use jblas or mahout. Is there a "best" matrix library for clojure these days?

EDIT: Reading a binary file is tantalizingly close. Thanks to this handy nio wrapper, I am able to get a memory mapped byte buffer as a short one-liner, and even reorder it:

(ns foo.core
  (:require [clojure.java.io :as io])
  (:require [nio.core :as nio])
  (:import [java.nio ByteOrder]))

(def buffer (nio/mmap "/path/to/binaryfile"))

(class buffer) ;; java.nio.DirectByteBuffer

(.order buffer java.nio.ByteOrder/LITTLE_ENDIAN)
;; #<DirectByteBuffer java.nio.DirectByteBuffer[pos=0 lim=5760000 cap=5760000]>

However, reordering without the intermediate (def) step fails:

(.order (nio/mmap f) java.nio.ByteOrder/LITTLE_ENDIAN)

;; clojure.lang.Compiler$CompilerException: java.lang.IllegalArgumentException: Unable to resolve classname: MappedByteBuffer, compiling:(/Users/peter/Developer/foo/src/foo/core.clj:12)
;;  at clojure.lang.Compiler.analyzeSeq (Compiler.java:6462)
;;     clojure.lang.Compiler.analyze (Compiler.java:6262)
;; etc...

I'd like to be able to create the reordered byte buffer inside a function without defining a global variable, but right now it seems to not like that.

Also, once I've got it reordered, I'm not entirely sure what to do with my DirectByteBuffer, as it doesn't seem to be iterable. Perhaps for the remaining step of reading this buffer object (into a JBLAS matrix), I will create a second question.
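One possible way to get something iterable out of the buffer, sketched here under the assumption that the whole file fits comfortably in memory, is to copy it into a primitive float array with the bulk `get` of `FloatBuffer`. The function name `buffer->floats` and the in-memory `demo` buffer are hypothetical; a memory-mapped buffer should behave the same way since it is also a `ByteBuffer`.

```clojure
(import '[java.nio ByteBuffer ByteOrder])

(defn buffer->floats
  "Copies every remaining float in buf into a new float array.
  Works on a duplicate so the caller's position and byte order are untouched."
  [^ByteBuffer buf ^ByteOrder order]
  (let [fb  (.asFloatBuffer (.order (.duplicate buf) order))
        out (float-array (.remaining fb))]
    (.get fb out) ; bulk copy into the array
    out))

;; Stand-in for the mapped file: three little-endian floats in memory.
(def demo
  (let [bb (.order (ByteBuffer/allocate 12) ByteOrder/LITTLE_ENDIAN)]
    (doseq [x [0.0 1.0 2.5]]
      (.putFloat bb (float x)))
    (.flip bb) ; rewind so the floats just written can be read back
    bb))

(seq (buffer->floats demo ByteOrder/LITTLE_ENDIAN))
```

Arrays are seqable in Clojure, so the result can be passed to `map`, `partition`, `reduce` and friends.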

EDIT 2: I am marking the answer below as accepted, because I think my original question combined too many things. Once I figure out the remainder of this I will try to update this question with complete code that starts with this ByteBuffer and that reads into a JBLAS matrix (which appears to be the right data structure).

In case anyone was interested, I was able to create a function that returns a properly ordered bytebuffer as follows:

;; This works!
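(defn readf [^String file]
  ;; The mapping stays valid after the file is closed, so with-open is safe here.
  (with-open [raf (java.io.RandomAccessFile. file "r")]
    (let [channel (.getChannel raf)]
      (.order
       ;; Map the whole file: 1200 x 1200 floats occupy (* 4 1200 1200) bytes,
       ;; not (* 1200 1200).
       (.map channel java.nio.channels.FileChannel$MapMode/READ_ONLY 0 (.size channel))
       java.nio.ByteOrder/LITTLE_ENDIAN))))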

The nio wrapper I found looks to simplify / prettify this quite a lot, but it would appear I'm either not using it correctly, or there is something wrong. To recap my findings with the nio wrapper:

;; this works
(def buffer (nio/mmap "/bin/file"))
(def buffer (.order buffer java.nio.ByteOrder/LITTLE_ENDIAN))
(def buffer (.asFloatBuffer buffer))

;; this fails
(def buffer
  (.asFloatBuffer
   (.order
    (nio/mmap "/bin/file")
    java.nio.ByteOrder/LITTLE_ENDIAN)))

Sadly, this is a clojure mystery for another day, or perhaps another StackOverflow question.

Solution

Open a FileChannel(), then get a memory mapped buffer. There are lots of tutorials on the web for this step.

Switch the order of the buffer to little endian by calling order(endian-ness) (not the no-arg version of order). Finally, the easiest way to extract floats would be to call asFloatBuffer() on it and use the resulting buffer to read the floats.

After that you can put the data into whatever structure you need.
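As a rough illustration of that last step, the sketch below turns a flat sequence of floats into a vector of row vectors, and also shows indexing a `FloatBuffer` directly without copying. It assumes the file stores the matrix in row-major order, which the question does not state; `floats->rows` and `cell` are hypothetical helper names.

```clojure
(import '[java.nio FloatBuffer])

(defn floats->rows
  "Splits a flat, row-major float array into a vector of n-element row vectors."
  [^floats data n]
  (mapv vec (partition n data)))

(defn cell
  "Looks up element (i, j) of an n-column, row-major matrix held in fb."
  [^FloatBuffer fb n i j]
  (.get fb (int (+ (* i n) j))))

;; A 2 x 2 example standing in for the 1200 x 1200 data.
(def flat (float-array [1 2 3 4]))

(floats->rows flat 2)           ; rows [1.0 2.0] and [3.0 4.0]
(cell (FloatBuffer/wrap flat) 2 1 0) ; second row, first column: 3.0
```

If the data turns out to be column-major instead, swapping `i` and `j` in `cell` (or transposing the rows) would be the corresponding adjustment.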

Edit: Here's an example of how to use the API.

;; first, I created a 96 byte file, then I started the repl
;; put some little endian floats in the file and close it
user=> (def file (java.io.RandomAccessFile. "foo.floats", "rw"))
#'user/file
user=> (def channel (.getChannel file))
#'user/channel
user=> (def buffer (.map channel java.nio.channels.FileChannel$MapMode/READ_WRITE 0 96))
#'user/buffer
user=> (.order buffer java.nio.ByteOrder/LITTLE_ENDIAN)
#<DirectByteBuffer java.nio.DirectByteBuffer[pos=0 lim=96 cap=96]>
user=> (def fbuffer (.asFloatBuffer buffer))
#'user/fbuffer
user=> (.put fbuffer 0 0.0)
#<DirectFloatBufferU java.nio.DirectFloatBufferU[pos=0 lim=24 cap=24]>
user=> (.put fbuffer 1 1.0)
#<DirectFloatBufferU java.nio.DirectFloatBufferU[pos=0 lim=24 cap=24]>
user=> (.put fbuffer 2 2.3)
#<DirectFloatBufferU java.nio.DirectFloatBufferU[pos=0 lim=24 cap=24]>
user=> (.close channel)
nil

;; memory map the file, try reading the floats w/o changing the endianness of the buffer
user=> (def file2 (java.io.RandomAccessFile. "foo.floats" "r"))
#'user/file2
user=> (def channel2 (.getChannel file2))                                                
#'user/channel2
user=> (def buffer2 (.map channel2 java.nio.channels.FileChannel$MapMode/READ_ONLY 0 96))
#'user/buffer2
user=> (def fbuffer2 (.asFloatBuffer buffer2))
#'user/fbuffer2
user=> (.get fbuffer2 0)
0.0
user=> (.get fbuffer2 1)
4.6006E-41
user=> (.get fbuffer2 2)
4.1694193E-8

;; change the order of the buffer and read the floats    
user=> (.order buffer2 java.nio.ByteOrder/LITTLE_ENDIAN)                                 
#<DirectByteBufferR java.nio.DirectByteBufferR[pos=0 lim=96 cap=96]>
user=> (def fbuffer2 (.asFloatBuffer buffer2))
#'user/fbuffer2
user=> (.get fbuffer2 0)
0.0
user=> (.get fbuffer2 1)
1.0
user=> (.get fbuffer2 2)
2.3
user=> (.close channel2)
nil
user=> 
