Compress numpy arrays efficiently
Question
I tried various methods of compressing some numpy arrays when saving them to disk.
These 1D arrays contain data sampled at a certain sampling rate (sound recorded with a microphone, or any other measurement from any sensor): the data is essentially continuous in the mathematical sense, although after sampling it is of course discrete.
I tried HDF5 (h5py):
f.create_dataset("myarray1", data=myarray, compression="gzip", compression_opts=9)
but this is quite slow, and the compression ratio is not the best we can expect.
I also tried numpy.savez_compressed(), but once again it may not be the best compression algorithm for the kind of data described above.
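For reference, a minimal sketch of the numpy.savez_compressed() round trip mentioned here. The test signal (a 440 Hz tone sampled at 44.1 kHz and stored as int16), the file name, and the key name are all assumptions made for illustration, not the asker's real data.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the sampled data: one second of a 440 Hz tone
# at 44.1 kHz, stored as 16-bit integers.
rate = 44100
t = np.arange(rate) / rate
signal = (np.sin(2 * np.pi * 440 * t) * 20000).astype(np.int16)

# Save into a zip archive whose members are deflate-compressed .npy files.
path = os.path.join(tempfile.mkdtemp(), "signal.npz")
np.savez_compressed(path, myarray1=signal)

# Load the array back by the key it was saved under.
with np.load(path) as archive:
    restored = archive["myarray1"]

# Compare sizes on disk versus in memory to judge the compression ratio.
raw_size = signal.nbytes
compressed_size = os.path.getsize(path)
print(raw_size, compressed_size)
```

Comparing raw_size with compressed_size on real recordings is a quick way to check whether a general-purpose compressor is good enough for this kind of signal.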
What would you choose to get a better compression ratio on a numpy array holding such data?
(I thought about things like lossless FLAC, which was originally designed for audio, but is there an easy way to apply such an algorithm to numpy data?)
Recommended answer
What I do now:
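import gzip
import numpy

# Write the array in .npy format through a gzip stream;
# the with-block closes the file even if saving fails.
with gzip.GzipFile("my_array.npy.gz", "w") as f:
    numpy.save(file=f, arr=my_array)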
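A file written this way can be read back through the same gzip wrapper. The sketch below is a self-contained round trip under stated assumptions: the array contents and the temporary path are hypothetical placeholders, and only the save step comes from the answer.

```python
import gzip
import os
import tempfile

import numpy as np

# Hypothetical data standing in for my_array.
my_array = np.arange(1000, dtype=np.float64)
path = os.path.join(tempfile.mkdtemp(), "my_array.npy.gz")

# Save: .npy bytes are written through the gzip stream.
with gzip.GzipFile(path, "w") as f:
    np.save(file=f, arr=my_array)

# Load: numpy reads the .npy header and data from the decompressed stream.
with gzip.GzipFile(path, "r") as f:
    loaded = np.load(f)
```

Because gzip is lossless, loaded matches my_array exactly, including dtype and shape.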