高效压缩numpy数组 [英] Compress numpy arrays efficiently

查看:468
本文介绍了高效压缩numpy数组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在将某些numpy arrays保存到磁盘时,我尝试了各种方法来进行数据压缩.

I tried various methods to do data compression when saving to disk some numpy arrays.

这些1D数组包含以一定采样率采样的数据(可以用麦克风录制的声音,或用任何传感器进行的任何其他测量):数据本质上是连续的(从数学意义上讲;当然,在采样之后,它现在是离散数据.

These 1D arrays contain sampled data at a certain sampling rate (can be sound recorded with a microphone, or any other measurment with any sensor) : the data is essentially continuous (in a mathematical sense ; of course after sampling it is now discrete data).

我尝试使用HDF5(h5py):

I tried with HDF5 (h5py) :

f.create_dataset("myarray1", myarray, compression="gzip", compression_opts=9)

但这非常慢,而且压缩率并不是我们可以预期的最佳压缩比.

but this is quite slow, and the compression ratio is not the best we can expect.

我也尝试过

numpy.savez_compressed()

但是,对于上述数据,它可能并不是最佳的压缩算法(如前所述).

but once again it may not be the best compression algorithm for such data (described before).

对于具有此类数据的numpy array压缩率更好的您会如何选择?

What would you choose for better compression ratio on a numpy array, with such data ?

(我曾考虑过无损FLAC(最初是为音频而设计的,但是有一种简单的方法可以将这种算法应用于numpy数据吗?)

(I thought about things like lossless FLAC (initially designed for audio), but is there an easy way to apply such an algorithm on numpy data ?)

推荐答案

我现在要做什么:

import gzip
import numpy

f = gzip.GzipFile("my_array.npy.gz", "w")
numpy.save(file=f, arr=my_array)
f.close()

这篇关于高效压缩numpy数组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆