Generating pcolormesh images from very large data sets saved in H5 files with Python

Problem description

I am collecting a large amount of data that will be saved into individual H5 files using h5py. I would like to patch these images together into one pcolormesh plot to be saved as a single image.

A quick example I have been working on generates 2000x2000 arrays of random data points and saves them in H5 files using h5py. I then try to load the data from these files and plot it in matplotlib as a pcolormesh, but I always run into a MemoryError (which is expected).

import numpy
import h5py

# Write 100 files, each holding one 2000x2000 array of random values.
# The TEST_HDF5_SAVE_FILES directory must already exist.
for i in range(100):
    arr = numpy.random.random((2000, 2000))
    with h5py.File("TEST_HDF5_SAVE_FILES\\Plot_" + str(i) + ".h5", "w") as f:
        f.create_dataset("Plot_" + str(i), data=arr)

This script generates my files. I picked 100 as an arbitrary number just to have a large enough set of files to pull from.

Then I import them using the following script:

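import numpy
import h5py
import matplotlib.pyplot as plt

y = numpy.arange(0, 2000, 1)

for display_plot_num in range(0, 5):
    print(display_plot_num)
    # Offset each tile along x so the tiles sit side by side.
    x = numpy.arange(display_plot_num*2000, display_plot_num*2000 + 2000, 1)

    # Open read-only: the data is plotted, never modified.
    with h5py.File("TEST_HDF5_SAVE_FILES\\Plot_" + str(display_plot_num) + ".h5", "r") as f:
        data = f["Plot_" + str(display_plot_num)]
        plt.pcolormesh(x, y, data)
plt.show()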

The upper bound of the range in the for loop can be raised up to 100, but the largest value I can choose without a memory error is 5 (i.e. 5 plots can be patched together on one pcolormesh plot in matplotlib), and even that is extremely clunky and slow. I need to be able to patch together many more images.

Is there another technique I should use to plot this data? Alternatively, it would be nice if I could convert the data from multiple H5 files directly into an image without going through matplotlib or a similar library (such as scipy).

In summary, my problem is this:

  • I have a large number of HDF5 files containing image data (2000x2000)
  • I need to patch these files together into one image and save it

Any help is appreciated. Also, I would be glad to answer any further questions about my problem.

Edit (5.6.2013):

I feel a similar question would be how to deal with (import, manipulate, edit, etc.) very high resolution images in Python. This is essentially what I am trying to do: generate a very high resolution image from a collection of smaller images.

Recommended answer

One way to reduce the bloat of images in matplotlib (especially when saving to SVG) is to use the rasterized=True kwarg. This essentially "flattens" your pcolormesh, which makes it much faster to save, uses fewer resources, etc.
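As a rough illustration of the rasterized=True suggestion, the sketch below renders the same array twice and compares the size of the resulting SVG output. The helper name save_svg, the 200x200 array (a small stand-in for a 2000x2000 tile), and the Agg backend are assumptions made for this example and are not part of the original answer.

```python
import io

import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs headless
import matplotlib.pyplot as plt


def save_svg(data, rasterized):
    """Hypothetical helper: draw data with pcolormesh, return SVG size in bytes."""
    fig, ax = plt.subplots()
    # rasterized=True asks matplotlib to embed the mesh as a bitmap
    # instead of emitting one vector shape per cell.
    ax.pcolormesh(data, rasterized=rasterized)
    buf = io.BytesIO()
    fig.savefig(buf, format="svg")
    plt.close(fig)  # release the figure's memory
    return buf.getbuffer().nbytes


data = np.random.random((200, 200))  # small stand-in for one 2000x2000 tile
vector_size = save_svg(data, rasterized=False)
raster_size = save_svg(data, rasterized=True)
print("vector SVG bytes:", vector_size)
print("rasterized SVG bytes:", raster_size)
```

Rasterizing only affects how the mesh is stored in the saved file; each tile is still loaded into memory to be drawn, so this alone may not be enough to avoid a MemoryError with many large tiles.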
