Asynchronously read and process an image in python


Problem Description


Context

I often find myself in the following situation:

  • I have a list of image filenames I need to process
  • I read each image sequentially using, for instance, scipy.misc.imread
  • Then I do some kind of processing on each image and return a result
  • I save the result along with the image filename into a Shelf

The problem is that simply reading the image takes a non-negligible amount of time, sometimes comparable to or even longer than the image processing.

Question

So I was thinking that ideally I could read image n + 1 while processing image n. Or, even better, read and process multiple images at once in an automagically determined optimal way?

I have read about multiprocessing, threads, Twisted, gevent and the like, but I can't figure out which one to use or how to implement this idea. Does anyone have a solution to this kind of issue?

Minimal example

import scipy.misc
import scipy.ndimage

# generate a list of (identical) test images
scipy.misc.imsave("lena.png", scipy.misc.lena())
files = ['lena.png'] * 100

# a simple image processing task: count the regions brighter than a threshold
def process_image(im, threshold=128):
    label, n = scipy.ndimage.label(im > threshold)
    return n

# my current main loop
for f in files:
    im = scipy.misc.imread(f)
    print process_image(im)
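
As an illustrative aside (not part of the original question): the "read image n + 1 while processing image n" idea can be sketched with a single background reader thread feeding a bounded queue, so the disk read of the next image overlaps with the processing of the current one. The queue size and the None sentinel below are arbitrary choices, and process_image and files come from the example above.

import threading
import Queue  # this module is named queue on Python 3

import scipy.misc

def reader(filenames, q):
    # producer: read images sequentially and hand them to the consumer
    for f in filenames:
        q.put(scipy.misc.imread(f))
    q.put(None)  # sentinel: no more images

q = Queue.Queue(maxsize=2)  # the reader stays at most two images ahead
t = threading.Thread(target=reader, args=(files, q))
t.start()

while True:
    im = q.get()
    if im is None:
        break
    print process_image(im)  # runs while the reader loads the next image

t.join()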

Solution

Philip's answer is good, but it will only create a couple of processes (one reading, one computing), which will hardly max out a modern >2-core system. Here's an alternative using multiprocessing.Pool (specifically, its map method) which creates processes that do both the reading and the computation, but which should make better use of all the cores you have available (assuming there are more files than cores).

#!/usr/bin/env python

import multiprocessing
import scipy
import scipy.misc
import scipy.ndimage

class Processor:
    """Callable that reads one image and counts regions above a threshold."""
    def __init__(self, threshold):
        self._threshold = threshold

    def __call__(self, filename):
        im = scipy.misc.imread(filename)
        label, n = scipy.ndimage.label(im > self._threshold)
        return n

def main():
    # generate the test images
    scipy.misc.imsave("lena.png", scipy.misc.lena())
    files = ['lena.png'] * 100

    proc = Processor(128)
    pool = multiprocessing.Pool()    # defaults to one worker per core
    results = pool.map(proc, files)  # each worker both reads and computes

    print results

if __name__ == "__main__":
    main()
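
A note on the design (my observation, not from the original answer): Processor is a class with a __call__ method rather than a lambda or nested function because multiprocessing must pickle the callable to send it to the worker processes; instances of a module-level class (carrying their state, here the threshold) pickle cleanly, whereas lambdas and locally defined functions do not.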

If I increase the number of images to 500, and use the processes=N argument to Pool, then I get

Processes   Runtime
   1         6.2s
   2         3.2s
   4         1.8s
   8         1.5s

on my quad-core hyperthreaded i7.
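
For reference, a minimal sketch of how such timings might be taken (this harness is my illustration, assumed to sit inside main() of the script above where Processor and files are in scope; it is not from the original answer):

import time

for n in (1, 2, 4, 8):
    pool = multiprocessing.Pool(processes=n)
    start = time.time()
    pool.map(Processor(128), files)  # same workload, varying worker count
    print n, time.time() - start
    pool.close()
    pool.join()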

In more realistic use-cases (i.e. actually different images), your processes might spend more time waiting for the image data to load from storage (in my testing, they load virtually instantaneously from the cached disk), in which case it may be worth explicitly creating more processes than cores to get more overlap of compute and I/O. Only your own scalability testing on a realistic load and hardware can tell you what's actually best for you, though.
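
For instance (an illustrative starting point, not a prescription from the answer above), the pool can be oversubscribed relative to the core count:

# twice as many workers as cores: while some block on disk I/O, others compute
pool = multiprocessing.Pool(processes=2 * multiprocessing.cpu_count())

Whether a factor of two (or any other) actually helps depends on how reading and processing overlap on your system.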
