Python: Using multiprocessing module as possible solution to increase the speed of my function


Problem description

I wrote a function in Python 2.7 (on Windows OS 64-bit) to calculate the mean value of the intersection area between a reference polygon (Ref) and one or more segmented (Seg) polygons in ESRI shapefile format (http://en.wikipedia.org/wiki/Shapefile). The code is quite slow because I have more than 2000 reference polygons, and for each Ref_polygon the function runs over all Seg polygons (more than 7000). I am sorry, but the function is a prototype.

I wish to know if multiprocessing can help me increase the speed of my loop, or whether there are better-performing solutions. If multiprocessing is a possible solution, I wish to know the best way to optimize my following function:

import os
import numpy as np
from osgeo import ogr
from shapely.geometry import Polygon

def AreaInter(reference, segmented, outFile):
    # open the reference and segmented shapefiles
    ref = ogr.Open(reference)
    if ref is None:
        raise SystemExit('Unable to open %s' % reference)
    seg = ogr.Open(segmented)
    if seg is None:
        raise SystemExit('Unable to open %s' % segmented)
    ref_layer = ref.GetLayer()
    seg_layer = seg.GetLayer()
    # create the output text file next to the reference shapefile if no
    # directory was given, otherwise at the requested location
    if not os.path.split(outFile)[0]:
        file_path = os.path.split(os.path.abspath(reference))[0]
        outFile_filename = os.path.splitext(os.path.basename(outFile))[0]
        file_out = open(os.path.join(file_path, "{0}.txt".format(outFile_filename)), "w")
    else:
        file_path_name = os.path.splitext(outFile)[0]
        file_out = open(os.path.abspath("{0}.txt".format(file_path_name)), "w")
    # For each reference object i
    for index in xrange(ref_layer.GetFeatureCount()):
        ref_feature = ref_layer.GetFeature(index)
        # get FID (= Feature ID)
        FID = str(ref_feature.GetFID())
        ref_geometry = ref_feature.GetGeometryRef()
        pts = ref_geometry.GetGeometryRef(0)
        points = []
        for p in xrange(pts.GetPointCount()):
            points.append((pts.GetX(p), pts.GetY(p)))
        # convert to a Shapely polygon
        ref_polygon = Polygon(points)
        # get the area
        ref_Area = ref_polygon.area
        # create two empty lists
        seg_Area, intersect_Area = ([] for _ in range(2))
        # For each segmented object j
        for segment in xrange(seg_layer.GetFeatureCount()):
            seg_feature = seg_layer.GetFeature(segment)
            seg_geometry = seg_feature.GetGeometryRef()
            pts = seg_geometry.GetGeometryRef(0)
            points = []
            for p in xrange(pts.GetPointCount()):
                points.append((pts.GetX(p), pts.GetY(p)))
            seg_polygon = Polygon(points)
            seg_Area.append(seg_polygon.area)
            # intersection (overlap) of the reference object with the segmented object
            intersect_polygon = ref_polygon.intersection(seg_polygon)
            # area of intersection (= 0 if there is no intersection)
            intersect_Area.append(intersect_polygon.area)
        # Average over all segmented objects (because 1 or more segmented
        # polygons can intersect with the reference polygon)
        seg_Area_average = np.average(seg_Area)
        intersect_Area_average = np.average(intersect_Area)
        file_out.write(" ".join(["%s" % i for i in [FID, ref_Area, seg_Area_average, intersect_Area_average]]) + "\n")
    file_out.close()

Recommended answer

You can use the multiprocessing package, and especially the Pool class. First create a function that does all the work you want to do within the for loop, and that takes only the index as an argument:

def process_reference_object(index):
    ref_feature = ref_layer.GetFeature(index)
    # all your code goes here
    return (" ".join(["%s" % i for i in [FID, ref_Area, seg_Area_average, intersect_Area_average]]) + "\n")

Note that this doesn't write to a file itself; that would be messy, because you'd have multiple processes writing to the same file at the same time. Instead, it returns the string that needs to be written. Also note that there are objects in this function, like ref_layer or ref_geometry, that will need to reach it somehow. How you do that is up to you: you could make process_reference_object a method in a class initialized with them, or it could be as ugly as just defining them globally.
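
For example, here is a minimal sketch of the global-variable route, under the assumption that each worker process opens its own copies of the shapefiles (OGR datasource handles generally cannot be pickled and sent between processes); the init_worker name and the module-level variables are illustrative, not part of the original answer:

from multiprocessing import Pool
from osgeo import ogr

ref_ds = seg_ds = None
ref_layer = seg_layer = None

def init_worker(reference, segmented):
    # Runs once in each worker process; keep the datasource objects
    # alive as well, or the layers they own become invalid.
    global ref_ds, seg_ds, ref_layer, seg_layer
    ref_ds = ogr.Open(reference)
    seg_ds = ogr.Open(segmented)
    ref_layer = ref_ds.GetLayer()
    seg_layer = seg_ds.GetLayer()

With this approach, the Pool in the next step would be created as Pool(initializer=init_worker, initargs=(reference, segmented)), so that each worker runs init_worker once before processing any indices.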

Then, you create a pool of process resources and run all of your indices using Pool.imap_unordered, which will itself allocate each index to a different process as necessary:

from multiprocessing import Pool
p = Pool()  # run multiple processes
for l in p.imap_unordered(process_reference_object, range(ref_layer.GetFeatureCount())):
    file_out.write(l)

This will parallelize the independent processing of your reference objects across multiple processes and write the results to the file (note: in an arbitrary order).
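
If the order of the output lines matters, a small variation (an assumption on my part, not something the original answer relies on) is to use Pool.imap instead, which yields results in input order at a modest cost in parallel efficiency:

# Same loop, but lines come back in FID order.
for l in p.imap(process_reference_object, range(ref_layer.GetFeatureCount())):
    file_out.write(l)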
