Python - Using threads or a queue to iterate over a for loop that calls a function


Problem description


I'm fairly new to Python and am writing a script that brings point cloud data from other programs into Autodesk Maya. The script works fine, but I'm trying to make it faster. I have a for loop that iterates through a list of numbered files, i.e. datafile001.txt, datafile002.txt and so on. What I'm wondering is whether there is a way to have it process more than one file at a time, possibly using threads or a queue. Below is the code I have been working on:

     def threadedFuntion(args):
         if len(sourceFiles) > 3:
             for count, item in enumerate(sourceFiles):
                     t1=Thread(target=convertPcToPdc,args=(sourceFiles[filenumber1], particlesName, startframe, endframe, pdcIncrements, outputDirectory, variableFolder, acceptableArrayforms, dataType))
                     t1.start()
                     t2=Thread(target=convertPcToPdc,args=(sourceFiles[filenumber2], particlesName, startframe, endframe, pdcIncrements, outputDirectory, variableFolder, acceptableArrayforms, dataType))
                     t2.start()
                     t3=Thread(target=convertPcToPdc,args=(sourceFiles[filenumber3], particlesName, startframe, endframe, pdcIncrements, outputDirectory, variableFolder, acceptableArrayforms, dataType))
                     t3.start()
                     t4=Thread(target=convertPcToPdc,args=(sourceFiles[filenumber4], particlesName, startframe, endframe, pdcIncrements, outputDirectory, variableFolder, acceptableArrayforms, dataType))
                     t4.start()

This obviously doesn't work, for a number of reasons. First, it will only create 4 threads, and I would like to be able to give an option for more or fewer. Second, it errors, because it's trying to reuse a thread? Like I said, I'm quite new to Python and a little in over my head. I've been reading several posts on here but can't get one to work quite right. I think a queue might be what I need, but I couldn't quite figure it out. I experimented with the condition statement and with the join statement, but once again couldn't get what I want.

I guess, to be more specific, what I want to achieve is this: the function reads through a text file, retrieves the coords, and then exports them as a binary file for Maya to read. It's common for one of these text files to have 5-10 million x,y,z coords, which takes quite some time. It takes around 30 minutes to 1 hour to do 1 file on a pretty beastly computer. Task Manager says Python is only using 12% of the processor and around 1% of the RAM, so if I could do several of these at once, it would make those 100 or more files go by a lot faster. I wouldn't think it would be too hard to multithread/queue up a for loop, but I've been lost and trying failing solutions for about a week.

Thank you all for any help. I really appreciate it and think this website is amazing. This is my first post, but I feel like I have learned Python entirely from reading on here.

Solution

Subclass threading.Thread and put your work function in that class as part of run().

import threading
import time
import random

class Worker(threading.Thread):
    def __init__(self, srcfile, printlock,**kwargs):
        super(Worker,self).__init__(**kwargs)
        self.srcfile = srcfile
        self.lock = printlock # so threads don't step on each other's prints

    def run(self):
        with self.lock:
            print("starting %s on %s" % (self.ident,self.srcfile))
        # do whatever you need to, return when done
        # example, sleep for a random interval up to 10 seconds
        time.sleep(random.random()*10)
        with self.lock:
            print("%s done" % self.ident)


def threadme(srcfiles):
    printlock = threading.Lock()
    threadpool = []
    for file in srcfiles:
        threadpool.append(Worker(file,printlock))

    for thr in threadpool:
        thr.start()

    # this loop will block until all threads are done
    # (however it won't necessarily first join those that are done first)
    for thr in threadpool:
        thr.join()

    print("all threads are done")

if __name__ == "__main__":
    threadme(["abc","def","ghi"])
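The Worker above only sleeps as a placeholder. As a rough sketch of how a real conversion function might be wired into run(), the variant below passes each source file plus a tuple of shared arguments to a callable. The names FuncWorker, convert_all and convert_stub are hypothetical and not from the original answer; convert_stub merely stands in for the asker's convertPcToPdc and records its arguments instead of converting anything.

```python
import threading


def convert_stub(srcfile, particles_name, results, lock):
    """Hypothetical stand-in for convertPcToPdc: records what it was called with."""
    with lock:
        results.append((srcfile, particles_name))


class FuncWorker(threading.Thread):
    """Thread subclass whose run() calls one function on one source file."""

    def __init__(self, func, srcfile, extra_args=(), **kwargs):
        super(FuncWorker, self).__init__(**kwargs)
        self.func = func
        self.srcfile = srcfile
        self.extra_args = extra_args

    def run(self):
        # the source file goes first, followed by the shared arguments
        self.func(self.srcfile, *self.extra_args)


def convert_all(func, srcfiles, extra_args=()):
    """Start one worker per source file, then wait for all of them."""
    workers = [FuncWorker(func, f, extra_args) for f in srcfiles]
    for w in workers:
        w.start()
    for w in workers:
        w.join()


if __name__ == "__main__":
    results = []
    lock = threading.Lock()
    convert_all(convert_stub,
                ["datafile001.txt", "datafile002.txt"],
                ("particles", results, lock))
    print(sorted(results))
```

In the asker's script, the shared tuple would presumably hold values such as particlesName, startframe and endframe, in whatever order convertPcToPdc expects them after the file name.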

As requested, to limit the number of threads, use the following:

def threadme(infiles,threadlimit=None,timeout=0.01):
    assert threadlimit is None or threadlimit > 0, \
           "need at least one thread"
    printlock = threading.Lock()
    srcfiles = list(infiles)
    threadpool = []

    # keep going while work to do or being done
    while srcfiles or threadpool:

        # while there's room, remove source files
        # and add to the pool
        while srcfiles and \
           (threadlimit is None \
            or len(threadpool) < threadlimit):
            file = srcfiles.pop()
            wrkr = Worker(file,printlock)
            wrkr.start()
            threadpool.append(wrkr)

        # remove completed threads from the pool
        # (iterate over a copy, since the pool is modified inside the loop)
        for thr in list(threadpool):
            thr.join(timeout=timeout)
            if not thr.is_alive():
                threadpool.remove(thr)

    print("all threads are done")

if __name__ == "__main__":
    for lim in (1,2,3,4):
        print("--- Running with thread limit %i ---" % lim)
        threadme(("abc","def","ghi"),threadlimit=lim)

Note that this will actually process the sources in reverse (due to the list pop()). If you require them to be done in order, reverse the list somewhere, or use a deque and popleft().
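For the deque and popleft() suggestion, a minimal sketch might look like the following. It is not the answer's code: run_in_order and its work parameter are hypothetical names, and it uses a plain threading.Thread with a target function rather than the Worker subclass, purely to keep the example self-contained. Files are started in input order, but with more than one thread they will not necessarily finish in that order.

```python
import threading
from collections import deque


def run_in_order(infiles, work, threadlimit=2, timeout=0.01):
    """Run work(srcfile) in threads, at most threadlimit at a time.

    Files are started in input order; the list of files in the order
    they were started is returned.
    """
    assert threadlimit > 0, "need at least one thread"
    pending = deque(infiles)
    running = []
    started = []

    # keep going while there is work to do or work being done
    while pending or running:
        # while there's room, take the oldest pending file
        while pending and len(running) < threadlimit:
            srcfile = pending.popleft()
            thr = threading.Thread(target=work, args=(srcfile,))
            thr.start()
            running.append(thr)
            started.append(srcfile)

        # drop finished threads (iterate over a copy while removing)
        for thr in list(running):
            thr.join(timeout=timeout)
            if not thr.is_alive():
                running.remove(thr)

    return started


if __name__ == "__main__":
    def show(name):
        print("processing %s" % name)

    run_in_order(["abc", "def", "ghi"], show, threadlimit=2)
```

Unlike list.pop(0), deque.popleft() takes constant time, which is why a deque is the usual choice when items are consumed from the front.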
