Working in different directories (os.chdir) at the same time (parallel threading)


Problem description

I want to sync all of my VCS directories in parallel. For each one, I change into the directory and run special command-line scripts to sync the git or mercurial repository. It's a slow process, so I want to try to run it in parallel.

The trouble is that my parallel threads fight over the "current directory", so I need some trick to work in different directories at the same time.
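The conflict comes from the working directory being a property of the whole process rather than of a single thread. A minimal sketch that shows this (the function name `cwd_after_thread_chdir` is made up for illustration and is not part of the original scripts):

```python
import os
import tempfile
import threading


def cwd_after_thread_chdir(path):
    """Call os.chdir(path) in a worker thread and report the cwd seen by the caller."""
    original = os.getcwd()
    worker = threading.Thread(target=os.chdir, args=(path,))
    worker.start()
    worker.join()
    seen = os.getcwd()   # the calling thread observes the worker's chdir
    os.chdir(original)   # restore the directory so the caller is unaffected
    return seen


if __name__ == "__main__":
    target = tempfile.mkdtemp()
    print("worker changed to:", target)
    print("main thread sees: ", cwd_after_thread_chdir(target))
```

Any two threads that each call os.chdir and then run a command can therefore end up running it in the other thread's directory.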

Current solution:

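def syncrepos(repos):
  # Start one thread per non-empty line of the repository list.
  for r in repos.split("\n"):
    if r:
      print("------ repository: ", r)
      thrd = ThreadingSync(r)
      thrd.daemon = True  # attribute form; setDaemon() is deprecated
      thrd.start()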

where ThreadingSync is

import os
import threading

class ThreadingSync(threading.Thread):
  def __init__(self, repo):
    threading.Thread.__init__(self)
    self.repo = repo
  def run(self):
    # Each line looks like "<path> -t <vcs type>"; the type is optional.
    r = self.repo.split("-t")
    path = (r[0]).strip()
    if len(r) < 2:
      vcs = VCS.git
    else:
      vcs = {
        'git'       : VCS.git,
        'git git'   : VCS.git_git,
        'git hg'    : VCS.git_mercurial,
        'git svn'   : VCS.git_subversion,
        'git vv'    : VCS.git_veracity,
        'hg hg'     : VCS.hg_hg}[(r[1]).strip()]
    # The working directory is shared by every thread in the process,
    # so this is where the threads interfere with each other.
    os.chdir(path)
    if vcs == VCS.git:
      checkGitModifications()
      gitSync()
    ... etc

and gitSync is

def gitSync(): 
  pretty(cmd("git pull origin master"))
  pretty(cmd("git fetch upstream master"))
  pretty(cmd("git pull --rebase upstream master"))
  pretty(cmd("git push -f origin master"))

Sure, this is not perfect, but it does the job and I want to speed it up.

How can I spawn one subprocess for each repository/directory (a thread-safe implementation of os.chdir)?

Solution

Create a pool of workers to run your subroutine:

http://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers

In your case perhaps something like:

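from multiprocessing import Pool
import os

def gitSync(repo):
    # Each worker is a separate process, so chdir here affects only this worker.
    print("I am", repo, "and my cwd is:", os.getcwd())
    os.chdir(repo)
    print("I am", repo, "and my cwd is:", os.getcwd())

if __name__ == '__main__':
    base_dir = os.getcwd()
    # Treat every subdirectory of the current directory as a repository.
    repos = [item for item in os.listdir(base_dir) if os.path.isdir(os.path.join(base_dir, item))]
    print(repos)
    # maxtasksperchild=1 replaces each worker after one task, so a fresh
    # process (still in base_dir) handles every repository.
    pool = Pool(maxtasksperchild=1)
    pool.map(gitSync, repos)
    pool.close()
    pool.join()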

Note that a pool can make debugging a bit difficult, as the parent usually reveals little more than "one of my children died", so get it working single-threaded first.
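One way to follow that advice is to keep a switch that runs the same worker function in a plain loop, so tracebacks appear directly in the parent. This is only a sketch: `run_all` and `describe` are hypothetical names, and `describe` stands in for the real per-repository work.

```python
from multiprocessing import Pool


def describe(repo):
    # Hypothetical stand-in for the real per-repository sync function.
    return "would sync " + repo


def run_all(func, items, parallel=True):
    """Apply func to each item, either in a worker pool or in a plain loop."""
    if not parallel:
        # Plain loop: exceptions and tracebacks surface directly in this process.
        return [func(item) for item in items]
    # Same pool settings as the answer: one task per worker process.
    with Pool(maxtasksperchild=1) as pool:
        return pool.map(func, items)


if __name__ == "__main__":
    # Debug serially first, then switch to parallel=True.
    print(run_all(describe, ["lib", "tests"], parallel=False))
```

Once the serial run behaves, flipping `parallel` to True exercises the identical function through the pool.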

Edit: Well, that was interesting. Note the new maxtasksperchild=1 argument to Pool. Worker processes are not restarted between tasks, so when one task changes the directory, the next task that reuses that worker still starts in that directory. Here I've solved it simply by telling the pool to replace each worker after a single task.

john:captcrunch john$ python foo.py 
['.git', '.idea', 'fixtures', 'lib', 'obj', 'raw', 'tests']
I am .git and my cwd is: /Users/john/code/linz/src/captcrunch
I am .git and my cwd is: /Users/john/code/linz/src/captcrunch/.git
I am .idea and my cwd is: /Users/john/code/linz/src/captcrunch
I am .idea and my cwd is: /Users/john/code/linz/src/captcrunch/.idea
I am fixtures and my cwd is: /Users/john/code/linz/src/captcrunch
I am fixtures and my cwd is: /Users/john/code/linz/src/captcrunch/fixtures
I am lib and my cwd is: /Users/john/code/linz/src/captcrunch
I am lib and my cwd is: /Users/john/code/linz/src/captcrunch/lib
I am obj and my cwd is: /Users/john/code/linz/src/captcrunch
I am obj and my cwd is: /Users/john/code/linz/src/captcrunch/obj
I am raw and my cwd is: /Users/john/code/linz/src/captcrunch
I am raw and my cwd is: /Users/john/code/linz/src/captcrunch/raw
I am tests and my cwd is: /Users/john/code/linz/src/captcrunch
I am tests and my cwd is: /Users/john/code/linz/src/captcrunch/tests
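If the sync steps are external commands such as git, a different approach (not the one used above) is to avoid os.chdir altogether and pass the directory to the child process through the cwd argument of subprocess.run. Threads are then safe, because the parent's working directory never changes. A minimal sketch, assuming Python 3.7 or later; `run_in` and `sync_all` are illustrative names, not part of the original scripts:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_in(path, command):
    """Run command (a list of arguments) with path as its working directory."""
    result = subprocess.run(command, cwd=path, capture_output=True, text=True)
    return path, result.returncode, result.stdout


def sync_all(paths, command, max_workers=4):
    """Run the same command in every path concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(lambda p: run_in(p, command), paths))


if __name__ == "__main__":
    # Example: report the status of the repository in the current directory.
    for path, code, output in sync_all(["."], ["git", "status", "--short"]):
        print(path, "exit code:", code)
        print(output)
```

Each child process starts in its own directory, so no worker needs to be restarted to reset its working directory.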
