How to launch a pdftk subprocess while in wsgi?


Problem description


I need to launch a pdftk process while serving a web request in Django, and wait for it to finish. My current pdftk code looks like this:

import subprocess

# Concatenate two PDFs with pdftk and wait for the child process to exit.
proc = subprocess.Popen(["/usr/bin/pdftk",
                         "/tmp/infile1.pdf",
                         "/tmp/infile2.pdf",
                         "cat", "output", "/tmp/outfile.pdf"])
proc.communicate()

This works fine, as long as I'm executing under the dev server (running as user www-data). But as soon as I switch to mod_wsgi, changing nothing else, the code hangs at proc.communicate(), and "outfile.pdf" is left as an open file handle of zero length.

I've tried several variants of the subprocess invocation (as well as plain old os.system). Setting stdin/stdout/stderr to PIPE or to various file handles changes nothing. Using "shell=True" prevents proc.communicate() from hanging, but then pdftk fails to create the output file, under both the devserver and mod_wsgi. This discussion seems to indicate there might be some deeper voodoo going on with OS signals and pdftk that I don't understand.

Are there any workarounds to get a subprocess call like this to work properly under wsgi? I'm avoiding using PyPDF to combine pdf files, because I have to combine large enough numbers of files (several hundred) that it runs out of memory (PyPDF needs to keep every source pdf file open in memory while combining them).

I'm doing this under a recent Ubuntu, with Python 2.6 and 2.7.

Solution

Try with absolute file system paths to the input and output files. The current working directory under Apache will not be the same directory as under the development server (runserver) and could be anything.
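As an illustration of the absolute-path advice, here is a minimal sketch. The helper names build_pdftk_command and merge_pdfs, and the base_dir parameter, are hypothetical and not part of the original question; the pdftk arguments are the ones the question already uses.

```python
import os
import subprocess


def build_pdftk_command(inputs, output, base_dir, pdftk="/usr/bin/pdftk"):
    """Return a pdftk 'cat' command in which every file path is absolute."""
    def absolute(path):
        # os.path.join leaves an already-absolute path untouched and
        # anchors a relative one at base_dir.
        return os.path.normpath(os.path.join(base_dir, path))

    return ([pdftk]
            + [absolute(p) for p in inputs]
            + ["cat", "output", absolute(output)])


def merge_pdfs(inputs, output, base_dir):
    """Run pdftk and return its exit status (hypothetical helper)."""
    cmd = build_pdftk_command(inputs, output, base_dir)
    # cwd pins the child's working directory instead of inheriting
    # whatever directory the Apache process happens to be in.
    return subprocess.call(cmd, cwd=base_dir)
```

Resolving paths against an explicit base directory means the command no longer depends on the server's working directory at all.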


Second attempt after eliminating the obvious.

The pdftk program is a Java program which relies on being able to generate and receive the SIGPWR signal to trigger garbage collection or perform other actions. The problem is that under Apache/mod_wsgi in daemon mode, signals are blocked within the request handler threads to ensure that they are only received by the main thread, which watches for process shutdown trigger events. When you fork a process to run pdftk, it unfortunately inherits the blocked sigmask from the request handler thread. The consequence is that this impedes the operation of the Java garbage collection process and causes pdftk to fail in strange ways.
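The inheritance of a blocked signal mask can be observed without Apache or pdftk. The sketch below is only a demonstration of that mechanism, not a fix. It assumes Python 3.3 or later on a POSIX system, since signal.pthread_sigmask is not available in the Python 2.6/2.7 used in the question, and the function name child_sees_blocked is made up for this example.

```python
import signal
import subprocess
import sys

# SIGPWR exists on Linux; fall back to SIGUSR1 elsewhere so the sketch runs.
SIG = getattr(signal, "SIGPWR", signal.SIGUSR1)

# The child asks for its current mask (blocking nothing new returns the
# existing mask) and prints whether SIG is in it.
CHILD = (
    "import signal\n"
    "mask = signal.pthread_sigmask(signal.SIG_BLOCK, [])\n"
    "print(%d in {int(s) for s in mask})\n" % int(SIG)
)


def child_sees_blocked(block_first):
    """Spawn a child and report whether SIG is blocked in its inherited mask."""
    how = signal.SIG_BLOCK if block_first else signal.SIG_UNBLOCK
    old_mask = signal.pthread_sigmask(how, [SIG])
    try:
        out = subprocess.check_output([sys.executable, "-c", CHILD])
    finally:
        # Restore the parent's original mask whatever happened.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return out.strip() == b"True"
```

Calling child_sees_blocked(True) blocks the signal in the calling thread before spawning, which mirrors what the answer describes for a mod_wsgi request handler thread; the child then starts life with the same signal blocked.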

The only solution for this is to use Celery and have the front end submit a job to the Celery queue, so that celeryd forks and executes pdftk. Because this is done from a process that is distinct from Apache, you will not have this issue.
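A rough sketch of that arrangement follows. It assumes a Celery application object is created elsewhere (for example with celery.Celery and a broker of your choice) and passed in as app; the names register_merge_task and merge_pdfs and the run parameter are hypothetical. Only the app.task decorator and the task's delay method are Celery API.

```python
import subprocess


def register_merge_task(app, run=subprocess.check_call,
                        pdftk="/usr/bin/pdftk"):
    """Register a pdftk merge task on a Celery app and return the task.

    `app` is assumed to be a Celery application created elsewhere.
    `run` is injectable only so the sketch can be exercised without pdftk.
    """
    @app.task
    def merge_pdfs(inputs, output):
        # This body executes inside the Celery worker process, so pdftk
        # does not inherit the signal mask of an Apache request thread.
        run([pdftk] + list(inputs) + ["cat", "output", output])
        return output

    return merge_pdfs
```

In the Django view, something like merge_task.delay(inputs, output) would enqueue the job instead of calling subprocess directly. Waiting for the result in the request would additionally need a configured result backend, which is outside this sketch.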

For more gory details, search for mod_wsgi and pdftk, in particular in Google Groups.

http://groups.google.com/group/modwsgi/search?group=modwsgi&q=pdftk&qt_g=Search+this+group
