Is there a way to run CPython on a different thread without risking a crash?


Problem description


I have a program that runs lots of urllib requests IN AN INFINITE LOOP, which makes my program really slow, so I tried running them as threads. Urllib uses CPython deep down in the socket module, so the threads that are being created just add up and do nothing, because Python's GIL prevents two CPython commands from being executed in different threads at the same time. I am running Windows XP with Python 2.5, so I can't use the multiprocessing module. I tried looking at the subprocess module to see if there was a way to execute Python code in a subprocess somehow, but found nothing. If anyone knows a way to create a Python subprocess through a function call, like in multiprocessing, that would be great.

Also, I would rather not download an external module, but I am willing to.

EDIT: Here is a sample of some code in my current program.

    url = "http://example.com/upload_image.php?username=Test&password=test"
    url = urllib.urlopen(url, data=urllib.urlencode({"Image": raw_image_data})).read()
    if url.strip().replace("\n", "") != "":
        print url

I did a test and it turns out that urllib2's urlopen, with and without a Request object, is just as slow or slower. I created my own custom timeit-like module, and the above takes around 0.5-2 seconds, which is horrible for what my program does.

Solution

Urllib uses CPython deep down in the socket module, so the threads that are being created just add up and do nothing, because Python's GIL prevents two CPython commands from being executed in different threads at the same time.

Wrong, though it is a common misconception. CPython can and does release the GIL for I/O operations (look at all the Py_BEGIN_ALLOW_THREADS calls in socketmodule.c). While one thread waits for I/O to complete, other threads can do work. If urllib calls are the bottleneck in your script, then threads may be one of the acceptable solutions.
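
To see the GIL being released in practice, here is a minimal sketch (http://example.com/ is a stand-in for any reachable URL) that compares four sequential fetches against four threaded ones; on a real network the threaded total should be close to a single fetch's latency:

import time
import urllib2
from threading import Thread

def fetch(url):
    urllib2.urlopen(url).read()

url = 'http://example.com/' # any reachable URL works here

# sequential: total time is roughly the sum of the individual requests
start = time.time()
for _ in xrange(4):
    fetch(url)
print 'sequential: %.2fs' % (time.time() - start)

# threaded: the GIL is released while each socket waits, so the four
# requests overlap and the total is close to one request's latency
start = time.time()
threads = [Thread(target=fetch, args=(url,)) for _ in xrange(4)]
for t in threads: t.start()
for t in threads: t.join()
print 'threaded: %.2fs' % (time.time() - start)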

I am running Windows XP with Python 2.5, so I can't use the multiprocessing module.

You could install Python 2.6 or newer, or, if you must use Python 2.5, you could install multiprocessing separately (there is a backport package on PyPI).
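
If the backport may or may not be installed, a small sketch of a fallback import so the same code degrades to threads on a bare Python 2.5 (the Worker alias is hypothetical):

# prefer processes; fall back to threads on a bare Python 2.5
try:
    from multiprocessing import Process as Worker, Queue
except ImportError:
    from threading import Thread as Worker # threads still help for IO-bound work
    from Queue import Queue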

I created my own custom timeit-like module, and the above takes around 0.5-2 seconds, which is horrible for what my program does.

The performance of urllib2.urlopen('http://example.com...').read() depends mostly on outside factors such as DNS, network latency/bandwidth, and the performance of the example.com server itself.
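
To see where those 0.5-2 seconds actually go, a rough sketch (example.com stands in for the real upload host) that times the DNS lookup separately from the full request:

import socket
import time
import urllib2

host = 'example.com' # substitute the real upload host

start = time.time()
socket.gethostbyname(host) # DNS resolution alone
dns = time.time() - start

start = time.time()
urllib2.urlopen('http://%s/' % host).read() # connect + request + response
total = time.time() - start

print 'DNS: %.3fs, full request: %.3fs' % (dns, total)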

Here's an example script which uses both threading and urllib2:

import urllib2
from Queue import Queue
from threading import Thread

def check(queue):
    """Check /n url."""
    opener = urllib2.build_opener() # if you use install_opener in other threads
    for n in iter(queue.get, None):
        try:
            data = opener.open('http://localhost:8888/%d' % (n,)).read()
        except IOError, e:
            print("error /%d reason %s" % (n, e))
        else:
            "check data here"

def main():
    nurls, nthreads = 10000, 10

    # spawn threads
    queue = Queue()
    threads = [Thread(target=check, args=(queue,)) for _ in xrange(nthreads)]
    for t in threads:
        t.daemon = True # die if program exits
        t.start()

    # provide some work
    for n in xrange(nurls): queue.put_nowait(n)
    # signal the end
    for _ in threads: queue.put(None)
    # wait for completion
    for t in threads: t.join()

if __name__=="__main__":
   main()
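
Each worker loops on iter(queue.get, None): queue.get is called repeatedly until it returns the None sentinel, which is why main() puts exactly one None on the queue per thread before joining them.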

To convert it to a multiprocessing script, just use different imports and your program will use multiple processes:

from multiprocessing import Queue
from multiprocessing import Process as Thread

# the rest of the script is the same
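
Note that with multiprocessing on Windows the if __name__ == "__main__" guard in the script above is mandatory: child processes re-import the main module, and without the guard each child would try to spawn workers of its own.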
