Python threads stack_size and segfaults


Question

A web crawler script spawns at most 500 threads. Each thread requests data from a remote server, and each server's reply differs from the others in content and size.

I'm setting the thread stack size to 756 KB:

import threading

threading.stack_size(756 * 1024)
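For context on what that call does: `threading.stack_size()` only sets the stack size used for threads created afterwards, and it returns the previously configured value (0 meaning the platform default). It does not report how much stack a running thread has actually consumed, and the standard `threading` module has no call for that. A minimal sketch, assuming a hypothetical helper named `spawn_with_stack_size`, that applies the size to one thread and then restores the earlier setting:

```python
import threading


def spawn_with_stack_size(size_bytes, target):
    """Start one thread with the given stack size, then restore the old setting.

    Hypothetical helper for illustration; not part of the original script.
    """
    # stack_size() returns the value that was configured before this call.
    previous = threading.stack_size(size_bytes)
    try:
        worker = threading.Thread(target=target)
        worker.start()  # the stack is allocated when the thread starts
    finally:
        threading.stack_size(previous)
    return worker


results = []
before = threading.stack_size()
t = spawn_with_stack_size(756 * 1024, lambda: results.append("done"))
t.join()
after = threading.stack_size()
```

The size must be at least 32 KiB, and some platforms only accept multiples of the page size; 756 * 1024 bytes is a multiple of 4096.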

This lets me run the number of threads I need and complete most of the jobs and requests. But some servers' responses are bigger than others, and when a thread gets one of those responses, the script dies with SIGSEGV.

A stack_size larger than 756 KB makes it impossible to have the required number of threads running at the same time.

Any suggestions on how I can continue with the given stack_size without crashes? And how can I get the stack size currently in use by any given thread?

Recommended answer

Why on earth are you spawning 500 threads? That seems like a terrible idea!

Remove threading completely and use an event loop to do the crawling. Your program will be faster, simpler, and easier to maintain.
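As one possible illustration of the event-loop approach, here is a minimal sketch using the standard library's `asyncio` rather than Twisted. The `fetch` and `crawl` names are hypothetical, and `asyncio.sleep` stands in for a real network request; the point is only that 500 concurrent jobs can run in a single thread, so no per-thread stack is needed.

```python
import asyncio


async def fetch(job_id, limit):
    """Placeholder for one request; the sleep stands in for network I/O."""
    async with limit:  # cap the number of requests in flight
        await asyncio.sleep(0.01)
        return job_id


async def crawl(job_count, max_in_flight):
    limit = asyncio.Semaphore(max_in_flight)
    # gather() runs all coroutines concurrently on one thread and
    # returns their results in the order they were passed in.
    return await asyncio.gather(*(fetch(i, limit) for i in range(job_count)))


results = asyncio.run(crawl(500, 100))
```

The semaphore is an optional design choice: it bounds how many requests are outstanding at once without limiting how many jobs are queued.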

Lots of threads waiting on the network won't make your program wait any faster. Instead, collect all open sockets in a list and run a loop that checks whether any of them has data available.
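The loop described above could look roughly like the following sketch, which uses the standard `selectors` module. The `read_all` function is a hypothetical name, and `socket.socketpair()` is used only to simulate remote servers that send replies of different sizes; a real crawler would register sockets connected to actual hosts.

```python
import selectors
import socket


def read_all(socks):
    """Read every socket to EOF from a single thread; returns {socket: bytes}."""
    sel = selectors.DefaultSelector()
    buffers = {}
    for s in socks:
        s.setblocking(False)
        sel.register(s, selectors.EVENT_READ)
        buffers[s] = bytearray()
    pending = len(socks)
    while pending:
        # select() blocks until at least one registered socket is readable.
        for key, _events in sel.select():
            s = key.fileobj
            chunk = s.recv(4096)
            if chunk:
                buffers[s] += chunk
            else:  # empty read: the peer closed the connection
                sel.unregister(s)
                pending -= 1
    sel.close()
    return {s: bytes(b) for s, b in buffers.items()}


# Simulated servers: three connections with replies of different sizes.
pairs = [socket.socketpair() for _ in range(3)]
for i, (writer, _reader) in enumerate(pairs):
    writer.sendall(b"x" * (1000 * (i + 1)))
    writer.close()

readers = [reader for _writer, reader in pairs]
replies = read_all(readers)
sizes = sorted(len(body) for body in replies.values())
for reader in readers:
    reader.close()
```

Each reply accumulates in a heap-allocated buffer, so a large response grows that buffer instead of relying on a per-thread stack.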

I recommend using Twisted, an event-driven networking engine. It is very flexible, secure, scalable, and very stable (no segfaults).

You could also take a look at Scrapy, a web crawling and screen scraping framework written in Python on top of Twisted. It is still under heavy development, but maybe you can take some ideas from it.
