How can I efficiently implement multithreading/multiprocessing in a Python web bot?

Question

Let's say I have a web bot written in Python that sends data to a website via POST requests. The data is read from a text file line by line and stored in an array. Currently, I'm testing each element of the array in a simple for loop. How can I effectively implement multithreading to get through the data more quickly? Assume the text file is fairly large. Would attaching a thread to each request be smart? What would be the best approach?

# Raw string, so that "\f" is not read as a form-feed escape
with open(r"c:\file.txt") as file:
    dataArr = file.read().splitlines()

# Count the lines already read instead of reopening the file
dataLen = len(dataArr)

def test(data):
    # This next part is pseudo code
    result = testData('www.example.com', data)
    if result == 'whatever':
        print('success')

for i in range(0, dataLen):
    test(dataArr[i])

I was thinking of something along these lines, but I feel it would cause issues depending on the size of the text file. I know software exists that lets the end user specify the number of threads when working with large amounts of data. I'm not entirely sure how that works, but it's something I'd like to implement.

import threading

# Raw string, so that "\f" is not read as a form-feed escape
with open(r"c:\file.txt") as file:
    dataArr = file.read().splitlines()

def test(data):
    # This next part is pseudo code
    result = testData('www.example.com', data)
    if result == 'whatever':
        print('success')

jobs = []

# One thread per line: the thread count grows with the size of the file
for data in dataArr:
    # args must be a tuple, hence the trailing comma
    thread = threading.Thread(target=test, args=(data,))
    jobs.append(thread)

for j in jobs:
    j.start()
for j in jobs:
    j.join()

Recommended answer

This sounds like a job for multiprocessing.Pool.

See: https://docs.python.org/2/library/multiprocessing.html#introduction

from multiprocessing import Pool

def test(num):
    if num%2 == 0:
        return True
    else:
        return False

if __name__ == "__main__":
    list_of_datas_to_test = [0, 1, 2, 3, 4, 5, 6, 7, 8]

    p = Pool(4)  # create 4 processes to do our work
    print(p.map(test, list_of_datas_to_test))  # distribute our work

The output looks like this:

[True, False, True, False, True, False, True, False, True]
