File downloading using Python with threads


Question

I'm creating a Python script which accepts a path to a remote file and a number of threads, n. The file's size will be divided by the number of threads; as each thread completes, I want it to append the data it fetched to a local file.

How do I manage it so that the threads append to the local file in the order in which they were generated, so that the bytes don't get scrambled?

Also, what if I want to download several files simultaneously?

Answer

You could coordinate the work with locks etc., but I recommend instead using Queue -- usually the best way to coordinate multithreading (and multiprocessing) in Python.

I would have the main thread spawn as many worker threads as you think appropriate (you may want to calibrate between performance and load on the remote server by experimenting); every worker thread waits on the same global Queue.Queue instance, call it workQ for example, for "work requests" (wr = workQ.get() will do it properly -- each work request is obtained by a single worker thread, no fuss, no muss).
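
A minimal sketch of such a worker loop, assuming Python 3 (where Python 2's Queue.Queue lives in the queue module) and a server that honours HTTP Range requests; the None sentinel used for shutdown is the termination technique discussed at the end of this answer:

    import queue
    import urllib.request

    def worker(workQ, resultQ):
        """Pull (url, offset, numbytes) requests off workQ until a sentinel arrives."""
        while True:
            wr = workQ.get()    # blocks until a work request is available
            if wr is None:      # sentinel: this worker should terminate
                break
            url, offset, numbytes = wr
            # Fetch just this slice of the remote file via an HTTP Range header.
            req = urllib.request.Request(
                url,
                headers={"Range": "bytes=%d-%d" % (offset, offset + numbytes - 1)})
            with urllib.request.urlopen(req) as resp:
                data = resp.read()
            resultQ.put((url, offset, data))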

A "work request" can in this case simply be a triple (a tuple with three items): identification of the remote file (URL or whatever), the offset from which to get data, and the number of bytes to get (note that this works just as well for fetching one file or multiple files).

The main thread pushes all work requests onto workQ (just workQ.put((url, offset, numbytes)) for each request) and waits for results to arrive on another Queue instance, call it resultQ (each result will also be a triple: identifier of the file, starting offset, and the string of bytes that are the result from that file at that offset).
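
Putting those pieces together, the main thread's side might look like the sketch below, reusing the worker and make_requests sketches above; NTHREADS and files_to_fetch (a list of (url, filesize) pairs) are placeholder names, not anything prescribed by the answer:

    import queue
    import threading

    NTHREADS = 4    # assumed thread count; calibrate by experimenting
    workQ, resultQ = queue.Queue(), queue.Queue()

    threads = [threading.Thread(target=worker, args=(workQ, resultQ))
               for _ in range(NTHREADS)]
    for t in threads:
        t.start()

    nrequests = 0
    for url, filesize in files_to_fetch:    # hypothetical (url, filesize) pairs
        for wr in make_requests(url, filesize, NTHREADS):
            workQ.put(wr)
            nrequests += 1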

As each worker thread satisfies the request it's working on, it puts the result into resultQ and goes back to fetch another work request (or waits for one). Meanwhile the main thread (or a separate dedicated "writing thread" if needed -- i.e., if the main thread has other work to do, for example on the GUI) gets results from resultQ and performs the needed open, seek, and write operations to place the data at the right spot.
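
A sketch of that consumer side, assuming one local file handle per URL has been opened in binary write mode (how local names are derived from URLs is left open here); seek() lets the chunks be written in whatever order they happen to arrive:

    def write_results(resultQ, outfiles, nrequests):
        """Drain resultQ, placing each chunk at its offset in the matching local file.

        outfiles is assumed to map each url to a file object opened with
        open(localname, "wb").
        """
        for _ in range(nrequests):
            url, offset, data = resultQ.get()
            f = outfiles[url]
            f.seek(offset)    # position at the chunk's starting offset
            f.write(data)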

There are several ways to terminate the operation: for example, a special work request may ask the thread receiving it to terminate -- the main thread puts as many of those on workQ as there are worker threads, after all the actual work requests, then joins all the worker threads once all data have been received and written (many alternatives exist, such as joining the queue directly, or making the worker threads daemonic so they just go away when the main thread terminates, and so forth).
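
With the None sentinel from the worker sketch above, that termination step reduces to a few lines:

    # One sentinel per worker, queued after all the real work requests.
    for _ in range(NTHREADS):
        workQ.put(None)
    for t in threads:
        t.join()    # wait for every worker to drain its sentinel and exit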
