Python Urllib UrlOpen Read


Question

Say I am retrieving a list of URLs from a server using Python's urllib2 library. I noticed that it takes about 5 seconds to get one page, so it would take a long time to finish all the pages I want to collect.

Thinking about those 5 seconds: most of the time is spent on the server side, so I am wondering whether I could just use the threading library. With, say, 5 threads in this case, the average time per page could drop dramatically, maybe to 1 or 2 seconds (though it might make the server a bit busy). How can I optimize the number of threads so that I get a decent speed without pushing the server too hard?
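For illustration, a minimal sketch of that idea in Python 2 (urllib2, as in the question) might look like the following; the `urls` list and the thread count of 5 are assumptions, and there is no error handling:

```python
# Sketch: split the URL list across a fixed number of threads.
# Each thread fetches its own interleaved slice of the list.
import threading
import urllib2

def fetch_slice(url_slice, results):
    for url in url_slice:
        # each thread writes to distinct keys, so the shared dict is fine here
        results[url] = urllib2.urlopen(url).read()

def fetch_all(urls, num_threads=5):
    results, threads = {}, []
    for i in range(num_threads):
        t = threading.Thread(target=fetch_slice,
                             args=(urls[i::num_threads], results))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return results
```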

Thanks!

Update: I increased the number of threads one by one and monitored the total time (in minutes) spent scraping 100 URLs. The total time dropped dramatically when the number of threads was changed to 2 and kept decreasing as the number of threads grew, but the improvement from threading became less and less noticeable (the total time even bounced back when too many threads were used). I know this is only one specific case for the web server I harvest, but I decided to share it to show the power of threading, in the hope that it helps somebody one day.

Answer

There are a few things you can do. If the URLs are on different domains, then you might just fan out the work to threads, each downloading a page from a different domain.
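A hedged sketch of that per-domain fan-out (again Python 2 to match the question; the helper names and the lack of error handling are my own):

```python
# Sketch: group URLs by domain and give each domain its own thread.
import threading
import urllib2
from urlparse import urlparse
from collections import defaultdict

def fetch_domain(domain_urls, results):
    for url in domain_urls:
        results[url] = urllib2.urlopen(url).read()

def fetch_by_domain(urls):
    by_domain = defaultdict(list)
    for url in urls:
        by_domain[urlparse(url).netloc].append(url)

    results, threads = {}, []
    for domain_urls in by_domain.values():
        t = threading.Thread(target=fetch_domain, args=(domain_urls, results))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return results
```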

If your URLs all point to the same server and you do not want to stress the server, then you can just retrieve the URLs sequentially. If the server is happy with a couple of parallel requests, you can look into pools of workers. You could start, say, a pool of four workers and add all your URLs to a queue, from which the workers will pull new URLs.
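A rough sketch of such a worker pool with Python 2's Queue module (the pool size of four follows the suggestion above; everything else, including the function names, is assumed):

```python
# Sketch: a fixed pool of worker threads pulling URLs from a shared queue.
import threading
import urllib2
from Queue import Queue

def worker(q, results):
    while True:
        url = q.get()
        try:
            results[url] = urllib2.urlopen(url).read()
        finally:
            q.task_done()

def fetch_with_pool(urls, num_workers=4):
    q, results = Queue(), {}
    for _ in range(num_workers):
        t = threading.Thread(target=worker, args=(q, results))
        t.daemon = True          # let the program exit once the queue is drained
        t.start()
    for url in urls:
        q.put(url)
    q.join()                     # block until every URL has been processed
    return results
```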

Since you also tagged the question "screen-scraping": scrapy is a dedicated scraping framework that can work in parallel.
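Purely as an illustration (the spider name, the start_urls, and the concurrency settings below are placeholders, not part of the original question), a minimal Scrapy spider that crawls pages in parallel could look like this, run with `scrapy runspider page_spider.py -o pages.json`:

```python
import scrapy

class PageSpider(scrapy.Spider):
    name = "pages"
    # placeholder URLs; replace with the list you want to harvest
    start_urls = ["http://example.com/page/1", "http://example.com/page/2"]
    # keep parallelism modest so the target server is not pushed too hard
    custom_settings = {"CONCURRENT_REQUESTS": 4, "DOWNLOAD_DELAY": 0.5}

    def parse(self, response):
        # yield whatever you want to extract from each page
        yield {"url": response.url, "title": response.css("title::text").get()}
```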

Python 3 comes with a set of new built-in concurrency primitives under concurrent.futures.
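A small sketch with concurrent.futures (Python 3; the `urls` list and the worker count are assumptions, and errors simply propagate out of future.result()):

```python
# Sketch: fetch a list of URLs with a ThreadPoolExecutor.
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen

def fetch(url):
    with urlopen(url) as response:
        return response.read()

def fetch_all(urls, max_workers=4):
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```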
