How to change position of progress bar – multiprocessing

Problem description

First off, I am new to Python. It's irrelevant to the question, but I have to mention it.

I am creating a crawler as my first project, to understand how things work in Python, but so far this is my major issue: understanding how to get multiple progress bars in the terminal while using requests and pathos.multiprocessing.

I managed to get everything else working; I just want prettier output, so I decided to add progress bars. I am using tqdm because I like the way it looks and it seems the easiest to implement.

Here's my method, whose purpose is to download the file:

import requests
from tqdm import tqdm


def download_lesson(self, lesson_data):
    if 'file' not in lesson_data:
        print('=> Skipping... File {file_name} already exists.'.format(file_name=lesson_data['title']))
        return

    response = requests.get(lesson_data['video_source'], stream=True)
    chunk_size = 1024

    with open(lesson_data['file'], 'wb') as file:
        progress = tqdm(
            total=int(response.headers['Content-Length']),
            unit='B',
            unit_scale=True
        )

        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                progress.update(len(chunk))
                file.write(chunk)

        progress.close()
        print('=> Success... File "{file_name}" has been downloaded.'.format(file_name=lesson_data['title']))
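The loop above is the usual chunked-copy pattern: read a fixed-size chunk, write it, and advance the bar by the number of bytes just written. A minimal stdlib-only sketch of the same loop, assuming the source is any file-like object rather than a `requests` response; `copy_with_progress` and its `on_progress` callback are hypothetical names, and in the real method the callback would be `progress.update`:

```python
import io
from typing import BinaryIO, Callable


def copy_with_progress(
    src: BinaryIO,
    dst: BinaryIO,
    on_progress: Callable[[int], None],
    chunk_size: int = 1024,
) -> int:
    """Copy src to dst in chunks, reporting each chunk's size; return total bytes."""
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        dst.write(chunk)
        on_progress(len(chunk))  # tqdm equivalent: progress.update(len(chunk))
        copied += len(chunk)
    return copied


source = io.BytesIO(b"x" * 2500)
target = io.BytesIO()
updates = []
total = copy_with_progress(source, target, updates.append)
print(total, updates)  # 2500 [1024, 1024, 452]
```

The bar's `total` only needs to match the sum of the reported chunk sizes, which is why `Content-Length` is used for it.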

I run the method through a process pool:

from pathos.multiprocessing import ProcessingPool as Pool

# c = instance of my crawling class
# cs = returns the `lesson_data` for `download_lesson` method

p = Pool(1)  # a single worker process
p.map(c.download_lesson, cs)

Everything works great as long as I use processes=1 in the Pool. But when I run multiple processes, say processes=3, things start to get weird and the progress bars get drawn one inside another.

I found in the tqdm documentation that there is a position parameter, which describes exactly what I need in this case:

position : int, optional
    Specify the line offset to print this bar (starting from 0). Automatic if unspecified. Useful to manage multiple bars at once (eg, from threads).

However, I have no clue how to set that position. I tried some weird stuff, such as adding a variable that is supposed to increment itself by one, but whenever download_lesson runs, it doesn't seem to increment at all. It is always 0, so the position is always 0.
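A likely reason the counter never moves: with a process pool, `map` has to serialize `c.download_lesson`, and a bound method carries its instance along, so each batch of tasks sent to a worker runs against its own unpickled copy of `c` while the original stays at 0. A minimal stdlib sketch of that effect, using a plain `pickle` round trip to stand in for what the pool does (pathos serializes with `dill` rather than `pickle`, but the copy semantics are the same); `Crawler` and `counter` are hypothetical names:

```python
import pickle


class Crawler:
    def __init__(self):
        self.counter = 0  # hypothetical counter meant to become the bar position

    def download_lesson(self, lesson):
        self.counter += 1
        return self.counter


c = Crawler()
# Roughly what a process pool does per task: serialize the bound method
# (which includes `c`), then call it on an independent copy.
results = [pickle.loads(pickle.dumps(c.download_lesson))(lesson) for lesson in "abc"]
print(results)    # [1, 1, 1]: every copy starts from counter == 0
print(c.counter)  # 0: the parent's instance is never touched
```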

So it seems I don't understand much here... Any tips, hints or complete solutions are welcome. Thank you!

Update #1:

I found out that I can pass another argument to map as well, so I am passing a range over the number of processes that were set (e.g. processes=2):

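p = Pool(config['threads'])
# map() pairs cs with range(...) element-wise and stops at the shorter one,
# so only config['threads'] lessons are processed by this call
p.map(c.download_lesson, cs, range(config['threads']))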
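One thing worth knowing about this call: like the built-in `map`, a pathos pool's `map` with several iterables pairs them element-wise and stops at the shortest, so `range(config['threads'])` hands out only as many lessons as there are workers, and each position is tied to a task rather than to a worker process. A sketch with the built-in `map`; `fake_download` and the lesson names are hypothetical:

```python
def fake_download(lesson, position):
    """Hypothetical stand-in for download_lesson(lesson, progress_position)."""
    return (lesson, position)


lessons = ["intro", "setup", "basics", "advanced", "wrap-up"]
workers = 2

# Pairs element-wise and stops at the shorter iterable: only 2 lessons run.
truncated = list(map(fake_download, lessons, range(workers)))
print(truncated)  # [('intro', 0), ('setup', 1)]

# Pairing every lesson with its own index keeps all of them
# and gives each one a distinct position.
complete = list(map(fake_download, lessons, range(len(lessons))))
print(complete)
```

Assuming `cs` is a list, the pool equivalent of the second call would be `p.map(c.download_lesson, cs, range(len(cs)))`; distinct per-task positions keep bars off each other's rows, at the cost of one terminal row per lesson.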

So, in my method I tried to print out that argument and indeed I do get 0 and 1, as I am running 2 processes in the example.

But this does not seem to do anything at all...

progress = tqdm(
    total=int(response.headers['Content-Length']),
    unit='B',
    unit_scale=True,
    position=progress_position
)

I still get the same overlapping progress bars. When I manually set position to a fixed value (e.g. 10), the bar jumps down in the terminal, so the position does move, but the bars still overlap, of course, because now both are set to 10. When set dynamically, it does not seem to work either. I don't understand what my issue is here... It's as if, when map runs this method twice, it still gives the most recently set position to both progress bars. What the heck am I doing wrong?

Recommended answer

OK, first off, I'd like to thank @MikeMcKerns for his comment... There are lots of changes to my script, because I wanted a different approach, but in the end it comes down to these important changes.

My init.py now looks much cleaner:

from scraper.Crawl import Crawl

if __name__ == '__main__':
    Crawl()

My download_lesson method inside the scraper.Crawl class now looks like this:

import requests
from tqdm import tqdm


def download_lesson(self, lesson):

    response = requests.get(lesson['link'], stream=True)
    chunk_size = 1024

    # No explicit position: bars created in one process get free rows automatically
    progress = tqdm(
        total=int(response.headers['Content-Length']),
        unit='B',
        unit_scale=True
    )

    with open(lesson['file'], 'wb') as file:
        for chunk in response.iter_content(chunk_size=chunk_size):
            progress.update(len(chunk))
            file.write(chunk)

    progress.close()

And finally, I have a method dedicated to multiprocessing, which looks like this:

def begin_processing(self):
    pool = ThreadPool(nodes=Helper.config('threads'))

    for course in self.course_data:
        pool.map(self.download_lesson, course['lessons'])
        print(
            'Course "{course_title}" has been downloaded, with total of {lessons_amount} lessons.'.format(
                course_title=course['title'],
                lessons_amount=len(course['lessons'])
            )
        )
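The shape of `begin_processing` (one pool created up front and reused for every course, with a blocking `map` per course so the summary line prints only after all of that course's lessons finish) can be reproduced with the standard library alone. A sketch that swaps in `concurrent.futures.ThreadPoolExecutor` for pathos' `ThreadPool`; the course data and `fake_download` are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor


def fake_download(lesson):
    """Hypothetical stand-in for Crawl.download_lesson."""
    return lesson["file"]


def begin_processing(course_data, threads=3):
    summaries = []
    # One pool for the whole run; list(pool.map(...)) waits until every
    # lesson of the current course has been processed.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for course in course_data:
            files = list(pool.map(fake_download, course["lessons"]))
            summaries.append(
                'Course "{}" has been downloaded, with total of {} lessons.'.format(
                    course["title"], len(files)
                )
            )
    return summaries


courses = [
    {"title": "Python 101", "lessons": [{"file": "01.mp4"}, {"file": "02.mp4"}]},
    {"title": "Async", "lessons": [{"file": "01.mp4"}]},
]
for line in begin_processing(courses):
    print(line)
```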

As you can tell, I made some major changes to my class, but most importantly I had to add this bit to my init.py:

if __name__ == '__main__':

And secondly, I had to use what @MikeMcKerns suggested I take a look at:

from pathos.threading import ThreadPool
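Threads also explain why the counter idea from the question can work here: threads share one address space, so a counter incremented under a lock is visible to every worker. tqdm already picks free rows for bars created in the same process, so this is optional, but as a sketch, assuming one bar row per worker thread is wanted, a hypothetical `worker_slot` helper can hand each pool thread a stable slot number the first time it runs, which could then be passed as tqdm's `position`:

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from itertools import count

_next_slot = count()        # shared by every thread in this process
_slot_lock = threading.Lock()
_local = threading.local()  # per-thread storage


def worker_slot():
    """Return a stable 0-based slot for the calling thread (hypothetical helper)."""
    if not hasattr(_local, "slot"):
        with _slot_lock:
            _local.slot = next(_next_slot)
    return _local.slot


def fake_download(lesson):
    position = worker_slot()  # e.g. tqdm(..., position=position)
    return lesson, position


with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(fake_download, range(10)))

print(sorted({pos for _, pos in results}))
```

The executor may start fewer than three threads, so the printed slots are some subset of 0, 1 and 2.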

With those changes, I finally got everything working the way I needed.

Even though I still have no clue why pathos.multiprocessing makes the tqdm progress bars so buggy, I managed to solve my problem thanks to Mike's suggestion. Thank you!
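A likely explanation for the buggy output with processes: tqdm decides where to draw a bar from the bars it knows about in the current process, and worker processes do not see each other's bars, so they all claim the same row and write to the terminal without coordinating. tqdm's README shows a multiprocessing pattern for this: give each bar an explicit `position` and share one lock with every worker through the pool initializer (`tqdm.set_lock` / `tqdm.get_lock`). A sketch of that pattern with the standard `multiprocessing.Pool` instead of pathos, where `fake_download` and its sleep loop are hypothetical stand-ins for the real `requests` download. The `if __name__ == "__main__":` guard matters here too: under the spawn start method (the default on Windows and macOS) each worker re-imports the main module, and without the guard it would try to create a pool of its own.

```python
import time
from multiprocessing import Pool, RLock, freeze_support

from tqdm import tqdm


def fake_download(position, name, total=20):
    """Hypothetical stand-in for download_lesson: one bar per task, on its own row."""
    with tqdm(total=total, desc=name, unit="B", unit_scale=True,
              position=position, leave=True) as bar:
        for _ in range(total):
            time.sleep(0.01)  # pretend to receive one chunk
            bar.update(1)
    return name


if __name__ == "__main__":
    freeze_support()  # only matters for frozen Windows executables
    lessons = ["intro", "setup", "basics", "advanced"]
    tqdm.set_lock(RLock())  # one lock shared by every worker process
    with Pool(processes=2, initializer=tqdm.set_lock,
              initargs=(tqdm.get_lock(),)) as pool:
        done = pool.starmap(fake_download, enumerate(lessons))
    print(done)
```

Here `enumerate(lessons)` gives every task its own row, which is the per-task position idea from Update #1 without the truncation.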