How do I make a progress bar for loading a pandas DataFrame from a large xlsx file?


Question

From https://pypi.org/project/tqdm/:

import pandas as pd
import numpy as np
from tqdm import tqdm

df = pd.DataFrame(np.random.randint(0, 100, (100000, 6)))
tqdm.pandas(desc="my bar!")
df.progress_apply(lambda x: x**2)

I took this code and edited it so that the DataFrame is created with pd.read_excel rather than from random numbers:

import pandas as pd
from tqdm import tqdm
import numpy as np

filename="huge_file.xlsx"
df = pd.DataFrame(pd.read_excel(filename))
tqdm.pandas()
df.progress_apply(lambda x: x**2)

This gave me an error, so I changed the df.progress_apply call to this:

df.progress_apply(lambda x: x)

This is the final code:

import pandas as pd
from tqdm import tqdm
import numpy as np

filename="huge_file.xlsx"
df = pd.DataFrame(pd.read_excel(filename))
tqdm.pandas()
df.progress_apply(lambda x: x)

This produces a progress bar, but it doesn't actually show any progress: the bar appears, and when the operation is done it jumps to 100%, which defeats the purpose.

My question is this: How do I make this progress bar work?
What does the function inside progress_apply actually do?
Is there a better approach? Maybe an alternative to tqdm?

Any help is greatly appreciated.

Recommended answer

This will not work. pd.read_excel blocks until the whole file has been read, and the function gives you no way to get information about its progress while it runs.
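As for what the function inside progress_apply does: as far as I can tell, progress_apply behaves like DataFrame.apply and simply advances the bar once per call of the function. By default that is once per column, or once per row with axis=1. It only starts after pd.read_excel has already returned, which would explain a bar that sits still and then jumps to 100%. A minimal sketch, using random data instead of an xlsx file (the variable names are only for illustration):

```python
import numpy as np
import pandas as pd
from tqdm import tqdm

# Registers progress_apply on pandas objects.
tqdm.pandas(desc="apply")

# Stand-in for a DataFrame that has ALREADY been loaded.
df = pd.DataFrame(np.random.randint(0, 100, (1000, 6)))

# Default axis=0: the lambda is called once per column, so the bar has 6 steps.
squared = df.progress_apply(lambda col: col ** 2)

# axis=1: the lambda is called once per row, so the bar has 1000 steps.
row_sums = df.progress_apply(lambda row: row.sum(), axis=1)
```

With lambda x: x on a six-column frame, the bar has only six near-instant steps, so it cannot reflect the time spent loading.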

A progress bar does work for read operations that you can perform chunk-wise, for example:

chunks = []
for chunk in pd.read_csv(..., chunksize=1000):
    update_progressbar()
    chunks.append(chunk)
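A runnable variant of the loop above could look like the following sketch. It assumes a CSV file rather than xlsx, and the function name read_csv_in_chunks is made up for illustration. Because no total is passed, tqdm shows a running chunk count and rate rather than a percentage.

```python
import pandas as pd
from tqdm import tqdm


def read_csv_in_chunks(path, chunksize=1000):
    """Read a CSV chunk by chunk, ticking a tqdm bar once per chunk."""
    # With chunksize set, read_csv returns an iterator of DataFrames.
    reader = pd.read_csv(path, chunksize=chunksize)
    # No total is known up front, so tqdm displays count and rate only.
    chunks = [chunk for chunk in tqdm(reader, desc="reading csv", unit="chunk")]
    # Assumes the file has at least one data row; concat fails on an empty list.
    return pd.concat(chunks, ignore_index=True)
```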

But as far as I understand, tqdm also needs the number of chunks in advance to show a percentage, so for a proper progress report you would need to read the full file first.
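For xlsx files specifically, one possible workaround (not part of the original answer) is to bypass pd.read_excel and stream rows with openpyxl in read-only mode, where the sheet may expose a row count up front. This is only a sketch under several assumptions: the first row is a header, the sheet is not empty, and the helper name read_xlsx_with_progress is hypothetical. In read-only mode max_row can be None when the file does not record its dimensions, in which case the bar falls back to a plain row counter.

```python
import pandas as pd
from openpyxl import load_workbook
from tqdm import tqdm


def read_xlsx_with_progress(path, sheet_name=None):
    """Stream rows from an xlsx sheet into a DataFrame with a tqdm bar."""
    # read_only=True lets openpyxl parse rows lazily instead of all at once.
    wb = load_workbook(path, read_only=True, data_only=True)
    try:
        ws = wb[sheet_name] if sheet_name else wb.worksheets[0]
        rows = ws.iter_rows(values_only=True)
        header = next(rows)  # assumption: the first row holds column names
        # max_row may be None if the file does not store sheet dimensions.
        total = ws.max_row - 1 if ws.max_row else None
        data = [row for row in tqdm(rows, total=total,
                                    desc="reading xlsx", unit="row")]
    finally:
        wb.close()  # read-only workbooks keep the file handle open
    return pd.DataFrame(data, columns=list(header))
```

The resulting dtypes may differ from what pd.read_excel would infer, so treat this as a starting point rather than a drop-in replacement.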
