Very quickly getting total size of folder
I want to quickly find the total size of any folder using Python.
import os
from os.path import join, getsize

def GetFolderSize(path):
    TotalSize = 0
    for item in os.walk(path):
        for file in item[2]:
            try:
                TotalSize = TotalSize + getsize(join(item[0], file))
            except OSError:
                print("error with file: " + join(item[0], file))
    return TotalSize

print(float(GetFolderSize("C:\\")) / 1024 / 1024 / 1024)
That's the simple script I wrote to get the total size of the folder; it takes around 60 seconds (±5 seconds). By using multiprocessing I got it down to 23 seconds on a quad-core machine.

Using Windows Explorer it takes only ~3 seconds (right-click -> Properties to see for yourself). So is there a faster way of finding the total size of a folder, close to the speed at which Windows can do it?

Windows 7, Python 2.6. (I did search, but most of the time people used a method very similar to my own.) Thanks in advance.
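For reference, the multiprocessing variant mentioned above could be sketched roughly like this (a Python 3 reconstruction of the idea, not the asker's actual code; the function names are mine): hand each top-level subdirectory to a worker process and sum the results.

```python
import os
from multiprocessing import Pool
from os.path import getsize, isdir, join

def folder_size(path):
    """Serial size of one subtree, using a plain os.walk."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                total += getsize(join(dirpath, name))
            except OSError:
                pass  # skip files we cannot stat
    return total

def parallel_folder_size(path, workers=4):
    """Farm each top-level subdirectory out to a worker process,
    then add the sizes of files sitting directly in the root."""
    entries = [join(path, name) for name in os.listdir(path)]
    subdirs = [e for e in entries if isdir(e)]
    files = [e for e in entries if not isdir(e)]
    with Pool(workers) as pool:
        total = sum(pool.map(folder_size, subdirs))
    return total + sum(getsize(f) for f in files)
```

This only helps when the top-level subdirectories are of comparable size; one huge subtree still dominates the wall-clock time, which is consistent with the modest 60 s → 23 s improvement reported above.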
You are at a disadvantage.
Windows Explorer almost certainly uses FindFirstFile/FindNextFile to both traverse the directory structure and collect size information (through lpFindFileData) in one pass, making what is essentially a single system call per file.
Python is unfortunately not your friend in this case. Thus:

- os.walk first calls os.listdir (which internally calls FindFirstFile/FindNextFile) -- any additional system calls made from this point onward can only make you slower than Windows Explorer
- os.walk then calls isdir for each file returned by os.listdir (which internally calls GetFileAttributesEx -- or, prior to Win2k, a GetFileAttributes + FindFirstFile combo) to redetermine whether to recurse or not
- os.walk and os.listdir perform additional memory allocation, string and array operations etc. to fill out their return values
- you then call getsize for each file returned by os.walk (which again calls GetFileAttributesEx)
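The call pattern in the list above boils down to something like this simplified reimplementation of the pre-3.5 walk logic (a sketch for illustration, not the actual stdlib code):

```python
import os
from os.path import isdir, join

def simple_walk(top):
    """Pre-3.5-style walk: one listdir per directory, then one isdir
    (i.e. one extra stat/GetFileAttributesEx) per entry just to split
    the listing into dirs and files -- the redundant second system
    call per file that the answer counts."""
    try:
        names = os.listdir(top)
    except OSError:
        return
    dirs, files = [], []
    for name in names:
        # This isdir() is the avoidable per-entry system call.
        (dirs if isdir(join(top, name)) else files).append(name)
    yield top, dirs, files
    for d in dirs:
        yield from simple_walk(join(top, d))
```

On top of that, the asker's script then calls getsize on every file, which is the third per-file system call tallied below.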
That is 3x more system calls per file than Windows Explorer, plus memory allocation and manipulation overhead.
You can either use Anurag's solution, or try to call FindFirstFile/FindNextFile directly and recursively (which should be comparable to the performance of a cygwin or other win32 port of du -s some_directory.)
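For illustration, a direct FindFirstFileW/FindNextFileW traversal via ctypes might look like the sketch below. The structure layout and constants follow the documented Win32 API; the function names (folder_size, combine_size) are my own, and this is a Windows-only sketch rather than a production implementation (it ignores reparse points, long paths, etc.).

```python
import ctypes
import sys
from ctypes import wintypes

INVALID_HANDLE_VALUE = ctypes.c_void_p(-1).value
FILE_ATTRIBUTE_DIRECTORY = 0x10

class WIN32_FIND_DATAW(ctypes.Structure):
    """Per-entry record filled by FindFirstFileW/FindNextFileW: it already
    contains the attributes and size, so no extra call per file is needed."""
    _fields_ = [
        ("dwFileAttributes", wintypes.DWORD),
        ("ftCreationTime", wintypes.FILETIME),
        ("ftLastAccessTime", wintypes.FILETIME),
        ("ftLastWriteTime", wintypes.FILETIME),
        ("nFileSizeHigh", wintypes.DWORD),
        ("nFileSizeLow", wintypes.DWORD),
        ("dwReserved0", wintypes.DWORD),
        ("dwReserved1", wintypes.DWORD),
        ("cFileName", wintypes.WCHAR * 260),
        ("cAlternateFileName", wintypes.WCHAR * 14),
    ]

def combine_size(high, low):
    """Combine the two 32-bit halves of a file size into one integer."""
    return (high << 32) | low

def folder_size(path):
    """Recursive total size with one FindFirstFileW/FindNextFileW pass
    per directory, i.e. roughly one system call per entry."""
    if sys.platform != "win32":
        raise OSError("FindFirstFileW is only available on Windows")
    kernel32 = ctypes.windll.kernel32
    kernel32.FindFirstFileW.argtypes = [wintypes.LPCWSTR,
                                        ctypes.POINTER(WIN32_FIND_DATAW)]
    kernel32.FindFirstFileW.restype = wintypes.HANDLE
    kernel32.FindNextFileW.argtypes = [wintypes.HANDLE,
                                       ctypes.POINTER(WIN32_FIND_DATAW)]
    kernel32.FindClose.argtypes = [wintypes.HANDLE]
    data = WIN32_FIND_DATAW()
    total = 0
    handle = kernel32.FindFirstFileW(path + "\\*", ctypes.byref(data))
    if handle == INVALID_HANDLE_VALUE:
        return 0
    try:
        while True:
            name = data.cFileName
            if name not in (".", ".."):
                if data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY:
                    total += folder_size(path + "\\" + name)
                else:
                    total += combine_size(data.nFileSizeHigh,
                                          data.nFileSizeLow)
            if not kernel32.FindNextFileW(handle, ctypes.byref(data)):
                break
    finally:
        kernel32.FindClose(handle)
    return total
```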
Refer to os.py
for the implementation of os.walk
, posixmodule.c
for the implementation of listdir
and win32_stat
(invoked by both isdir
and getsize
.)
Note that Python's os.walk
is suboptimal on all platforms (Windows and *nices), up to and including Python3.1. On both Windows and *nices os.walk
could achieve traversal in a single pass without calling isdir
since both FindFirst
/FindNext
(Windows) and opendir
/readdir
(*nix) already return file type via lpFindFileData->dwFileAttributes
(Windows) and dirent::d_type
(*nix).
Perhaps counterintuitively, on most modern configurations (e.g. Win7 and NTFS, and even some SMB implementations) GetFileAttributesEx
is twice as slow as FindFirstFile
of a single file (possibly even slower than iterating over a directory with FindNextFile
.)
Update: Python 3.5 includes the new PEP 471 os.scandir()
function that solves this problem by returning file attributes along with the filename. This new function is used to speed up the built-in os.walk()
(on both Windows and Linux). You can use the scandir module on PyPI to get this behavior for older Python versions, including 2.x.