Python: Continuously check size of files being added to list, stop at size, zip list, continue

Question

I am trying to loop through a directory, check the size of each file, and add the files to a list until they reach a certain size (2040 MB). At that point, I want to put the list into a zip archive, and then continue looping through the next set of files in the directory and continue to do the same thing. The other constraint is that files with the same name but different extension need to be added together into the zip, and can't be separated. I hope that makes sense.

The issue I am having is that my code basically ignores the size constraint that I have added, and just zips up all the files in the directory anyway.

I suspect there is some logic issue, but I am failing to see it. Any help would be appreciated. Here is my code:

import os,os.path, zipfile
from time import *

#### Function to create zip file ####
# Add the files from the list to the zip archive
def zipFunction(zipList):

    # Specify zip archive output location and file name
    zipName = "D:\Documents\ziptest1.zip"

    # Create the zip file object
    zipA = zipfile.ZipFile(zipName, "w", allowZip64=True)  

    # Go through the list and add files to the zip archive
    for w in zipList:

        # Create the arcname parameter for the .write method. Otherwise  the zip file
        # mirrors the directory structure within the zip archive (annoying).
        arcname = w[len(root)+1:]

        # Write the files to a zip
        zipA.write(w, arcname, zipfile.ZIP_DEFLATED)

    # Close the zip process
    zipA.close()
    return       
#################################################
#################################################

sTime = clock()

# Set the size counter
totalSize = 0

# Create an empty list for adding files to count MB and make zip file
zipList = []

tifList = []

xmlList = []

# Specify the directory to look at
searchDirectory = "Y:\test"

# Create a counter to check number of files
count = 0

# Set the root, directory, and file name
for root,direc,f in os.walk(searchDirectory):

        #Go through the files in directory
        for name in f:
            # Set the os.path file root and name
            full = os.path.join(root,name)

            # Split the file name from the file extension
            n, ext = os.path.splitext(name)

            # Get size of each file in directory, size is obtained in BYTES
            fileSize = os.path.getsize(full)

            # Add up the total sizes for all the files in the directory
            totalSize += fileSize

            # Convert from bytes to megabytes
                # 1 kilobyte = 1,024 bytes
                # 1 megabyte = 1,048,576 bytes
                # 1 gigabyte = 1,073,741,824 bytes
            megabytes = float(totalSize)/float(1048576)

            if ext == ".tif":  # should be everything that is not equal to XML (could be TIF, PDF, etc.) need to fix this later
                tifList.append(n)#, fileSize/1048576])
                tifSorted = sorted(tifList)
            elif ext == ".xml":
                xmlList.append(n)#, fileSize/1048576])
                xmlSorted = sorted(xmlList)

            if full.endswith(".xml") or full.endswith(".tif"):
                zipList.append(full)

            count +=1

            if megabytes == 2040 and len(tifList) == len(xmlList):
                zipFunction(zipList)
            else:
                continue

eTime = clock()
elapsedTime = eTime - sTime
print "Run time is %s seconds"%(elapsedTime)

The only thing I can think of is that there is never an instance where my variable megabytes==2040 exactly. I can't figure out how to make the code stop at that point otherwise though; I wonder if using a range would work? I also tried:

    if megabytes < 2040:
       zipList.append(full) 
       continue 
    elif megabytes == 2040:
       zipFunction(zipList)

Recommended answer

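Your main problem is that you need to reset your file size tally when you archive the current list of files. Testing for `>=` instead of `==` also matters, because the running total will almost never land on exactly 2040. For example: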

if megabytes >= 2040:
    zipFunction(zipList)
    totalSize = 0

BTW, you don't need

else:
    continue 

there, since it's the end of the loop.
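To make the reset concrete, here is a minimal sketch of the same loop working on plain byte counts instead of real files. The function name `batch_after_limit` and the sample sizes are made up for illustration. It also empties the list when it resets the tally, which the question's code would presumably need too, since `zipList` is never cleared there and every later archive would otherwise contain the earlier files again. Written for Python 3.

```python
MB = 1024 * 1024

def batch_after_limit(sizes, limit_bytes):
    """Greedy batching: close a batch once its running total reaches the limit."""
    batches = []
    current = []
    total = 0
    for size in sizes:
        current.append(size)
        total += size
        # '>=' rather than '==': the tally rarely lands exactly on the limit.
        if total >= limit_bytes:
            batches.append(current)
            current = []   # start a fresh list as well as a fresh tally
            total = 0
    if current:            # keep the final, partially filled batch
        batches.append(current)
    return batches

# Hypothetical file sizes, in bytes.
sizes = [700 * MB, 900 * MB, 800 * MB, 500 * MB, 300 * MB]
batches = batch_after_limit(sizes, 2040 * MB)
print([[s // MB for s in batch] for batch in batches])
```

Note that this version closes a batch only after the limit has been reached, so a batch can end up larger than 2040 MB. In the real script each batch would also need its own archive name, because `zipFunction` as posted always writes to the same `zipName`.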

As for the constraint that you need to keep files together that have the same main file name but different extensions, the only fool-proof way to do that is to sort the file names before processing them.
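One way to do that sorting, sketched under the assumption that "same main file name" means the full path minus its extension: sort on that key, then let `itertools.groupby` collect the neighbours. The helper name `group_by_stem` and the sample paths are hypothetical. Written for Python 3.

```python
import os
from itertools import groupby

def group_by_stem(paths):
    """Return lists of paths that differ only in their extension."""
    def stem(path):
        return os.path.splitext(path)[0]
    # groupby only merges adjacent items, so the sort has to come first.
    ordered = sorted(paths, key=stem)
    return [list(group) for _, group in groupby(ordered, key=stem)]

# Hypothetical paths, deliberately out of order.
paths = ["scans/b.xml", "scans/a.tif", "scans/b.tif", "scans/a.xml", "scans/c.tif"]
for group in group_by_stem(paths):
    print(group)
```

Each inner list can then be treated as one unit when deciding which archive it goes into.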

If you want to guarantee that the total file size in each archive is under the limit, you need to test the size before you add the file(s) to the list. For example:

if (totalSize + fileSize) // 1048576 > 2040:
    zipFunction(zipList)
    totalSize = 0

totalSize += fileSize

That logic will need to be modified slightly to handle keeping a group of files together: you'll need to add the filesizes of each file in the group together into a sub-total, and then see if adding that sub-total to totalSize takes it over the limit.
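A rough sketch of that sub-total check follows. It assumes the files have already been grouped by main file name, and it takes a `size_of` callable so it can be tried without touching the disk. In the real script that would be `os.path.getsize`. The name `plan_archives` and the sizes are invented for the example, and the comparison is done in bytes rather than whole megabytes. Written for Python 3.

```python
MB = 1024 * 1024

def plan_archives(groups, size_of, limit_bytes):
    """Pack whole groups into batches whose totals stay at or under limit_bytes.

    groups  -- list of lists of paths that must stay together
    size_of -- callable returning a path's size in bytes
    """
    batches, current, total = [], [], 0
    for group in groups:
        subtotal = sum(size_of(path) for path in group)
        # Test before adding: close the current batch if this group would overflow it.
        if current and total + subtotal > limit_bytes:
            batches.append(current)
            current, total = [], 0
        current.extend(group)
        total += subtotal
    if current:  # keep the final, partially filled batch
        batches.append(current)
    return batches

# Hypothetical sizes standing in for os.path.getsize results.
fake_sizes = {"a.tif": 1500 * MB, "a.xml": 1 * MB,
              "b.tif": 600 * MB, "b.xml": 1 * MB,
              "c.tif": 400 * MB}
groups = [["a.tif", "a.xml"], ["b.tif", "b.xml"], ["c.tif"]]
plan = plan_archives(groups, fake_sizes.get, 2040 * MB)
print(plan)
```

A single group that is larger than the limit on its own still ends up in a batch by itself, so that case would need separate handling if it can occur. Each returned batch can then be passed to something like `zipFunction`, provided every call writes to a different archive name.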
