How to create multiple folders with names, and extract multiple zips to each different folder, with Python?


Problem description

I'm having trouble creating many different directories for a number of zip files containing different raster data, and then extracting all the zips to the new folders in a clean script.

I have accomplished my task, but my code is very long and messy. I need folders labeled like NE34_E, NE35_E, etc., and within these directories I need subfolders such as N34_24, N34_25, etc., into which the raster data will be extracted. I have over 100 zip files that need to be extracted and placed in subfolders.

After making some changes to the way I was making directories, this is a sample of my script.

My file structure is as follows:


N\\N36_E\\N36_24
N\\N36_E\\N35_25
... etc.


Zip file names:


n36_e024_1arc_v3_bil.zip
n36_e025_1arc_v3_bil.zip
n36_e026_1arc_v3_bil.zip
... etc.


Python code to create the directory structure:

import os

#Create Sub directories for "NE36_"
pathname1 = "NE36_"
pathname2 = 24
directory = "D:\\Capstone\\Test\\N36_E\\" + str(pathname1) + str(pathname2)
while pathname2 < 46:
    if not os.path.exists(directory):
        os.makedirs(directory)
    pathname2 += 1
    directory = "D:\\Capstone\\Test\\N36_E\\" + str(pathname1) + str(pathname2)

#Create Sub directories for "NE37_"
pathname1 = "NE37_"
pathname2 = 24
directory = "D:\\Capstone\\Test\\N37_E\\" + str(pathname1) + str(pathname2)
while pathname2 < 46:
    if not os.path.exists(directory):
        os.makedirs(directory)
    pathname2 += 1
    directory = "D:\\Capstone\\Test\\N37_E\\" + str(pathname1) + str(pathname2)


Recommended answer

import glob, os, re, zipfile

# Setup main paths.
zipfile_rootdir = r'D:\Capstone\Zipfiles'
extract_rootdir = r'D:\Capstone\Test'

# Process the zip files.
re_pattern = re.compile(r'\A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)')

for zip_file in glob.iglob(os.path.join(zipfile_rootdir, '*.zip')):

    # Get the parts from the base zip filename using regular expressions.
    part = re.findall(re_pattern, os.path.basename(zip_file))[0]

    # Make all items in part uppercase using a list comprehension.
    part = [item.upper() for item in part]

    # Create a dict of the parts to make useful parts to be used for folder names.
    # E.g. from ['N', '36', 'E', '24']
    folder = {'outer': '{0}{1}_{2}'.format(*part),
              'inner': '{0}{2}{1}_{3}'.format(*part)}

    # Build the extraction path from each part.
    extract_path = os.path.join(extract_rootdir, folder['outer'], folder['inner'])

    # Perform the extract of all files from the zipfile.
    with zipfile.ZipFile(zip_file, 'r') as zip:
        zip.extractall(extract_path)

There are 2 main settings to set:

  1. zipfile_rootdir is where the zip files are located.
  2. extract_rootdir is where to extract to.

The r before each string marks it as a raw string, so the backslashes do not need to be escaped.
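As a small illustration of the raw string point, the sketch below compares the two ways of writing the same Windows path. The `D:\test` path is a made-up example, not one from the question, chosen only because `\t` is a valid escape sequence.

```python
# The same Windows path written two ways.
escaped = 'D:\\Capstone\\Test'   # every backslash doubled
raw = r'D:\Capstone\Test'        # raw string, backslashes kept as typed
same_path = escaped == raw
print(same_path)                 # True

# Without the r prefix, a sequence such as \t is read as a tab character.
tab_path = 'D:\test'             # hypothetical path: contains a tab
raw_tab_path = r'D:\test'        # contains a backslash followed by 't'
print(len(tab_path), len(raw_tab_path))  # 6 7
```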

A regular expression is compiled and used to extract, from each zip file name, the text that is used to build the extraction path.

From the zip file name:


n36_e024_1arc_v3_bil.zip


the regular expression extracts a sequence of parts:


n, 36, e, 24


Each item is uppercased and used to create a dictionary named folder containing these keys and values:


'outer': 'N36_E'
'inner': 'NE36_24'


extract_path is the full path built by joining extract_rootdir, folder['outer'] and folder['inner'].
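The joining step can be sketched on its own. The answer uses os.path.join, which uses the separator of the platform it runs on; the sketch below swaps in ntpath.join (the Windows flavor of os.path) purely as an assumption, so that the printed result looks the same on any operating system.

```python
import ntpath  # Windows implementation of os.path, importable on any platform

extract_rootdir = r'D:\Capstone\Test'
folder = {'outer': 'N36_E', 'inner': 'NE36_24'}

# Join the root directory with the outer and inner folder names.
extract_path = ntpath.join(extract_rootdir, folder['outer'], folder['inner'])
print(extract_path)  # D:\Capstone\Test\N36_E\NE36_24
```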

Finally, each zip file is opened with a context manager (the with statement) and its contents are extracted.
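The extraction step can be tried without touching real data. The following is a minimal sketch, assuming a throwaway temporary directory and a dummy archive whose member name (`n36_e024_1arc_v3.bil`) is invented for illustration. It also shows that extractall creates the nested target folders when they do not exist yet, which is why the answer has no explicit os.makedirs call.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a small dummy zip so the example is self-contained.
    zip_path = os.path.join(tmp, 'n36_e024_1arc_v3_bil.zip')
    with zipfile.ZipFile(zip_path, 'w') as archive:
        archive.writestr('n36_e024_1arc_v3.bil', 'dummy raster data')

    # The nested target folders do not exist yet.
    extract_path = os.path.join(tmp, 'Test', 'N36_E', 'NE36_24')
    folder_existed_before = os.path.exists(extract_path)

    # The with statement closes the archive even if extraction fails.
    with zipfile.ZipFile(zip_path, 'r') as archive:
        archive.extractall(extract_path)

    extracted = os.listdir(extract_path)
    print(folder_existed_before, extracted)
```

Naming the archive variable `archive` rather than `zip` avoids shadowing the built-in zip function; that is a style choice, not a change in behavior.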

re_pattern = re.compile(r'\A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)')


The regular expression pattern is compiled before the loop to avoid compiling it repeatedly inside the loop. The r before the string tells Python to interpret it as a raw string, i.e. with no backslash escape processing. Raw strings are useful for regular expressions because the patterns themselves use backslashes.

The regular expression pattern:


\A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)


The string for the regular expression to work on:


n36_e024_1arc_v3_bil.zip





  1. \A matches only at the start of the string. This is an anchor and does not match any character.
  2. ([a-zA-Z]) matches one alphabetic character. [] matches any one of the characters listed inside it, here any character in the ranges a to z and A to Z. n will be matched. The enclosing () stores the captured group in the returned sequence. So the sequence is now n,.
  3. (\d+) matches 1 or more digits. \d is any digit and + tells it to keep matching more. The sequence becomes n, 36,.
  4. _ is a literal and, since no () encloses it, it is matched but not added to the sequence.
  5. ([a-zA-Z]) is the same as point 2. The sequence becomes n, 36, e,.
  6. 0{0,2} matches a zero 0, zero to 2 times {0,2}. There is no (), so it is not added to the sequence.
  7. (\d+) is the same as point 3. The sequence becomes n, 36, e, 24.
  8. The rest of the string is ignored because the pattern has reached its end. This is why \A is used: it stops the pattern from starting to match somewhere later in the string, at an unwanted position.
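The walk-through above can be checked with a short sketch. It assumes a hypothetical helper named parse and uses the pattern's match method instead of re.findall(...)[0], so that a file name that does not fit the pattern yields None rather than raising an IndexError. All file names other than the first are invented test cases.

```python
import re

re_pattern = re.compile(r'\A([a-zA-Z])(\d+)_([a-zA-Z])0{0,2}(\d+)')

def parse(name):
    """Return the four captured parts, or None if the name does not fit."""
    match = re_pattern.match(name)
    return match.groups() if match else None

print(parse('n36_e024_1arc_v3_bil.zip'))  # ('n', '36', 'e', '24')
print(parse('n36_e005_1arc_v3_bil.zip'))  # ('n', '36', 'e', '5')
print(parse('n36_e100_1arc_v3_bil.zip'))  # ('n', '36', 'e', '100')

# \A anchors the pattern to the start, so a prefixed name is rejected.
print(parse('readme_n36_e024.zip'))       # None
```

In a loop over many zip files, checking for None and skipping the file is one way to keep an unrelated archive in the same folder from stopping the whole run.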



Formatting:

The sequence is N, 36, E, 24 after being uppercased by the list comprehension.


  1. The pattern {0}{1}_{2} uses the items in the order 0, 1, 2, so 0 is N, 1 is 36 and 2 is E, which gives N36_E. The _ is a literal in the pattern.
  2. The pattern {0}{2}{1}_{3} uses the items in the order 0, 2, 1, 3. 0 is N, 2 is E, 1 is 36 and 3 is 24, which gives NE36_24.
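The two format patterns can be tried in isolation. This is a minimal sketch that starts from the lowercase parts, as they would come out of the regular expression, and applies the same uppercase and format steps as the answer.

```python
# Parts as captured from 'n36_e024_1arc_v3_bil.zip'.
part = ('n', '36', 'e', '24')

# Uppercase every item with a list comprehension.
part = [item.upper() for item in part]

# The numbers in braces are positions in part; unused positions are ignored.
outer = '{0}{1}_{2}'.format(*part)
inner = '{0}{2}{1}_{3}'.format(*part)
print(outer, inner)  # N36_E NE36_24
```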



References (Python 2 and Python 3 documentation):

  • re module for the regular expressions.
  • format method for the formatting of strings.
  • list comprehensions used to uppercase items in the sequence.
  • zipfile module for working with zip archives.
