Python: Extract gz files and honor original filenames and file extensions


Question

In one folder I have many .gz files. The files inside them have different extensions: some are .txt, some are .csv, some are .xml, and some have other extensions.

For example, the .gz files in the folder look like this (the original file inside each archive is shown in parentheses):

C:\Xiang\filename1.txt.gz (filename1.txt)
C:\Xiang\filename2.txt.gz (filename2.txt)
C:\Xiang\some_prefix_filename3.txt.gz (filename3.txt)
...
C:\Xiang\xmlfile1.xml_some_postfix.gz   (xmlfile1.xml)
C:\Xiang\yyyymmddxmlfile2.xml.gz       (xmlfile2.xml)
...
C:\Xiang\someotherName.csv.gz            (someotherName.csv)
C:\Xiang\possiblePrefixsomeotherfile1.someotherExtension.gz (someotherfile1.someotherExtension)
C:\Xiang\someotherfile2.someotherExtensionPossiblePostfix.gz (someotherfile2.someotherExtension)
...

How can I, in Python on Windows 10, unzip all the .gz files under the folder C:\Xiang and save them into the folder C:\UnZipGz, honoring the original filenames, so that the result is as follows:

C:\UnZipGz\filename1.txt
C:\UnZipGz\filename2.txt
C:\UnZipGz\filename3.txt
...
C:\UnZipGz\xmlfile1.xml
C:\UnZipGz\xmlfile2.xml
...
C:\UnZipGz\someotherName.csv
C:\UnZipGz\someotherfile1.someotherExtension
C:\UnZipGz\someotherfile2.someotherExtension
...

Generally, the names of the .gz files are consistent with the names of the files inside them, but not always. Some of the .gz files were renamed at some point in the past, so a .gz file name no longer necessarily matches the name of the file it contains.

How can I extract all the .gz files and keep the original filenames and extensions? That is, regardless of how a .gz file is named, the extracted file should be saved under its original name, in the form

filename.fileExtension

in the C:\UnZipGz folder.

Recommended answer

import gzip
import os
import shutil


# Raw strings keep the backslashes literal; a plain 'C:\UnZipGz' is a
# syntax error in Python 3 because \U starts a Unicode escape.
INPUT_DIRECTORY = r'C:\Xiang'
OUTPUT_DIRECTORY = r'C:\UnZipGz'
GZIP_EXTENSION = '.gz'


def make_output_path(output_directory, zipped_name):
    """ Generate a path to write the unzipped file to.

    :param str output_directory: Directory to place the file in
    :param str zipped_name: Name of the zipped file
    :return str:
    """
    name_without_gzip_extension = zipped_name[:-len(GZIP_EXTENSION)]
    return os.path.join(output_directory, name_without_gzip_extension)


# Create the output folder if it does not exist yet.
os.makedirs(OUTPUT_DIRECTORY, exist_ok=True)

for entry in os.scandir(INPUT_DIRECTORY):
    if not entry.name.lower().endswith(GZIP_EXTENSION):
        continue

    output_path = make_output_path(OUTPUT_DIRECTORY, entry.name)

    print('Decompressing', entry.path, 'to', output_path)

    # Stream the decompressed data so large files are not loaded into memory.
    with gzip.open(entry.path, 'rb') as compressed_file:
        with open(output_path, 'wb') as output_file:
            shutil.copyfileobj(compressed_file, output_file)

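Explanation:

  1. Loop over all files in the folder that have the relevant extension.
  2. Build a path in the new directory by dropping the gzip extension. The name comes from the .gz file name, not from the name stored inside the archive.
  3. Open the file and write its decompressed contents to the new path.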


To get the original filename stored in the gzip header, you can use gzinfo: https://github.com/PierreSelim/gzinfo

>>> import gzinfo
>>> info = gzinfo.read_gz_info('bar.txt.gz')
>>> info.fname
'foo.txt'
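
If installing a third-party package is not an option, the stored name can also be read with the standard library alone. The following is a minimal sketch, not part of the original answer: it assumes the archives carry the optional FNAME field described in RFC 1952 (many tools write it, but it is not guaranteed), and the helper names read_original_name and decompress_all are hypothetical. When no name is stored, it falls back to stripping ".gz" from the archive name, as in the script above. Note that two archives with the same stored name would overwrite each other in the output folder.

```python
import gzip
import os
import shutil
import struct

FEXTRA_FLAG = 0x04  # header has an extra field before the name
FNAME_FLAG = 0x08   # header stores the original file name


def read_original_name(gz_path):
    """Return the file name stored in the gzip header, or None if absent.

    Hypothetical helper: parses the fixed 10-byte header (RFC 1952).
    """
    with open(gz_path, 'rb') as raw:
        header = raw.read(10)
        if len(header) < 10 or header[:2] != b'\x1f\x8b':
            return None  # not a gzip file
        flags = header[3]
        if flags & FEXTRA_FLAG:
            length_bytes = raw.read(2)
            if len(length_bytes) < 2:
                return None
            (extra_length,) = struct.unpack('<H', length_bytes)
            raw.seek(extra_length, os.SEEK_CUR)  # skip the extra field
        if not flags & FNAME_FLAG:
            return None
        name = bytearray()
        while True:
            byte = raw.read(1)
            if byte in (b'', b'\x00'):  # the name is zero-terminated
                break
            name += byte
    # RFC 1952 specifies Latin-1 for the name. Keep only the base name
    # so that a stored path cannot point outside the output folder.
    text = name.decode('latin-1').replace('\\', '/')
    return os.path.basename(text) or None


def decompress_all(input_directory, output_directory):
    """Decompress every .gz file, preferring the name stored in its header."""
    os.makedirs(output_directory, exist_ok=True)
    written = []
    for entry in os.scandir(input_directory):
        if not (entry.is_file() and entry.name.lower().endswith('.gz')):
            continue
        # Fall back to the archive name without ".gz" if no name is stored.
        name = read_original_name(entry.path) or entry.name[:-len('.gz')]
        output_path = os.path.join(output_directory, name)
        with gzip.open(entry.path, 'rb') as source:
            with open(output_path, 'wb') as target:
                shutil.copyfileobj(source, target)
        written.append(output_path)
    return written
```

With the folders from the question, the call would be decompress_all(r'C:\Xiang', r'C:\UnZipGz').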


