Overwriting previously extracted files instead of creating new ones


Problem Description

There are a few libraries for extracting archive files in Python, such as gzip, zipfile, rarfile, tarfile, and patool. I find patool especially useful because it is cross-format: it can extract almost any type of archive, including the most popular ones such as ZIP, GZIP, TAR, and RAR.

Extracting an archive with patool is as easy as this:

patoolib.extract_archive("Archive.zip", outdir="Folder1")

where "Archive.zip" is the path of the archive file and "Folder1" is the path of the directory where the extracted files will be stored.

The extraction works fine. The problem is that if I run the same code again on the exact same archive file, an identical extracted file is stored in the same folder under a slightly different name (filename on the first run, filename1 on the second, filename11 on the third, and so on).

Instead, I need the code to overwrite the extracted file if a file with the same name already exists in the directory.

The extract_archive function looks very minimal: besides these two parameters, it only has a verbosity parameter and a program parameter, which specifies the program to extract archives with.

Edit: Nizam Mohamed's answer stated that the extract_archive function actually overwrites the output. I found that to be only partially true: the function overwrites ZIP files, but not GZ files, which are what I am after. For GZ files, the function still generates new files.

Edit: Padraic Cunningham's answer suggested using the master source. So I downloaded that code and replaced my old patool library scripts with the scripts from the link. Here is the result:

os.listdir()
Out[11]: ['a.gz']

patoolib.extract_archive("a.gz",verbosity=1,outdir=".")
patool: Extracting a.gz ...
patool: ... a.gz extracted to `.'.
Out[12]: '.'

patoolib.extract_archive("a.gz",verbosity=1,outdir=".")
patool: Extracting a.gz ...
patool: ... a.gz extracted to `.'.
Out[13]: '.'

patoolib.extract_archive("a.gz",verbosity=1,outdir=".")
patool: Extracting a.gz ...
patool: ... a.gz extracted to `.'.
Out[14]: '.'

os.listdir()
Out[15]: ['a', 'a.gz', 'a1', 'a2']

So, again, the extract_archive function creates a new file every time it is executed. (The file archived inside a.gz actually has a different name from a.)
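The a, a1, a2 listing above is consistent with a simple "first free name" rule: when the output name derived from a single-file archive is already taken, the smallest number that is still free is appended. The sketch below is a hypothetical illustration of such a rule (next_free_name is made up for this example and is not patool's code); it shows why repeated runs accumulate copies instead of overwriting.

```python
import os
import tempfile


def next_free_name(path):
    """Return path if unused, otherwise the first free path1, path2, ... (hypothetical)."""
    if not os.path.exists(path):
        return path
    i = 1
    while os.path.exists(f"{path}{i}"):
        i += 1
    return f"{path}{i}"


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "a")
    for _ in range(3):
        # Each simulated "extraction" writes to a fresh name instead of overwriting.
        with open(next_free_name(target), "w") as f:
            f.write("data")
    names = sorted(os.listdir(workdir))

print(names)  # ['a', 'a1', 'a2']
```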

Solution

As you've stated, patoolib is intended to be a generic archive tool.

Various archive types can be created, extracted, tested, listed, compared, searched and repacked with patool. The advantage of patool is its simplicity in handling archive files without having to remember a myriad of programs and options.

Generic Extract Behaviour vs Specific Extract Behaviour

The problem here is that extract_archive does not expose any way to change the default behaviour of the underlying archive tool.

For a .zip archive, patoolib will use unzip. You can get the desired overwriting behaviour by passing the -o option on the command line, i.e. unzip -o ... However, this option is specific to unzip, and it differs for each archive utility.
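For comparison, Python's standard zipfile module needs no such flag: extracting over existing files simply replaces them, much like unzip -o. A minimal sketch, assuming a throwaway temporary directory; the archive, folder, and member names are made up for the example.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    archive = os.path.join(workdir, "Archive.zip")
    outdir = os.path.join(workdir, "Folder1")

    for version in (1, 2):
        # (Re)build a one-member archive whose content changes between runs.
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("report.txt", f"version {version}")
        # Extract into the same folder; the existing report.txt is replaced.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(outdir)

    extracted = sorted(os.listdir(outdir))
    with open(os.path.join(outdir, "report.txt")) as f:
        content = f.read()

print(extracted, content)  # ['report.txt'] version 2
```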

For example, tar offers an overwrite option, but no short form equivalent to unzip's -o: tar --overwrite works, but tar -o does not have the intended effect.
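Likewise, the standard tarfile module replaces existing regular files when extracting, without an equivalent of --overwrite. A minimal sketch under the same assumptions; add_text_member is a hypothetical helper, and the filter="data" argument is only passed on Python versions that support extraction filters.

```python
import io
import os
import tarfile
import tempfile


def add_text_member(tar, name, text):
    """Add an in-memory text file to an open TarFile (hypothetical helper)."""
    data = text.encode()
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


# Use the safer "data" extraction filter where this Python supports it.
extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}

with tempfile.TemporaryDirectory() as workdir:
    archive = os.path.join(workdir, "Archive.tar.gz")
    outdir = os.path.join(workdir, "Folder1")

    for version in (1, 2):
        with tarfile.open(archive, "w:gz") as tar:
            add_text_member(tar, "report.txt", f"version {version}")
        # Mode "r" auto-detects gzip compression; the second run overwrites.
        with tarfile.open(archive) as tar:
            tar.extractall(outdir, **extract_kwargs)

    extracted = sorted(os.listdir(outdir))
    with open(os.path.join(outdir, "report.txt")) as f:
        content = f.read()

print(extracted, content)  # ['report.txt'] version 2
```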

To fix this, you could file a feature request with the author, or use an alternative library. Unfortunately, staying true to patoolib's generic design would require extending every extract utility function to implement each underlying extractor's own overwrite option.
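For the GZ case that prompted the question, the standard library alone is enough. A minimal sketch, assuming single-file .gz archives whose output should be named after the archive minus its .gz suffix; gunzip_overwrite is a hypothetical helper, not part of patool.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_overwrite(archive, outdir):
    """Decompress a single-file .gz archive into outdir, replacing any existing output.

    The output name is the archive's base name without ".gz"; any original
    file name stored in the gzip header is ignored.
    """
    name = os.path.basename(archive)
    if name.endswith(".gz"):
        name = name[:-3]
    target = os.path.join(outdir, name)
    os.makedirs(outdir, exist_ok=True)
    # "wb" truncates an existing file, so repeated runs overwrite it
    # instead of creating a1, a2, ...
    with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


with tempfile.TemporaryDirectory() as workdir:
    archive = os.path.join(workdir, "a.gz")
    with gzip.open(archive, "wb") as f:
        f.write(b"hello")
    for _ in range(3):
        gunzip_overwrite(archive, workdir)
    listing = sorted(os.listdir(workdir))
    with open(os.path.join(workdir, "a"), "rb") as f:
        payload = f.read()

print(listing)  # ['a', 'a.gz']
```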

Example Changes to patoolib

In patoolib.programs.unzip

def extract_zip(archive, compression, cmd, verbosity, outdir, overwrite=False):
    """Extract a ZIP archive."""
    cmdlist = [cmd]
    if verbosity > 1:
        cmdlist.append('-v')
    if overwrite:
        cmdlist.append('-o')
    cmdlist.extend(['--', archive, '-d', outdir])
    return cmdlist

In patoolib.programs.tar

def extract_tar(archive, compression, cmd, verbosity, outdir, overwrite=False):
    """Extract a TAR archive."""
    cmdlist = [cmd, '--extract']
    if overwrite:
        cmdlist.append('--overwrite')
    add_tar_opts(cmdlist, compression, verbosity)
    cmdlist.extend(["--file", archive, '--directory', outdir])
    return cmdlist

Updating every program is not a trivial change: each program is different!
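To make the point concrete, here is a hypothetical helper that builds an overwrite-enabled extract command for a few tools. The unzip and GNU tar spellings come from the discussion above; the 7-Zip (-aoa) and unrar (-o+) switches are added for comparison. Nothing here is patool's API, and the function only builds argument lists; it does not run them.

```python
import os


def overwrite_extract_cmd(program, archive, outdir):
    """Build an extract command that overwrites existing files (hypothetical helper)."""
    if program == "unzip":
        return ["unzip", "-o", archive, "-d", outdir]
    if program == "tar":
        # GNU tar only has the long option for this.
        return ["tar", "--extract", "--overwrite", "--file", archive,
                "--directory", outdir]
    if program == "7z":
        # 7-Zip: -aoa overwrites all existing files; -o takes no space.
        return ["7z", "x", "-aoa", "-o" + outdir, archive]
    if program == "unrar":
        # unrar: -o+ overwrites existing files; a trailing separator marks a directory.
        return ["unrar", "x", "-o+", archive, outdir.rstrip(os.sep) + os.sep]
    raise ValueError(f"no overwrite option known for {program!r}")


for prog in ("unzip", "tar", "7z", "unrar"):
    print(" ".join(overwrite_extract_cmd(prog, "Archive", "Folder1")))
```

Every branch differs in where the flag goes and how the output directory is passed, which is exactly why a generic wrapper cannot add this in one place.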

Monkey patching overwrite behavior

So you've decided not to improve the patoolib source code... Instead, we can override the behaviour of extract_archive so that it first looks for an existing output directory, removes it, and then calls the original extract_archive.

You could include this code in your modules; if many modules require it, perhaps put it in __init__.py.

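import os
import shutil

import patoolib

# Keep a reference to the original function: once patoolib.extract_archive is
# replaced below, calling it from inside the wrapper would recurse forever.
_original_extract_archive = patoolib.extract_archive


def overwrite_then_extract_archive(archive, verbosity=0, outdir=None, program=None):
    # Caution: this deletes the whole output directory and everything in it
    # (including the archive, if it lives there), so never use outdir=".".
    if outdir and os.path.isdir(outdir):
        shutil.rmtree(outdir)
        # Recreate it empty in case the underlying tool expects it to exist.
        os.makedirs(outdir)
    return _original_extract_archive(archive, verbosity=verbosity,
                                     outdir=outdir, program=program)


patoolib.extract_archive = overwrite_then_extract_archive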

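Now, whenever patoolib.extract_archive() is called, overwrite_then_extract_archive() runs instead. (Code that ran from patoolib import extract_archive before the patch was applied still holds the original function.)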
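The monkey-patching pattern can be tried without patoolib installed. The sketch below uses a made-up stand-in module (fakelib, built with types.ModuleType) whose fake extract_archive mimics the a, a1, a2 renaming seen in the question; it illustrates the patching technique only and is not patool's real code or behaviour.

```python
import os
import shutil
import tempfile
import types

# Hypothetical stand-in for patoolib.
fakelib = types.ModuleType("fakelib")


def _fake_extract_archive(archive, verbosity=0, outdir=None, program=None):
    # Pretend to extract one file named "a", renaming it if the name is taken.
    os.makedirs(outdir, exist_ok=True)
    target = candidate = os.path.join(outdir, "a")
    i = 1
    while os.path.exists(candidate):
        candidate = f"{target}{i}"
        i += 1
    with open(candidate, "w"):
        pass
    return outdir


fakelib.extract_archive = _fake_extract_archive

# The patch: remember the original before replacing it.
_original = fakelib.extract_archive


def overwrite_then_extract_archive(archive, verbosity=0, outdir=None, program=None):
    if outdir and os.path.isdir(outdir):
        shutil.rmtree(outdir)
    return _original(archive, verbosity=verbosity, outdir=outdir, program=program)


fakelib.extract_archive = overwrite_then_extract_archive

with tempfile.TemporaryDirectory() as workdir:
    plain = os.path.join(workdir, "plain")
    patched = os.path.join(workdir, "patched")
    for _ in range(3):
        _original("a.gz", outdir=plain)                   # unpatched behaviour
        fakelib.extract_archive("a.gz", outdir=patched)  # patched behaviour
    plain_listing = sorted(os.listdir(plain))
    patched_listing = sorted(os.listdir(patched))

print(plain_listing)    # ['a', 'a1', 'a2']
print(patched_listing)  # ['a']
```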
