Safely extract zip or tar using Python

Question

I'm trying to extract user-submitted zip and tar files to a directory. The documentation for zipfile's extractall method (similarly with tarfile's extractall) states that it's possible for paths to be absolute or contain .. paths that go outside the destination path. Instead, I could use extract myself, like this:

import zipfile

some_path = '/destination/path'
some_zip = '/some/file.zip'
with zipfile.ZipFile(some_zip, mode='r') as zipf:
    for subfile in zipf.namelist():
        zipf.extract(subfile, some_path)

Is this safe? Is it possible for a file in the archive to wind up outside of some_path in this case? If so, how can I ensure that files will never wind up outside the destination directory?

Recommended answer

Note: Starting with Python 2.7.4, this is a non-issue for ZIP archives. Details are at the bottom of the answer. This answer focuses on tar archives.

To figure out where a path really points to, use os.path.abspath() (but note the caveat about symlinks as path components). If you normalize a path from your zipfile with abspath and it does not contain the current directory as a prefix, it's pointing outside it.
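The prefix check described above can be sketched as follows. This is a minimal illustration, not the answer's final code: the helper name `points_outside` is hypothetical, and it uses only `abspath`, so it does not yet account for symlinks. It also assumes POSIX-style paths (on Windows, `os.path.commonpath` raises `ValueError` for paths on different drives).

```python
import os

def points_outside(member_name, base="."):
    """Return True if member_name, joined to base, lands outside base.

    Hypothetical helper: purely lexical, symlinks are NOT resolved here.
    """
    base = os.path.abspath(base)
    # os.path.join discards base when member_name is absolute
    target = os.path.abspath(os.path.join(base, member_name))
    # commonpath compares whole components, so "/a/bc" is not inside "/a/b"
    return os.path.commonpath([base, target]) != base

print(points_outside("docs/readme.txt", "/tmp/sandbox"))
print(points_outside("../../etc/passwd", "/tmp/sandbox"))
```

Comparing whole path components rather than raw string prefixes avoids treating a sibling such as /tmp/sandbox2 as if it were inside /tmp/sandbox.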

But you also need to check the value of any symlink extracted from your archive (both tarfiles and unix zipfiles can store symlinks). This is important if you are worried about a proverbial "malicious user" that would intentionally bypass your security, rather than an application that simply installs itself in system libraries.
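To see which link values an archive carries before extracting anything, the members can be inspected with `TarInfo.issym()`, `TarInfo.islnk()` and `TarInfo.linkname`. The sketch below is only an illustration; `list_links` and `demo_archive` are hypothetical helpers, and the archive is built in memory with made-up member names.

```python
import io
import tarfile

def list_links(archive):
    """Return (name, kind, target) for every link member of an open TarFile."""
    links = []
    for info in archive.getmembers():
        if info.issym():
            links.append((info.name, "symlink", info.linkname))
        elif info.islnk():
            links.append((info.name, "hardlink", info.linkname))
    return links

def demo_archive():
    """Build a small in-memory tar holding one symlink (hypothetical names)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        link = tarfile.TarInfo("subdir/foo")
        link.type = tarfile.SYMTYPE   # mark the member as a symbolic link
        link.linkname = ".."          # the value that needs checking
        tar.addfile(link)
    buf.seek(0)
    return tarfile.open(fileobj=buf, mode="r")

with demo_archive() as ar:
    for name, kind, target in list_links(ar):
        print(name, kind, "->", target)
```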

That's the aforementioned caveat: abspath will be misled if your sandbox already contains a symlink that points to a directory. Even a symlink that points within the sandbox can be dangerous: The symlink sandbox/subdir/foo -> .. points to sandbox, so the path sandbox/subdir/foo/../.bashrc should be disallowed. The easiest way to do so is to wait until the previous files have been extracted and use os.path.realpath(). Fortunately extractall() accepts a generator, so this is easy to do.
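The difference between the two functions can be reproduced with the exact layout from the paragraph above. This sketch assumes a platform where `os.symlink` is available without special privileges (for example Linux or macOS); `compare_resolution` is a hypothetical name and everything happens inside a temporary directory.

```python
import os
import tempfile

def compare_resolution():
    """Return (abspath result, realpath result), both relative to a temp root."""
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.realpath(tmp)  # the temp dir itself may sit behind a symlink
        sandbox = os.path.join(root, "sandbox")
        os.makedirs(os.path.join(sandbox, "subdir"))
        # sandbox/subdir/foo -> ..   (that is, foo points at sandbox)
        os.symlink("..", os.path.join(sandbox, "subdir", "foo"))
        tricky = os.path.join(sandbox, "subdir", "foo", "..", ".bashrc")
        lexical = os.path.abspath(tricky)   # collapses ".." textually
        actual = os.path.realpath(tricky)   # follows the symlink first
        return os.path.relpath(lexical, root), os.path.relpath(actual, root)

lexical, actual = compare_resolution()
print("abspath :", lexical)
print("realpath:", actual)
```

`abspath` reports a location inside sandbox/subdir, while `realpath` shows that the path really ends up next to the sandbox directory, which is why the check has to run after earlier members have been extracted.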

Since you ask for code, here's a bit that explicates the algorithm. It prohibits not only the extraction of files to locations outside the sandbox (which is what was requested), but also the creation of links inside the sandbox that point to locations outside the sandbox. I'm curious to hear if anyone can sneak any stray files or links past it.

import tarfile
from os import sep
from os.path import abspath, realpath, dirname, join as joinpath
from sys import stderr

def resolved(path):
    # Normalize the path and follow symlinks that already exist on disk
    return realpath(abspath(path))

def badpath(path, base):
    # joinpath will ignore base if path is absolute
    target = resolved(joinpath(base, path))
    # Compare whole path components so "/sandbox2" is not mistaken for "/sandbox"
    return target != base and not target.startswith(base.rstrip(sep) + sep)

def badlink(info, base):
    # Links are interpreted relative to the directory containing the link
    tip = resolved(joinpath(base, dirname(info.name)))
    return badpath(info.linkname, base=tip)

def safemembers(members, dest):
    # Check members against the directory the archive is actually extracted into
    base = resolved(dest)

    for finfo in members:
        if badpath(finfo.name, base):
            print(finfo.name, "is blocked (illegal path)", file=stderr)
        elif finfo.issym() and badlink(finfo, base):
            print(finfo.name, "is blocked: Symlink to", finfo.linkname, file=stderr)
        elif finfo.islnk() and badlink(finfo, base):
            print(finfo.name, "is blocked: Hard link to", finfo.linkname, file=stderr)
        else:
            yield finfo

# The generator is consumed lazily, so each member is checked only after
# the members before it have been written to disk
with tarfile.open("testtar.tar") as ar:
    ar.extractall(path="./sandbox", members=safemembers(ar, "./sandbox"))

Starting with Python 2.7.4, this is a non-issue for ZIP archives: The method zipfile.extract() prohibits the creation of files outside the sandbox:

Note: If a member filename is an absolute path, a drive/UNC sharepoint and leading (back)slashes will be stripped, e.g.: ///foo/bar becomes foo/bar on Unix, and C:\foo\bar becomes foo\bar on Windows. And all ".." components in a member filename will be removed, e.g.: ../../foo../../ba..r becomes foo../ba..r. On Windows, illegal characters (:, <, >, |, ", ?, and *) [are] replaced by underscore (_).
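The sanitizing described in that note can be observed directly. The following sketch builds a ZIP in memory with a member name containing ".." components and extracts it into a temporary directory; `extract_hostile_zip` and the member name are made up for illustration.

```python
import io
import os
import tempfile
import zipfile

def extract_hostile_zip():
    """Extract a ZIP whose only member is named '../../evil.txt'.

    Returns the extracted file paths relative to the destination directory.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("../../evil.txt", "payload")
    buf.seek(0)
    with tempfile.TemporaryDirectory() as dest:
        with zipfile.ZipFile(buf) as zf:
            zf.extractall(dest)
        found = []
        for dirpath, _dirs, files in os.walk(dest):
            for name in files:
                full = os.path.join(dirpath, name)
                found.append(os.path.relpath(full, dest))
        return sorted(found)

print(extract_hostile_zip())
```

The ".." components are dropped, so the file lands directly inside the destination directory rather than two levels above it.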

The tarfile class was not similarly sanitized in that release, so the above answer still applies to tar archives.
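Newer Python releases (3.12, plus some earlier security releases) add extraction filters to tarfile, which were not part of the original answer. As a hedged sketch, and assuming the running interpreter exposes `tarfile.data_filter`, passing `filter="data"` to `extractall()` should reject a member that tries to leave the destination. `try_data_filter` and the member name are hypothetical.

```python
import io
import tarfile
import tempfile

def try_data_filter():
    """Return 'blocked', 'extracted', or 'unsupported' for a '../' member."""
    # Older interpreters have no extraction filters at all
    if not hasattr(tarfile, "data_filter"):
        return "unsupported"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"payload"
        info = tarfile.TarInfo("../escape.txt")  # tries to leave the destination
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    with tempfile.TemporaryDirectory() as dest:
        with tarfile.open(fileobj=buf, mode="r") as tar:
            try:
                tar.extractall(path=dest, filter="data")
            except tarfile.FilterError:
                return "blocked"
    return "extracted"

print(try_data_filter())
```

Where filters are unavailable, a member generator like the one shown earlier in the answer remains the way to screen tar members.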
