Import vendored dependencies in Python package without modifying sys.path or 3rd party packages

Question

Summary

I am working on a series of add-ons for Anki, an open-source flashcard program. Anki add-ons are shipped as Python packages, with the basic folder structure looking as follows:

anki_addons/
    addon_name_1/
        __init__.py
    addon_name_2/
        __init__.py

anki_addons is appended to sys.path by the base app, which then imports each add-on with import <addon_name>.

The problem I have been trying to solve is to find a reliable way to ship packages and their dependencies with my add-ons while not polluting global state or falling back to manual edits of the vendored packages.

Specifics

Specifically, given an add-on structure like this...

addon_name_1/
    __init__.py
    _vendor/
        __init__.py
        library1
        library2
        dependency_of_library2
        ...

...I would like to be able to import any arbitrary package that is included in the _vendor directory, e.g.:

from ._vendor import library1

The main difficulty with relative imports like this is that they break as soon as a vendored package imports another vendored package through an absolute import (e.g. import dependency_of_library2 in the source code of library2). That name is looked up at the top level of sys.path, not inside _vendor.
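Here is a minimal reproduction of this failure. It is a sketch under stated assumptions. It builds a throwaway anki_addons/ tree in a temporary directory, using the hypothetical names from the layout above. It then mimics the base app by appending that directory to sys.path. This is not Anki's actual loading code.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def reproduce_vendored_import_failure():
    """Return the name of the module Python could not find, or None."""
    with tempfile.TemporaryDirectory() as tmp:
        addons = Path(tmp) / 'anki_addons'
        addon = addons / 'addon_name_1'
        vendor = addon / '_vendor'
        vendor.mkdir(parents=True)
        (addon / '__init__.py').write_text('')
        (vendor / '__init__.py').write_text('')
        (vendor / 'dependency_of_library2.py').write_text('VALUE = 1\n')
        # library2 expects to be installed at top level, so it imports its
        # own dependency with an absolute import
        (vendor / 'library2.py').write_text('import dependency_of_library2\n')

        sys.path.append(str(addons))  # mimics what the base app does
        importlib.invalidate_caches()
        try:
            # same effect as `from ._vendor import library2` in addon_name_1
            importlib.import_module('addon_name_1._vendor.library2')
        except ModuleNotFoundError as exc:
            return exc.name
        finally:
            # undo the global changes made for this demo
            sys.path.remove(str(addons))
            for name in [n for n in sys.modules if n.split('.')[0] == 'addon_name_1']:
                del sys.modules[name]
    return None


print(reproduce_vendored_import_failure())  # prints: dependency_of_library2
```

The name reported as missing is the bare dependency_of_library2. Python found library2 itself, but library2's absolute import is resolved against the top-level entries of sys.path, and the vendored copy does not live there.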

Solution attempts

So far I have explored the following options:

  1. Manually updating the third-party packages, so that their import statements point to the fully qualified module path within my Python package (e.g. import addon_name_1._vendor.dependency_of_library2). But this is tedious work that does not scale to larger dependency trees and is not portable to other packages.
  2. Adding _vendor to sys.path via sys.path.insert(1, <path_to_vendor_dir>) in my package's __init__.py. This works, but it makes a global change to the module look-up path, which will affect other add-ons and even the base app itself. It seems like a hack that could open a Pandora's box of issues later on (e.g. conflicts between different versions of the same package).
  3. Temporarily modifying sys.path for my imports. This fails for third-party modules with function-level imports, because those imports only run after sys.path has been restored.
  4. Writing a PEP 302-style custom importer based on an example I found in setuptools, but I just couldn't make head or tail of it.
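Option 3's failure mode can be reproduced in a few lines. This is a sketch under stated assumptions. lazy_lib and lazy_dep are hypothetical stand-ins for a vendored library and its dependency. temporary_sys_path is an illustrative helper, not an existing API.

```python
import contextlib
import importlib
import sys
import tempfile
from pathlib import Path


@contextlib.contextmanager
def temporary_sys_path(path):
    """Option 3: make `path` importable only inside the with-block."""
    sys.path.insert(0, str(path))
    try:
        yield
    finally:
        sys.path.remove(str(path))


def demo_lazy_import_failure():
    with tempfile.TemporaryDirectory() as tmp:
        vendor = Path(tmp)
        (vendor / 'lazy_dep.py').write_text('ANSWER = 42\n')
        # hypothetical vendored library whose dependency is imported inside a
        # function body, i.e. only when the function is first called
        (vendor / 'lazy_lib.py').write_text(
            'def compute():\n'
            '    import lazy_dep\n'
            '    return lazy_dep.ANSWER\n'
        )
        importlib.invalidate_caches()
        with temporary_sys_path(vendor):
            import lazy_lib  # works: the vendor dir is on sys.path right now
        try:
            return lazy_lib.compute()  # sys.path has already been restored
        except ModuleNotFoundError as exc:
            return f'missing: {exc.name}'
        finally:
            for name in ('lazy_lib', 'lazy_dep'):
                sys.modules.pop(name, None)


print(demo_lazy_import_failure())  # prints: missing: lazy_dep
```

The module-level import of lazy_lib succeeds because it runs while the path is present. The import inside compute() runs later, after the context manager has put sys.path back, so the dependency can no longer be found.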


I've been stuck on this for quite a few hours now and I'm beginning to think that I'm either completely missing an easy way to do this, or that there is something fundamentally wrong with my entire approach.

Is there no way I can ship a dependency tree of third-party packages with my code, without having to resort to sys.path hacks or modifying the packages in question?


Edit:

Just to clarify: I don't have any control over how add-ons are imported from the anki_addons folder. anki_addons is just the directory provided by the base app, into which all add-ons are installed. It is added to sys.path, so the add-on packages in it behave pretty much like any other Python package on Python's module look-up path.

Solution

First of all, I'd advise against vendoring. A few major packages used vendoring before but have switched away to avoid the pain of maintaining it; one such example is the requests library. If you are relying on people using pip install to install your package, then just declare dependencies and tell people about virtual environments. Don't assume you need to shoulder the burden of keeping dependencies untangled, or that you need to stop people from installing dependencies in the global Python site-packages location.

At the same time, I appreciate that the plug-in environment of a third-party tool is different. If adding dependencies to the Python installation used by that tool is cumbersome or impossible, vendoring may be a viable option. I see that Anki distributes extensions as .zip files without setuptools support, so that's certainly such an environment.

So if you choose to vendor dependencies, then use a script to manage your dependencies and update their imports. This is your option #1, but automated.

This is the path the pip project has chosen; see their tasks subdirectory for their automation, which builds on the invoke library. See the pip project vendoring README for their policy and rationale. Chief among their reasons is that pip needs to bootstrap itself, i.e. it must have its dependencies available before it can install anything.

You should not use any of the other options; you already enumerated the issues with #2 and #3.

The issue with option #4, using a custom importer, is that you still need to rewrite imports. Put differently, the custom importer hook used by setuptools doesn't solve the vendorized namespace problem at all. Instead, it makes it possible to dynamically import top-level packages if the vendorized packages are missing (a problem that pip solves with a manual debundling process). setuptools actually uses option #1 and rewrites the source code of the vendorized packages. See for example these lines in the packaging project in the setuptools vendored subpackage. The setuptools.extern namespace is handled by the custom import hook, which redirects either to setuptools._vendor or, if importing from the vendorized package fails, to the top-level name.
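The following simplified sketch of such a redirecting importer shows why the hook alone does not solve the namespace problem. It is modelled loosely on the behaviour described above and is not setuptools' actual code. VendorAliasImporter, the myaddon package and its extern/_vendor subpackages are all hypothetical. Note that vendored code would still have to import its dependencies as myaddon.extern.<name>, i.e. the imports still need rewriting.

```python
import importlib
import importlib.abc
import importlib.util
import sys
import tempfile
from pathlib import Path


class VendorAliasImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve <root_name>.<name> from <vendor_pkg>.<name>, falling back to the
    top-level <name> when the vendored copy cannot be imported (sketch)."""

    def __init__(self, root_name, vendored_names, vendor_pkg):
        self.root_name = root_name                # e.g. 'myaddon.extern'
        self.vendored_names = set(vendored_names)
        self.vendor_pkg = vendor_pkg              # e.g. 'myaddon._vendor'
        self._real_specs = {}                     # id(module) -> original spec

    def _target(self, fullname):
        """Map 'myaddon.extern.x.y' to 'x.y' if x is vendored, else None."""
        prefix = self.root_name + '.'
        if not fullname.startswith(prefix):
            return None
        target = fullname[len(prefix):]
        return target if target.split('.')[0] in self.vendored_names else None

    def find_spec(self, fullname, path=None, target=None):
        if self._target(fullname) is None:
            return None  # not ours; the regular finders handle it
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        target = self._target(spec.name)
        for candidate in (f'{self.vendor_pkg}.{target}', target):
            try:
                module = importlib.import_module(candidate)
            except ImportError:
                continue
            # the import system overwrites __spec__ after this returns,
            # so remember the real one and put it back in exec_module()
            self._real_specs[id(module)] = module.__spec__
            return module  # same object, now also registered under spec.name
        raise ImportError(f'cannot import {self.vendor_pkg}.{target} or {target}')

    def exec_module(self, module):
        # already executed under its real name; only restore its __spec__
        module.__spec__ = self._real_specs.pop(id(module))

    def install(self):
        if self not in sys.meta_path:
            sys.meta_path.insert(0, self)  # consult before the path finder


def demo():
    """Build a throwaway 'myaddon' package and import through the alias."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / 'myaddon'
        for pkg in (root, root / 'extern', root / '_vendor'):
            pkg.mkdir(exist_ok=True)
            (pkg / '__init__.py').write_text('')
        (root / '_vendor' / 'dep.py').write_text('VALUE = 42\n')
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        importer = VendorAliasImporter('myaddon.extern', ['dep', 'json'], 'myaddon._vendor')
        importer.install()
        try:
            vendored = importlib.import_module('myaddon.extern.dep')   # from _vendor
            fallback = importlib.import_module('myaddon.extern.json')  # top-level
            return (vendored is sys.modules['myaddon._vendor.dep'],
                    vendored.VALUE,
                    fallback is sys.modules['json'])
        finally:
            sys.meta_path.remove(importer)
            sys.path.remove(tmp)
            for name in [n for n in sys.modules if n.split('.')[0] == 'myaddon']:
                del sys.modules[name]


print(demo())  # prints: (True, 42, True)
```

The alias only helps code that already imports through myaddon.extern. A vendored library that still says import dep bypasses the hook entirely, which is why setuptools rewrites the vendored sources as well.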

The pip automation to update vendored packages takes the following steps:

  • Delete everything in the _vendor/ subdirectory except the documentation, the __init__.py file and the requirements text file.
  • Use pip to install all vendored dependencies into that directory from a dedicated requirements file named vendor.txt. The install skips compiling .pyc bytecache files and ignores transitive dependencies, which are assumed to be listed in vendor.txt already. The command used is pip install -t pip/_vendor -r pip/_vendor/vendor.txt --no-compile --no-deps.
  • Delete everything that was installed by pip but not needed in a vendored environment, i.e. *.dist-info, *.egg-info, the bin directory, and a few things from installed dependencies that pip would never use.
  • Collect the names of all installed directories and files, with the .py extension stripped (i.e. anything not in the whitelist); this is the vendored_libs list.
  • Rewrite imports. This is simply a series of regexes: every name in vendored_libs is used to replace import <name> occurrences with from pip._vendor import <name>, and every from <name>(.*) import occurrence with from pip._vendor.<name>(.*) import.
  • Apply a few patches to mop up the remaining changes needed; from a vendoring perspective, only the pip patch for requests is interesting here in that it updates the requests library backwards compatibility layer for the vendored packages that the requests library had removed; this patch is quite meta!

So in essence, the most important part of the pip approach, rewriting the vendored package imports, is quite simple. Paraphrased to simplify the logic and with the pip-specific parts removed, it is the following process:

import re
import shutil
import subprocess
import sys

from functools import partial
from itertools import chain
from pathlib import Path

# files in _vendor/ that must survive a re-vendoring run
WHITELIST = {'README.txt', '__init__.py', 'vendor.txt'}

def delete_all(*paths, whitelist=frozenset()):
    """Delete directories and any file whose name is not whitelisted."""
    for item in paths:
        if item.is_dir():
            shutil.rmtree(item, ignore_errors=True)
        elif item.is_file() and item.name not in whitelist:
            item.unlink()

def iter_subtree(path):
    """Recursively yield all files in a subtree, depth-first"""
    if not path.is_dir():
        if path.is_file():
            yield path
        return
    for item in path.iterdir():
        if item.is_dir():
            yield from iter_subtree(item)
        elif item.is_file():
            yield item

def patch_vendor_imports(file, replacements):
    """Apply every replacement function to the text of a single file."""
    text = file.read_text('utf8')
    for replacement in replacements:
        text = replacement(text)
    file.write_text(text, 'utf8')

def find_vendored_libs(vendor_dir, whitelist):
    """Return the top-level importable names in vendor_dir and their paths."""
    vendored_libs = []
    paths = []
    for item in vendor_dir.iterdir():
        if item.is_dir():
            vendored_libs.append(item.name)
        elif item.is_file() and item.name not in whitelist:
            vendored_libs.append(item.stem)  # without extension
        else:  # a whitelisted file (or neither a file nor a dir): skip it
            continue
        paths.append(item)
    return vendored_libs, paths

def vendor(vendor_dir):
    # target package is <parent>.<vendor_dir>; foo/_vendor -> foo._vendor
    pkgname = f'{vendor_dir.parent.name}.{vendor_dir.name}'

    # remove everything except the whitelisted files
    delete_all(*vendor_dir.iterdir(), whitelist=WHITELIST)

    # install with the pip of the running interpreter; check=True aborts
    # instead of patching a half-populated directory if pip fails
    subprocess.run([
        sys.executable, '-m', 'pip', 'install', '-t', str(vendor_dir),
        '-r', str(vendor_dir / 'vendor.txt'),
        '--no-compile', '--no-deps'
    ], check=True)

    # delete stuff that's not needed
    delete_all(
        *vendor_dir.glob('*.dist-info'),
        *vendor_dir.glob('*.egg-info'),
        vendor_dir / 'bin')

    vendored_libs, paths = find_vendored_libs(vendor_dir, WHITELIST)

    replacements = []
    for lib in vendored_libs:
        name = re.escape(lib)
        replacements += (
            partial(  # import bar -> from foo._vendor import bar
                re.compile(r'(^\s*)import {}\n'.format(name), flags=re.M).sub,
                r'\1from {} import {}\n'.format(pkgname, lib)
            ),
            partial(  # from bar -> from foo._vendor.bar
                re.compile(r'(^\s*)from {}(\.|\s+)'.format(name), flags=re.M).sub,
                r'\1from {}.{}\2'.format(pkgname, lib)
            ),
        )

    # only rewrite Python sources; other files may not be UTF-8 text
    for file in chain.from_iterable(map(iter_subtree, paths)):
        if file.suffix == '.py':
            patch_vendor_imports(file, replacements)

if __name__ == '__main__':
    # this assumes the script sits next to the foo/ package directory
    here = Path(__file__).resolve().parent
    vendor_dir = here / 'foo' / '_vendor'
    assert (vendor_dir / 'vendor.txt').exists(), '_vendor/vendor.txt file not found'
    assert (vendor_dir / '__init__.py').exists(), '_vendor/__init__.py file not found'
    vendor(vendor_dir)
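As a quick check of what the rewrite step does, the same two regular expressions can be applied to an in-memory string. The sample source below is made up for illustration. rewrite_imports is a hypothetical helper that mirrors the replacements built in vendor() and applies them in the same order.

```python
import re


def rewrite_imports(text, vendored_libs, pkgname):
    """Apply the two import-rewriting regexes from vendor() to `text`."""
    for lib in vendored_libs:
        name = re.escape(lib)
        # import bar -> from foo._vendor import bar
        text = re.sub(r'(^\s*)import {}\n'.format(name),
                      r'\1from {} import {}\n'.format(pkgname, lib),
                      text, flags=re.M)
        # from bar[.sub] import x -> from foo._vendor.bar[.sub] import x
        text = re.sub(r'(^\s*)from {}(\.|\s+)'.format(name),
                      r'\1from {}.{}\2'.format(pkgname, lib),
                      text, flags=re.M)
    return text


SAMPLE = (
    'import six\n'
    'from six.moves import urllib\n'
    'import sixer\n'
    'def f():\n'
    '    import six\n'
)

print(rewrite_imports(SAMPLE, ['six'], 'addon_name_1._vendor'))
```

Both module-level and function-level imports of six are redirected to addon_name_1._vendor, while import sixer is correctly left alone. Forms such as import six.moves or import six as s are also left untouched by these patterns, so they would need extra patterns or a manual patch.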
