Python typing: Concatenate sequences

Problem description

In Python, concatenation of two sequences is typically done by the + operator. However, mypy complains about the following:

from typing import Sequence

def concat1(a: Sequence, b: Sequence) -> Sequence:
    return a + b

And it's right: Sequence has no __add__. However, the function works perfectly fine for the "usual" sequence types list, str, tuple. Obviously, there are other sequence types where it doesn't work (e.g. numpy.ndarray, where + adds element-wise instead of concatenating). A solution could be the following:

from itertools import chain

def concat2(a: Sequence, b: Sequence) -> Sequence:
    return list(chain(a, b))

Now, mypy doesn't complain. But concatenating strings or tuples always gives a list. There seems to be an easy fix:

def concat3(a: Sequence, b: Sequence) -> Sequence:
    T = type(a)
    return T(chain(a, b))

But now mypy is unhappy because the constructor for T gets too many arguments. Even worse, the result is no longer always the concatenation: for strings, str(chain(a, b)) returns the text representation of the chain object rather than the joined string.
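To see the difference at runtime, here are concat2 and concat3 from above, reproduced so the snippet runs on its own. mypy still flags the constructor call in concat3, but Python runs it; the memory address in the last output line varies from run to run.

```python
from itertools import chain
from typing import Sequence


def concat2(a: Sequence, b: Sequence) -> Sequence:
    # Always materialises the result as a list
    return list(chain(a, b))


def concat3(a: Sequence, b: Sequence) -> Sequence:
    # Rebuilds the result with the concrete type of the first argument
    T = type(a)
    return T(chain(a, b))


print(concat2((1, 2), (3,)))  # [1, 2, 3]  (a list, not a tuple)
print(concat3((1, 2), (3,)))  # (1, 2, 3)
print(concat3("abc", "xyz"))  # <itertools.chain object at 0x...>
```

list and tuple can be built from an iterable of their elements, but str(...) converts its argument to text instead of joining it.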

What is the proper way of doing this? I feel that part of the issue is that a and b should have the same type and that the output will be the same type too, but the type annotations don't convey it.

Note: I am aware that concatenating strings is more efficiently done using ''.join([a, b]). However, I picked this example mainly for illustration purposes.

Recommended answer

There is no general way to solve this: Sequence includes types which cannot be concatenated in a generic way. For example, there is no way to concatenate arbitrary range objects to create a new range and keep all elements.
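A small illustration of the range case: range objects support len() and indexing but define no +. The concatenation of two ranges is generally not an arithmetic progression, so the only option is to give up the range type, here by materialising a list.

```python
# range supports len() and indexing, but not +
r = range(0, 3)
s = range(10, 12)

try:
    r + s
except TypeError as exc:
    print("TypeError:", exc)  # prints the TypeError message

# 0, 1, 2, 10, 11 is not evenly spaced, so no single range can hold it;
# concatenating therefore means switching to another type such as list
combined = list(r) + list(s)
print(combined)  # [0, 1, 2, 10, 11]
```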

One must decide on a concrete means of concatenation, and restrict the accepted types to those providing the required operations.

The simplest approach is for the function to only request the operations it needs. If the pre-built protocols in typing are not sufficient, one can fall back to defining a custom typing.Protocol for the requested operations.
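For example, a function that only needs to iterate over its arguments can request the pre-built Iterable protocol. This is a sketch: concat_iter is a hypothetical name, and it is essentially concat2 from the question with element types added, so the result is always a list.

```python
from itertools import chain
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def concat_iter(a: Iterable[T], b: Iterable[T]) -> List[T]:
    # Only iteration is needed, so the pre-built Iterable protocol suffices;
    # the price is that the result is always a list
    return list(chain(a, b))


print(concat_iter((1, 2), [3]))  # [1, 2, 3]
print(concat_iter("ab", "cd"))   # ['a', 'b', 'c', 'd']
```

Keeping the input type (str in, str out) needs more than iteration, which is where a custom Protocol comes in.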

Since concat1/concat_add requires the + implementation, a Protocol with __add__ is needed. Also, since addition usually works on similar types, __add__ must be parameterized over the concrete type; otherwise, the Protocol would require every addable type to be addable to every other addable type.

from typing import Protocol, TypeVar

# TypeVar to parameterize for specific types
SA = TypeVar('SA', bound='SupportsAdd')


class SupportsAdd(Protocol):
    """Any type T where +(:T, :T) -> T"""
    def __add__(self: SA, other: SA) -> SA: ...


def concat_add(a: SA, b: SA) -> SA:
    return a + b

This is sufficient to type-safely concatenate the basic sequences, and reject mixed-type concatenation.

reveal_type(concat_add([1, 2, 3], [12, 17])) # note: Revealed type is 'builtins.list*[builtins.int]'
reveal_type(concat_add("abc", "xyz"))        # note: Revealed type is 'builtins.str*'
reveal_type(concat_add([1, 2, 3], "xyz"))    # error: ...

Be aware that this allows concatenating any type that implements __add__, for example int. If further restrictions are desired, define the Protocol more narrowly, for example by requiring __len__ and __getitem__.
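A sketch of such a narrower protocol, assuming Python 3.8+ for typing.Protocol. SupportsSeqAdd, SSA and concat_seq are hypothetical names. The @runtime_checkable decorator is only there to allow a quick isinstance demonstration; those checks only test that the methods exist, not their signatures, so rejecting a call such as concat_seq(1, 2) is left to the static type checker.

```python
from typing import Protocol, TypeVar, runtime_checkable

# Hypothetical TypeVar bound to the narrower protocol below
SSA = TypeVar("SSA", bound="SupportsSeqAdd")


@runtime_checkable
class SupportsSeqAdd(Protocol):
    """Hypothetical: a type T with +(:T, :T) -> T, len() and integer indexing."""
    def __add__(self: SSA, other: SSA) -> SSA: ...

    def __len__(self) -> int: ...

    def __getitem__(self, index: int) -> object: ...


def concat_seq(a: SSA, b: SSA) -> SSA:
    return a + b


print(concat_seq([1, 2], [3]))            # [1, 2, 3]
print(isinstance("abc", SupportsSeqAdd))  # True
print(isinstance(1, SupportsSeqAdd))      # False: int has __add__ but no __len__
```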

Typing concatenation via chaining is a bit more complex, but follows the same approach: A Protocol defines the capabilities needed by the function, but in order to be type-safe the elements should be typed as well.

from itertools import chain
from typing import Iterable, Iterator, Protocol, TypeVar

# TypeVar to parameterize for specific types and element types
C = TypeVar('C', bound='Chainable')
T = TypeVar('T', covariant=True)


# Parameterized by the element type T
class Chainable(Protocol[T]):
    """Any type C[T] where C[T](:Iterable[T]) -> C[T] and iter(:C[T]) -> Iterator[T]"""
    def __init__(self, items: Iterable[T]): ...

    def __iter__(self) -> Iterator[T]: ...


def concat_chain(a: C, b: C) -> C:
    # Rebuild the result with the concrete type of the first argument
    # (named cls so it does not shadow the TypeVar T)
    cls = type(a)
    return cls(chain(a, b))

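This is sufficient to type-safely concatenate sequences that can be constructed from an iterable of their own elements, and to reject mixed-type concatenation and non-sequences. Note that the check is purely structural: str is accepted, yet at runtime str(chain(a, b)) returns the text representation of the chain object rather than the joined string.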

reveal_type(concat_chain([1, 2, 3], [12, 17])) # note: Revealed type is 'builtins.list*[builtins.int]'
reveal_type(concat_chain("abc", "xyz"))        # note: Revealed type is 'builtins.str*'
reveal_type(concat_chain([1, 2, 3], "xyz"))    # error: ...
reveal_type(concat_chain(1, 2))                # error: ...
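Because the check is structural, user-defined classes qualify as well. A sketch, assuming Python 3.8+: the Chainable definitions from above are repeated so the snippet runs on its own, and Row is a hypothetical container that can be built from an iterable and iterated over.

```python
from itertools import chain
from typing import Iterable, Iterator, List, Protocol, TypeVar

C = TypeVar("C", bound="Chainable")
T = TypeVar("T", covariant=True)


class Chainable(Protocol[T]):
    def __init__(self, items: Iterable[T]): ...

    def __iter__(self) -> Iterator[T]: ...


def concat_chain(a: C, b: C) -> C:
    cls = type(a)
    return cls(chain(a, b))


class Row:
    """Hypothetical container: built from an iterable, iterable itself."""
    def __init__(self, items: Iterable[int]) -> None:
        self.items: List[int] = list(items)

    def __iter__(self) -> Iterator[int]:
        return iter(self.items)

    def __repr__(self) -> str:
        return f"Row({self.items!r})"


print(concat_chain(Row([1, 2]), Row([3])))  # Row([1, 2, 3])
print(concat_chain((1, 2), (3,)))           # (1, 2, 3)
```

Row never mentions Chainable; it matches the protocol only by having a compatible constructor and __iter__.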
