Python typing: Concatenate sequences
Question
In Python, concatenation of two sequences is typically done with the + operator. However, mypy complains about the following:
from typing import Sequence

def concat1(a: Sequence, b: Sequence) -> Sequence:
    return a + b
And it's right: Sequence has no __add__. However, the function works perfectly fine for the "usual" sequence types list, str, and tuple. Obviously, there are other sequence types where it doesn't work (e.g. numpy.ndarray). A solution could be the following:
from itertools import chain

def concat2(a: Sequence, b: Sequence) -> Sequence:
    return list(chain(a, b))
Now mypy doesn't complain. But concatenating strings or tuples always gives a list. There seems to be an easy fix:
def concat3(a: Sequence, b: Sequence) -> Sequence:
    T = type(a)
    return T(chain(a, b))
But now mypy is unhappy because the constructor for T gets too many arguments. Even worse, the function no longer returns a Sequence; it returns a generator.
What is the proper way of doing this? I feel that part of the issue is that a and b should have the same type and that the output should have that type too, but the type annotations don't convey this.
Note: I am aware that concatenating strings is more efficiently done using ''.join([a, b]). However, I picked this example more for illustration purposes.
Recommended answer
There is no general way to solve this: Sequence includes types which cannot be concatenated in a generic way. For example, there is no way to concatenate arbitrary range objects to create a new range and keep all elements.
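As a small illustration of the range case (the variable names are chosen here for the example), the following sketch shows that + is not defined for range, and that chaining keeps all elements only by giving up the range type:

```python
from itertools import chain

first, second = range(0, 3), range(10, 13)

try:
    first + second  # range defines no __add__
except TypeError as exc:
    print(f"TypeError: {exc}")

# Chaining keeps every element, but the result is a list, not a range:
# no single (start, stop, step) triple describes 0, 1, 2, 10, 11, 12.
combined = list(chain(first, second))
print(combined)
```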
One must decide on a concrete means of concatenation, and restrict the accepted types to those providing the required operations.
The simplest approach is for the function to request only the operations it needs. If the pre-built protocols in typing are not sufficient, one can fall back to defining a custom typing.Protocol for the requested operations.
Since concat1/concat_add requires the + implementation, a Protocol with __add__ is needed. Also, since addition usually works on similar types, __add__ must be parameterized over the concrete type; otherwise, the Protocol would require that every addable type can be added to every other addable type.
from typing import Protocol, TypeVar

# TypeVar to parameterize for specific types
SA = TypeVar('SA', bound='SupportsAdd')

class SupportsAdd(Protocol):
    """Any type T where +(:T, :T) -> T"""
    def __add__(self: SA, other: SA) -> SA: ...

def concat_add(a: SA, b: SA) -> SA:
    return a + b
This is sufficient to type-safely concatenate the basic sequences, and to reject mixed-type concatenation.
reveal_type(concat_add([1, 2, 3], [12, 17])) # note: Revealed type is 'builtins.list*[builtins.int]'
reveal_type(concat_add("abc", "xyz")) # note: Revealed type is 'builtins.str*'
reveal_type(concat_add([1, 2, 3], "xyz")) # error: ...
Be aware that this allows concatenating any type that implements __add__, for example int. If further restrictions are desired, define the Protocol more narrowly, for example by also requiring __len__ and __getitem__.
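One possible shape for such a narrower Protocol is sketched below. The names AddableSequence and concat_add_seq are hypothetical, the __getitem__ signature is deliberately loose, and the sketch has not been checked against any particular mypy version. The runtime_checkable decorator is added only so the effect can be observed with isinstance; it checks that the methods exist, not their signatures.

```python
from typing import Any, Protocol, TypeVar, runtime_checkable

# Hypothetical names, introduced for this sketch only.
S = TypeVar("S", bound="AddableSequence")

@runtime_checkable
class AddableSequence(Protocol):
    """Any sized, indexable type T where +(:T, :T) -> T"""
    def __len__(self) -> int: ...
    def __getitem__(self, index: int) -> Any: ...
    def __add__(self: S, other: S) -> S: ...

def concat_add_seq(a: S, b: S) -> S:
    return a + b

print(concat_add_seq([1, 2], [3]))
print(concat_add_seq((1, 2), (3,)))

# int has __add__ but neither __len__ nor __getitem__, so it no longer qualifies.
print(isinstance(1, AddableSequence))
print(isinstance([1], AddableSequence))
```

With this bound, a call such as concat_add_seq(1, 2) should be rejected by a type checker, while list, tuple and str arguments should still be accepted.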
Typing concatenation via chaining is a bit more complex, but follows the same approach: a Protocol defines the capabilities needed by the function, but in order to be type-safe the elements should be typed as well.
from itertools import chain
from typing import Iterable, Iterator, Protocol, TypeVar

# TypeVar to parameterize for specific types and element types
C = TypeVar('C', bound='Chainable')
T = TypeVar('T', covariant=True)

# Parameterized by the element type T
class Chainable(Protocol[T]):
    """Any type C[T] where C[T](:Iterable[T]) -> C[T] and iter(:C[T]) -> Iterable[T]"""
    def __init__(self, items: Iterable[T]): ...
    def __iter__(self) -> Iterator[T]: ...

def concat_chain(a: C, b: C) -> C:
    cls = type(a)  # named cls so it does not shadow the TypeVar T
    return cls(chain(a, b))
This is sufficient to type-safely concatenate sequences constructed from themselves, and to reject mixed-type concatenation and non-sequences.
reveal_type(concat_chain([1, 2, 3], [12, 17])) # note: Revealed type is 'builtins.list*[builtins.int]'
reveal_type(concat_chain("abc", "xyz")) # note: Revealed type is 'builtins.str*'
reveal_type(concat_chain([1, 2, 3], "xyz")) # error: ...
reveal_type(concat_chain(1, 2)) # error: ...
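The reveal_type results describe what the type checker infers. A quick runtime check is a useful complement, because a constructor that accepts an iterable does not necessarily collect its elements. The sketch below repeats concat_chain (the local variable is called cls here, a naming choice made for this example) and runs it on a few built-in types:

```python
from itertools import chain
from typing import Iterable, Iterator, Protocol, TypeVar

C = TypeVar("C", bound="Chainable")
T = TypeVar("T", covariant=True)

class Chainable(Protocol[T]):
    def __init__(self, items: Iterable[T]): ...
    def __iter__(self) -> Iterator[T]: ...

def concat_chain(a: C, b: C) -> C:
    cls = type(a)
    return cls(chain(a, b))

# list and tuple build themselves from the elements of an iterable.
print(concat_chain([1, 2, 3], [12, 17]))
print(concat_chain((1, 2), (3,)))

# str(obj) returns a textual representation of obj, so the result is the
# representation of the chain object rather than the joined characters.
text = concat_chain("abc", "xyz")
print(text == "abcxyz")
```

So for str the annotation-level result and the runtime result differ; if strings must be supported through chaining, a separate branch (for example joining the characters) would be needed.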