zip iterators asserting for equal length in Python

Problem description

I am looking for a nice way to zip several iterables, raising an exception if the lengths of the iterables are not equal.

In the case where the iterables are lists or otherwise support len(), this solution is clean and easy:

def zip_equal(it1, it2):
    if len(it1) != len(it2):
        raise ValueError("Lengths of iterables are different")
    return zip(it1, it2)

However, if it1 and it2 are generators, the previous function fails because the length is not defined: TypeError: object of type 'generator' has no len().

I imagine the itertools module offers a simple way to implement that, but so far I have not been able to find it. I have come up with this home-made solution:

def zip_equal(it1, it2):
    exhausted = False
    while True:
        try:
            el1 = next(it1)
            if exhausted: # in a previous iteration it2 was exhausted but it1 still has elements
                raise ValueError("it1 and it2 have different lengths")
        except StopIteration:
            exhausted = True
            # it2 must be exhausted too.
        try:
            el2 = next(it2)
            # here it2 is not exhausted.
            if exhausted:  # it1 was exhausted => raise
                raise ValueError("it1 and it2 have different lengths")
        except StopIteration:
            # here it2 is exhausted
            if not exhausted:
                # but it1 was not exhausted => raise
                raise ValueError("it1 and it2 have different lengths")
            exhausted = True
        if not exhausted:
            yield (el1, el2)
        else:
            return

The solution can be tested with the following code:

it1 = (x for x in ['a', 'b', 'c'])  # it1 has length 3
it2 = (x for x in [0, 1, 2, 3])     # it2 has length 4
list(zip_equal(it1, it2))           # len(it1) < len(it2) => raise
it1 = (x for x in ['a', 'b', 'c'])  # it1 has length 3
it2 = (x for x in [0, 1, 2, 3])     # it2 has length 4
list(zip_equal(it2, it1))           # len(it2) > len(it1) => raise
it1 = (x for x in ['a', 'b', 'c', 'd'])  # it1 has length 4
it2 = (x for x in [0, 1, 2, 3])          # it2 has length 4
list(zip_equal(it1, it2))                # like zip (or izip in python2)

Am I overlooking any alternative solution? Is there a simpler implementation of my zip_equal function?

Update:

  • Simple answer without external dependencies: Martijn Pieters' answer
  • More complex than Martijn's, but with better performance: cjerdonek's answer
  • Requiring Python 3.10 or newer: see Asocia's answer
  • If you don't mind a package dependency, see pylang's answer

Solution

I can think of a simpler solution: use itertools.zip_longest() and raise an exception if the sentinel value used to pad out shorter iterables is present in the tuple produced:

from itertools import zip_longest

def zip_equal(*iterables):
    sentinel = object()
    for combo in zip_longest(*iterables, fillvalue=sentinel):
        if sentinel in combo:
            raise ValueError('Iterables have different lengths')
        yield combo
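
As a rough check, the function above can be run against the three cases from the question. The function is repeated so the snippet runs on its own, and the check helper is a hypothetical name introduced only for this sketch.

```python
from itertools import zip_longest


def zip_equal(*iterables):
    # Same function as above, repeated so this snippet is self-contained.
    sentinel = object()
    for combo in zip_longest(*iterables, fillvalue=sentinel):
        if sentinel in combo:
            raise ValueError('Iterables have different lengths')
        yield combo


def check(*iterables):
    """Hypothetical helper: return the zipped list, or the error message."""
    try:
        return list(zip_equal(*iterables))
    except ValueError as exc:
        return str(exc)


# it1 shorter than it2
short_first = check((x for x in ['a', 'b', 'c']), (x for x in [0, 1, 2, 3]))
# it1 longer than it2
long_first = check((x for x in [0, 1, 2, 3]), (x for x in ['a', 'b', 'c']))
# equal lengths: behaves like zip()
equal = check((x for x in ['a', 'b', 'c', 'd']), (x for x in [0, 1, 2, 3]))
```

Because zip_equal is a generator function, nothing is checked until it is iterated, and the tuples that precede the mismatch are yielded before the ValueError is raised.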

Unfortunately, we can't use zip() with yield from to avoid a Python-code loop with a test each iteration; once the shortest iterator runs out, zip() would advance all preceding iterators and thus swallow the evidence if there is but one extra item in those.
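
The effect described above can be seen in a small sketch using only the built-in zip(); the variable names are illustrative. When the longer iterator comes first, zip() has already pulled its extra item by the time it finds the later iterator exhausted, so nothing is left to inspect afterwards.

```python
# Longer iterator first: zip() takes 3 from it, then finds the second
# iterator exhausted and silently discards the 3.
longer_first = iter([1, 2, 3])
shorter_second = iter([10, 20])
pairs_a = list(zip(longer_first, shorter_second))
leftover_a = list(longer_first)    # the extra item has been consumed

# Shorter iterator first: zip() stops as soon as the first iterator is
# exhausted, so the second iterator still holds its extra item.
shorter_first = iter([1, 2])
longer_second = iter([10, 20, 30])
pairs_b = list(zip(shorter_first, longer_second))
leftover_b = list(longer_second)   # the extra item is still available
```

Checking for leftovers after zip() therefore only detects a mismatch when the longer iterable comes after the shortest one, which is why it cannot replace the per-iteration test.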
