What is the 2nd argument for the iter function in Python?


Problem description

Let's consider a file:

$ echo -e """This is a foo bar sentence .\nAnd this is the first txtfile in the corpus .""" > test.txt
$ cat test.txt 
This is a foo bar sentence .
And this is the first txtfile in the corpus .

When I want to read the file character by character, I can do this (from https://stackoverflow.com/a/25071590/610569):

>>> fin = open('test.txt')
>>> while fin.read(1):
...     fin.seek(-1,1)
...     print fin.read(1),
... 
T h i s   i s   a   f o o   b a r   s e n t e n c e   . 
A n d   t h i s   i s   t h e   f i r s t   t x t f i l e   i n   t h e   c o r p u s   .

But the while loop looks a little unpythonic, especially since I use fin.read(1) to check for EOF and then seek back in order to read the current byte. So I can do something like this instead, following the question "How to read a single character at a time from a file in Python?":

>>> import functools
>>> fin = open('test.txt')
>>> fin_1byte = iter(functools.partial(fin.read, 1), '')
>>> for c in fin_1byte:
...     print c,
... 
T h i s   i s   a   f o o   b a r   s e n t e n c e   . 
A n d   t h i s   i s   t h e   f i r s t   t x t f i l e   i n   t h e   c o r p u s   .

But when I try it without the second argument, it raises a TypeError:

>>> fin = open('test.txt')
>>> fin_1byte = functools.partial(fin.read, 1)
>>> for c in iter(fin_1byte):
...     print c,
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'functools.partial' object is not iterable

What is the 2nd argument in iter? The docs don't say much either: https://docs.python.org/2/library/functions.html#iter and https://docs.python.org/3.6/library/functions.html#iter


As per the doc:

Return an iterator object. The first argument is interpreted very differently depending on the presence of the second argument. Without a second argument, object must be a collection object which supports the iteration protocol (the __iter__() method), or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0). If it does not support either of those protocols, TypeError is raised. If the second argument, sentinel, is given, then object must be a callable object. The iterator created in this case will call object with no arguments for each call to its next() method; if the value returned is equal to sentinel, StopIteration will be raised, otherwise the value will be returned.

I guess the docs require some "decrypting":

  • Without a second argument, object must be a collection object which supports the iteration protocol (the __iter__() method)

Does that mean it needs to come from collections? Or is it okay as long as the object has an __iter__() method?

  • , or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0)

That's rather cryptic. Does that mean it checks whether the sequence is indexed, and hence queryable, and that the index must start from 0? Does it also mean that the indices need to be sequential, i.e. 0, 1, 2, 3, ... and not something like 0, 2, 8, 13, ...?

  • If it does not support either of those protocols, TypeError is raised.

Yes, this part, I do understand =)

  • If the second argument, sentinel, is given, then object must be a callable object.

Okay, now this gets a little sci-fi. Is "sentinel" just Python terminology? What does sentinel mean Pythonically? And does "callable object" mean something like a function, as opposed to a type object?

  • The iterator created in this case will call object with no arguments for each call to its next() method;

I don't really get this part; maybe an example would help.

  • if the value returned is equal to sentinel, StopIteration will be raised, otherwise the value will be returned.

Okay, so sentinel here refers to some kind of stopping criterion?

Can someone help to decipher/clarify the meaning of the above points about iter?

Solution

With one argument, iter must be given an object that has either the __iter__ special method or the __getitem__ special method. If neither of them exists, iter raises a TypeError:

>>> iter(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not iterable

There are 2 protocols for iteration. The old protocol relies on calling __getitem__ with successive integers, starting from 0, until one of the calls raises IndexError. The new protocol relies on the iterator that is returned from __iter__.

In Python 2, str doesn't even have the __iter__ special method:

Python 2.7.12+ (default, Sep 17 2016, 12:08:02) 
[GCC 6.2.0 20160914] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 'abc'.__iter__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute '__iter__'

yet it is still iterable:

>>> iter('abc')
<iterator object at 0x7fcee9e89390>

To make your custom class iterable, you need to have either __iter__ or __getitem__ that raises IndexError for non-existent items:

class Foo:
    def __iter__(self):
        return iter(range(5))

class Bar:
    def __getitem__(self, i):
        if i >= 5:
            raise IndexError
        return i

Using these:

>>> list(iter(Foo()))
[0, 1, 2, 3, 4]
>>> list(iter(Bar()))
[0, 1, 2, 3, 4]

An explicit iter call is usually not needed, because for loops and functions that expect iterables create an iterator implicitly:

>>> list(Foo())
[0, 1, 2, 3, 4]
>>> for i in Bar():
...     print(i)
...
0
1
2
3
4
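
The question also asked whether the indices have to be sequential. As a minimal sketch (written for Python 3; the Recorder class and its seen attribute are hypothetical names made up for this illustration), one can record every index that iteration passes to __getitem__ and see that they are 0, 1, 2, ... up to the first one that raises IndexError:

```python
class Recorder:
    """Hypothetical class that is iterable only through __getitem__."""

    def __init__(self, items):
        self.items = items
        self.seen = []  # every index that iteration asked for

    def __getitem__(self, index):
        self.seen.append(index)
        return self.items[index]  # a list raises IndexError past its end


rec = Recorder(["a", "b", "c"])
values = list(rec)  # no __iter__ is defined, so the old protocol is used
print(values)    # ['a', 'b', 'c']
print(rec.seen)  # [0, 1, 2, 3]
```

The last recorded index, 3, is the call that raised IndexError and ended the iteration; an object that only answers to indices such as 0, 2, 8, 13 would stop at index 1.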


With the 2-argument form, the first argument must be a function or an object that implements __call__. The first argument is called without arguments, and its return values are yielded from the iterator. The iteration stops when the value returned from the call equals the given sentinel value, as if by:

value = func()
if value == sentinel:
    return
else:
    yield value
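
The fragment above can be wrapped in a loop to give a rough pure-Python equivalent of the 2-argument form. This is only a sketch under assumptions: iter_until and make_countdown are hypothetical helper names, not part of Python, and the built-in is implemented in C rather than as a generator.

```python
def iter_until(func, sentinel):
    """Hypothetical generator that mimics iter(func, sentinel)."""
    while True:
        value = func()
        if value == sentinel:
            return  # ends the generator, so the caller sees StopIteration
        yield value


def make_countdown(start):
    """Return a zero-argument callable giving start, start - 1, ..."""
    state = {"n": start + 1}

    def step():
        state["n"] -= 1
        return state["n"]

    return step


builtin_result = list(iter(make_countdown(3), 0))
sketch_result = list(iter_until(make_countdown(3), 0))
print(builtin_result)  # [3, 2, 1]
print(sketch_result)   # [3, 2, 1]
```

A deterministic countdown is used here instead of a random die so that both versions can be compared directly; the sentinel 0 itself is never yielded.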

For example, to collect the values thrown on a die until we throw a 6:

>>> import random
>>> throw = lambda: random.randint(1, 6)
>>> list(iter(throw, 6))
[3, 2, 4, 5, 5]
>>> list(iter(throw, 6))
[1, 3, 1, 3, 5, 1, 4]

To clarify further, the given function (or the given object with a __call__ special method) is called without arguments each time next() is used on the iterator:

>>> def throw_die():
...     die = random.randint(1, 6)
...     print("returning {}".format(die))
...     return die
...
>>> throws = iter(throw_die, 6)
>>> next(throws)
returning 2
2
>>> next(throws)
returning 4
4
>>> next(throws)
returning 6
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

(i.e. throw_die is called as throw_die() and, if the returned value is not equal to 6, it is yielded).

Or in the case of

>>> fin_1byte = iter(functools.partial(fin.read, 1), '')
>>> for c in fin_1byte:
...     print c,

reading from a file at the end-of-file returns the empty string (or empty bytes if it was a binary file):

>>> from io import StringIO
>>> fin = StringIO(u'ab')
>>> fin.read(1)
u'a'
>>> fin.read(1)
u'b'
>>> fin.read(1)
u''

If the file is not yet at end-of-file, one character is returned.
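
The same idiom works for chunks larger than one character, as long as the sentinel matches what read returns at end-of-file. A minimal Python 3 sketch, using in-memory io.StringIO and io.BytesIO objects as stand-ins for real files (the sample contents are made up for illustration):

```python
import functools
import io

# Text mode: read() returns '' at end-of-file, so '' is the sentinel.
text_file = io.StringIO("ab\ncd")
chars = list(iter(functools.partial(text_file.read, 1), ""))
print(chars)   # ['a', 'b', '\n', 'c', 'd']

# Binary mode: read() returns b'' at end-of-file, so b'' is the sentinel.
binary_file = io.BytesIO(b"abcdefg")
chunks = list(iter(functools.partial(binary_file.read, 3), b""))
print(chunks)  # [b'abc', b'def', b'g']
```

Using '' as the sentinel on a binary file would never match in Python 3, because b'' == '' is False, and the loop would keep yielding empty bytes forever.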

This can also be used to make an endless iterator from repeated function calls:

>>> dice = iter(throw, 7)

Since the value returned can never be equal to 7, the iterator runs forever. A common idiom is to use an anonymous object as the sentinel, to make sure that the comparison cannot succeed for any conceivable value:

>>> dice = iter(throw, object())

Because

>>> object() != object()
True
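
An endless iterator like this is normally consumed through something that stops on its own. A small Python 3 sketch, assuming a deterministic counter in place of the random die so the result is predictable:

```python
import itertools

counter = itertools.count(1)

# object() creates a fresh instance that compares equal only to itself,
# so no value returned by the callable can ever match the sentinel.
endless = iter(lambda: next(counter), object())

# islice takes a bounded number of items from the endless iterator.
first_five = list(itertools.islice(endless, 5))
print(first_five)  # [1, 2, 3, 4, 5]
```

Calling list(endless) directly would never finish, which is why itertools.islice (or a for loop with a break) is used here.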


Note that the word sentinel is commonly used for a value that serves as an end marker in some data and that does not occur naturally within the data, as in this Java answer.
