What is the 2nd argument for the iter function in Python?


Question

Let's consider a file:

$ echo -e """This is a foo bar sentence .\nAnd this is the first txtfile in the corpus .""" > test.txt
$ cat test.txt 
This is a foo bar sentence .
And this is the first txtfile in the corpus .

When I want to read the file character by character, I can do this (from https://stackoverflow.com/a/25071590/610569):

>>> fin = open('test.txt')
>>> while fin.read(1):
...     fin.seek(-1,1)
...     print fin.read(1),
... 
T h i s   i s   a   f o o   b a r   s e n t e n c e   . 
A n d   t h i s   i s   t h e   f i r s t   t x t f i l e   i n   t h e   c o r p u s   .

But the while loop looks a little unpythonic, especially since I call fin.read(1) to check for EOF and then seek back in order to read the current byte again. So I can do something like this instead (from "How to read a single character at a time from a file in Python?"):

>>> import functools
>>> fin = open('test.txt')
>>> fin_1byte = iter(functools.partial(fin.read, 1), '')
>>> for c in fin_1byte:
...     print c,
... 
T h i s   i s   a   f o o   b a r   s e n t e n c e   . 
A n d   t h i s   i s   t h e   f i r s t   t x t f i l e   i n   t h e   c o r p u s   .

But when I try it without the second argument, it raises a TypeError:

>>> fin = open('test.txt')
>>> fin_1byte = functools.partial(fin.read, 1)
>>> for c in iter(fin_1byte):
...     print c,
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'functools.partial' object is not iterable

What is the 2nd argument in iter? The docs don't say much either: https://docs.python.org/2/library/functions.html#iter and https://docs.python.org/3.6/library/functions.html#iter

According to the docs:

Return an iterator object. The first argument is interpreted very differently depending on the presence of the second argument. Without a second argument, object must be a collection object which supports the iteration protocol (the __iter__() method), or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0). If it does not support either of those protocols, TypeError is raised. If the second argument, sentinel, is given, then object must be a callable object. The iterator created in this case will call object with no arguments for each call to its next() method; if the value returned is equal to sentinel, StopIteration will be raised, otherwise the value will be returned.

I guess the docs require some "decrypting":

  • Without a second argument, object must be a collection object which supports the iteration protocol (the __iter__() method)

Does that mean it needs to come from collections? Or is it enough that the object has an __iter__() method?

  • , or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0)

That's rather cryptic. So does that mean it checks whether the object can be indexed, and that the index must start from 0? Does it also mean that the indices need to be sequential, i.e. 0, 1, 2, 3, ... and not something like 0, 2, 8, 13, ...?

  • If it does not support either of those protocols, TypeError is raised.

Yes, this part, I do understand =)

  • If the second argument, sentinel, is given, then object must be a callable object.

Okay, now this gets a little sci-fi. Is "sentinel" just Python terminology? What does sentinel mean in Python? And does "callable object" mean a function, as opposed to a type object?

  • The iterator created in this case will call object with no arguments for each call to its next() method;

I don't really get this part; maybe an example would help.

  • if the value returned is equal to sentinel, StopIteration will be raised, otherwise the value will be returned.

Okay, so sentinel here refers to some kind of stopping criterion?

Can someone help to decipher/clarify the meaning of the above points about iter?

Recommended answer

With one argument, iter must be given an object that has either the __iter__ special method or the __getitem__ special method. If neither exists, iter raises an error:

>>> iter(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not iterable

There are 2 protocols for iteration. The old protocol relies on calling __getitem__ for successive integers from 0 until one that raises IndexError. The new protocol relies on the iterator that is returned from __iter__.

In Python 2, str doesn't even have the __iter__ special method:

Python 2.7.12+ (default, Sep 17 2016, 12:08:02) 
[GCC 6.2.0 20160914] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 'abc'.__iter__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute '__iter__'

yet it is still iterable:

>>> iter('abc')
<iterator object at 0x7fcee9e89390>

To make your custom class iterable, it needs either an __iter__ method or a __getitem__ method that raises IndexError for non-existent items:

class Foo:
    def __iter__(self):
        return iter(range(5))

class Bar:
    def __getitem__(self, i):
        if i >= 5:
            raise IndexError
        return i

Using these:

>>> list(iter(Foo()))
[0, 1, 2, 3, 4]
>>> list(iter(Bar()))
[0, 1, 2, 3, 4]

An explicit iter call is usually not needed, because for loops and functions that expect iterables create an iterator implicitly:

>>> list(Foo())
[0, 1, 2, 3, 4]
>>> for i in Bar():
...     print(i)
...
0
1
2
3
4


With the 2-argument form, the first argument must be a function or an object that implements __call__. It is called without arguments, and the return values are yielded from the iterator. The iteration stops when the value returned from the call on that iteration equals the given sentinel value, as if by:

value = func()
if value == sentinel:
    return
else:
    yield value
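
The fragment above describes a single step. As a rough sketch, the whole behaviour can be approximated by a generator; iter_until is a hypothetical name used only for illustration, and the real iter is implemented in C rather than like this. Seeding the random generator identically lets both versions be compared:

```python
import random


def iter_until(func, sentinel):
    """Hypothetical pure-Python stand-in for iter(func, sentinel)."""
    while True:
        value = func()
        if value == sentinel:
            return  # ends the generator, so next() raises StopIteration
        yield value


def throw():
    return random.randint(1, 6)


random.seed(0)
builtin = list(iter(throw, 6))

random.seed(0)
sketch = list(iter_until(throw, 6))

print(builtin == sketch)  # True
```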

For example, to collect the values of die throws until a 6 comes up:

>>> import random
>>> throw = lambda: random.randint(1, 6)
>>> list(iter(throw, 6))
[3, 2, 4, 5, 5]
>>> list(iter(throw, 6))
[1, 3, 1, 3, 5, 1, 4]

To clarify further, the given function (or the given object with a __call__ special method) is called without arguments each time next() is used on the iterator:

>>> def throw_die():
...     die = random.randint(1, 6)
...     print("returning {}".format(die))
...     return die
...
>>> throws = iter(throw_die, 6)
>>> next(throws)
returning 2
2
>>> next(throws)
returning 4
4
>>> next(throws)
returning 6
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

(i.e. throw_die is called as throw_die() and, if the returned value is not equal to 6, it is yielded).
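
The first argument does not have to be a function. As a minimal sketch, assuming a hypothetical Countdown class that implements __call__, an instance works the same way:

```python
class Countdown:
    """Hypothetical callable: each call returns the next lower number."""

    def __init__(self, start):
        self.current = start

    def __call__(self):
        value = self.current
        self.current -= 1
        return value


# Both the class and its instances are callable.
print(callable(Countdown), callable(Countdown(3)))  # True True

countdown = list(iter(Countdown(3), 0))
print(countdown)  # [3, 2, 1]
```

The calls return 3, 2, 1 and then 0; the 0 equals the sentinel, so it is not yielded and the iteration stops.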

Likewise, in the file example from the question,

>>> fin_1byte = iter(functools.partial(fin.read, 1), '')
>>> for c in fin_1byte:
...     print c,

reading from a file at the end-of-file returns the empty string (or empty bytes if it was a binary file):

>>> from io import StringIO
>>> fin = StringIO(u'ab')
>>> fin.read(1)
u'a'
>>> fin.read(1)
u'b'
>>> fin.read(1)
u''

If not yet at the end of file, one character would be returned.
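
The transcripts above use Python 2 syntax. A sketch of the same idea for Python 3 follows, using in-memory files from the io module as stand-ins for a real file (an assumption made only to keep the example self-contained). Note that the sentinel must match the file mode: '' for text, b'' for binary.

```python
import functools
import io

# Text mode: read one character at a time until read() returns ''.
text_file = io.StringIO("ab\ncd")
chars = list(iter(functools.partial(text_file.read, 1), ""))
print(chars)  # ['a', 'b', '\n', 'c', 'd']

# Binary mode: read 4-byte chunks until read() returns b''.
binary_file = io.BytesIO(b"abcdef")
chunks = list(iter(functools.partial(binary_file.read, 4), b""))
print(chunks)  # [b'abcd', b'ef']
```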

The two-argument form can also be used to make an endless iterator from repeated function calls:

>>> dice = iter(throw, 7)

Since the returned value can never equal 7, the iterator runs forever. A common idiom is to use an anonymous object as the sentinel, to make sure the comparison cannot succeed for any conceivable value:

>>> dice = iter(throw, object())

because

>>> object() != object()
True
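
Because such an iterator never stops on its own, it has to be limited from the outside. One possible sketch uses itertools.islice to take a fixed number of throws; the count of 10 is an arbitrary choice for illustration:

```python
import itertools
import random


def throw():
    return random.randint(1, 6)


# A fresh object() compares unequal to every die value, so this never ends.
dice = iter(throw, object())

# Take only the first ten throws from the endless iterator.
first_ten = list(itertools.islice(dice, 10))
print(len(first_ten))                               # 10
print(all(1 <= value <= 6 for value in first_ten))  # True
```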


Note that the word sentinel is commonly used for a value that serves as an end marker in some data and that does not occur naturally within the data, as in this Java answer.
