Controlling a generator the pythonic way

Question

Hi,

I'm trying to figure out what is the most pythonic way to interact with
a generator.

The task I'm trying to accomplish is writing a PDF tokenizer, and I want
to implement it as a Python generator. Suppose all the ugly details of
tokenizing PDF can be handled (such as embedded streams of arbitrary
binary content). There remains one problem, though: In order to get
random file access, the tokenizer should not simply spit out a series of
tokens read from the file sequentially; it should rather be possible to
point it at places in the file at random.

I can see two possibilities to do this: either the current file position
has to be read from somewhere (say, a mutable object passed to the
generator) after each yield, or a new generator needs to be instantiated
every time the tokenizer is pointed to a new file position.

The first approach has both the disadvantage that the pointer value is
exposed and that, due to the complex rules for splitting a PDF into tokens,
there will be a lot of yield statements in the generator code, which
would make for a lot of pointer assignments. This seems ugly to me.
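A minimal sketch of the first approach, under loudly stated assumptions: it uses a hypothetical toy grammar in which a token is just a run of non-whitespace bytes (real PDF tokenizing is far more involved), and `Cursor` and `tokens` are illustrative names, not part of any library. The caller and the generator share a mutable `Cursor`, and the generator re-reads `cursor.pos` each time it resumes, so the caller can move it between `next()` calls.

```python
import re

# Hypothetical toy grammar: a token is a run of non-whitespace bytes.
TOKEN = re.compile(rb"\S+")


class Cursor:
    """Mutable position shared by the caller and the generator."""

    def __init__(self, pos=0):
        self.pos = pos


def tokens(data, cursor):
    while True:
        # Re-read the shared position on every pass: the caller may have
        # moved it while the generator was suspended at the yield below.
        match = TOKEN.search(data, cursor.pos)
        if match is None:
            return
        # One of these assignments is needed before every yield.
        cursor.pos = match.end()
        yield match.group()


data = b"1 0 obj << /Type /Catalog >> endobj"
cursor = Cursor()
gen = tokens(data, cursor)
print(next(gen))                    # b'1'
cursor.pos = data.index(b"/Type")   # "seek" by mutating the shared object
print(next(gen))                    # b'/Type'
```

The `cursor.pos = match.end()` line is the bookkeeping objected to above: a real tokenizer with many `yield` statements would need one such assignment per `yield`, and the pointer is visible to every caller.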

The second approach is cleaner in that respect, but pointing the
tokenizer to some place now has the added semantics of creating a whole
new generator instance. The programmer using the tokenizer now needs to
remember to throw away any references to the generator each time the
pointer is reset, which is also ugly.
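The second approach can be sketched the same way, again assuming a hypothetical toy grammar (tokens are runs of non-whitespace bytes) and an illustrative function name `tokens`. The generator carries no pointer bookkeeping at all; "seeking" means creating a fresh generator that starts at the new offset.

```python
import re

# Hypothetical toy grammar: a token is a run of non-whitespace bytes.
TOKEN = re.compile(rb"\S+")


def tokens(data, pos=0):
    """Yield tokens from `data`, starting at byte offset `pos`."""
    for match in TOKEN.finditer(data, pos):
        yield match.group()


data = b"1 0 obj << /Type /Catalog >> endobj"
gen = tokens(data)
print(next(gen))  # b'1'

# "Seek": discard the old generator and start a new one at the target offset.
gen = tokens(data, data.index(b"/Type"))
print(list(gen))  # [b'/Type', b'/Catalog', b'>>', b'endobj']
```

Any other code still holding a reference to the old generator keeps reading from the old position, which is the downside described above.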

Does anybody here have a third way of dealing with this? Otherwise,
which ugliness is the more pythonic one?

Thanks a lot for any ideas.

--
Thomas

Answers

Thomas Lotze wrote:
> I can see two possibilities to do this: either the current file position
> has to be read from somewhere (say, a mutable object passed to the
> generator) after each yield, or a new generator needs to be instantiated
> every time the tokenizer is pointed to a new file position.
> ...
> Does anybody here have a third way of dealing with this? Otherwise,
> which ugliness is the more pythonic one?

The third approach, which is certain to be cleanest for this situation,
is to have a custom class which stores the state information you need,
and have the generator simply be a method in that class. There's no
reason that a generator has to be a standalone function.

class PdfTokenizer:
    def __init__(self, ...):
        # set up initial state

    def getTokens(self):
        while whatever:
            yield token

    def seek(self, newPosition):
        # change state here

# usage:
pdf = PdfTokenizer('myfile.pdf', ...)
for token in pdf.getTokens():
    # do stuff...

    if I need to change position:
        pdf.seek(...)

Easy as pie! :-)

-Peter
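Peter's outline can be fleshed out into a runnable sketch. To stay self-contained it assumes the tokenizer is handed the PDF contents as a bytes object rather than a filename, and it uses a hypothetical toy grammar (tokens are runs of non-whitespace bytes). Because `getTokens()` reads `self._pos` each time it resumes, calling `seek()` inside the `for` loop redirects the very generator being iterated.

```python
import re

# Hypothetical toy grammar: a token is a run of non-whitespace bytes.
TOKEN = re.compile(rb"\S+")


class PdfTokenizer:
    def __init__(self, data):
        # Assumption: the PDF contents are passed in as bytes, not a filename.
        self.data = data
        self._pos = 0  # internal state; callers use seek() instead

    def getTokens(self):
        while True:
            # self._pos is read each time the generator resumes, so a seek()
            # made between two tokens takes effect immediately.
            match = TOKEN.search(self.data, self._pos)
            if match is None:
                return
            self._pos = match.end()
            yield match.group()

    def seek(self, newPosition):
        self._pos = newPosition


pdf = PdfTokenizer(b"1 0 obj << /Type /Catalog >> endobj")
seen = []
for token in pdf.getTokens():
    seen.append(token)
    if token == b"obj":
        # Skip the dictionary and jump straight to "endobj".
        pdf.seek(pdf.data.index(b"endobj"))
print(seen)  # [b'1', b'0', b'obj', b'endobj']
```

The pointer lives in a single underscore-prefixed attribute that only the class touches, so callers never assign it directly and never have to swap out generator references.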


Peter Hansen wrote:
> Thomas Lotze wrote:
>> I can see two possibilities to do this: either the current file position
>> has to be read from somewhere (say, a mutable object passed to the
>> generator) after each yield, [...]
>
> The third approach, which is certain to be cleanest for this situation, is
> to have a custom class which stores the state information you need, and
> have the generator simply be a method in that class.

Which is, as far as the generator code is concerned, basically the same as
passing a mutable object to a (possibly standalone) generator. The object
will likely be called self, and the value is stored in an attribute of it.

Probably this is indeed the best way as it doesn't require the programmer
to remember any side-effects.

It does, however, require a lot of attribute access, which does cost some
cycles.

A related problem is skipping whitespace. Sometimes you don't care about
whitespace tokens, sometimes you do. Using generators, you can either set
a state variable, say on the object the generator is an attribute of,
before each call that requires a deviation from the default, or you can
have a second generator for filtering the output of the first. Again, both
solutions are ugly (the second more so than the first). One uses
side-effects instead of passing parameters, which is what one really
wants, while the other is dumb and slow (filtering can be done without
taking a second look at things).
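Both whitespace options can be sketched side by side. The grammar is again a hypothetical toy (whitespace runs and non-whitespace runs are both tokens), and `raw_tokens`, `skip_whitespace` and `Tokenizer` are illustrative names. The filter generator makes a second pass over every token; the flag version avoids that pass but relies on the caller mutating `skip_whitespace` between `next()` calls, i.e. on a side effect.

```python
import re

# Hypothetical toy grammar: whitespace runs and non-whitespace runs are tokens.
TOKEN = re.compile(rb"\s+|\S+")


def raw_tokens(data):
    for match in TOKEN.finditer(data):
        yield match.group()


# Option 1: a second generator that filters the output of the first.
def skip_whitespace(tokens):
    for token in tokens:
        if not token.isspace():
            yield token


# Option 2: a state flag on the object, consulted by the generator itself.
class Tokenizer:
    def __init__(self, data):
        self.data = data
        self.skip_whitespace = True

    def tokens(self):
        for match in TOKEN.finditer(self.data):
            token = match.group()
            # The flag is checked per token, so changing it mid-stream works.
            if self.skip_whitespace and token.isspace():
                continue
            yield token


data = b"<< /Type /Catalog >>"
print(list(skip_whitespace(raw_tokens(data))))
# [b'<<', b'/Type', b'/Catalog', b'>>']

t = Tokenizer(data)
gen = t.tokens()
print(next(gen))            # b'<<'
t.skip_whitespace = False   # switch behaviour mid-stream via a side effect
print(next(gen))            # b' '
```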

All of this makes me wonder whether more elaborate generator semantics
(maybe even allowing for passing arguments in the next() call) would not
be useful. And yes, I have read the recent postings on PEP 343 - sigh.

--
Thomas
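The wish for passing arguments into the `next()` call was later met by PEP 342 ("Coroutines via Enhanced Generators"), implemented in Python 2.5: `yield` became an expression, and `generator.send(value)` resumes the generator with `value` as the result of that expression. A minimal sketch of a tokenizer that takes a new position through `send()`, assuming a hypothetical toy grammar (tokens are runs of non-whitespace bytes) and an illustrative function name:

```python
import re

# Hypothetical toy grammar: a token is a run of non-whitespace bytes.
TOKEN = re.compile(rb"\S+")


def tokens(data, pos=0):
    while True:
        match = TOKEN.search(data, pos)
        if match is None:
            return
        pos = match.end()
        # next() resumes with new_pos = None; send(n) resumes with new_pos = n.
        new_pos = yield match.group()
        if new_pos is not None:
            pos = new_pos


data = b"1 0 obj << /Type /Catalog >> endobj"
gen = tokens(data)
print(next(gen))                       # b'1' (a new generator must be started with next())
print(gen.send(data.index(b"/Type")))  # b'/Type': seek and read in one call
print(next(gen))                       # b'/Catalog'
```

Here `send()` both moves the position and returns the token found there; if nothing follows the new position, it raises `StopIteration` just as `next()` would.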


Thomas Lotze wrote:
> Which is, as far as the generator code is concerned, basically the same as
> passing a mutable object to a (possibly standalone) generator. The object
> will likely be called self, and the value is stored in an attribute of it.

Fair enough, but who cares what the generator code thinks? It's what
the programmer has to deal with that matters, and an object is going to
have a cleaner interface than a generator-plus-mutable-object.

> Probably this is indeed the best way as it doesn't require the programmer
> to remember any side-effects.
>
> It does, however, require a lot of attribute access, which does cost some
> cycles.

Hmm... "premature optimization" is all I have to say about that.

-Peter
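Should profiling ever show attribute lookups to matter, one common idiom is to bind loop-invariant values to local variables once, at the top of the generator. The position is the exception: it must still be read from `self` after every resume, because `seek()` may have changed it while the generator was suspended. A sketch, assuming a hypothetical in-memory `PdfTokenizer` and a toy grammar (tokens are runs of non-whitespace bytes):

```python
import re

# Hypothetical toy grammar: a token is a run of non-whitespace bytes.
TOKEN = re.compile(rb"\S+")


class PdfTokenizer:
    def __init__(self, data):
        self.data = data
        self._pos = 0

    def getTokens(self):
        # Looked up once per generator instead of once per token. Caveat: a
        # later reassignment of self.data would not be seen by this generator.
        search = TOKEN.search
        data = self.data
        while True:
            # self._pos is deliberately re-read on every pass, since seek()
            # may have changed it while this generator was suspended.
            match = search(data, self._pos)
            if match is None:
                return
            self._pos = match.end()
            yield match.group()

    def seek(self, newPosition):
        self._pos = newPosition
```

Whether this saves anything measurable depends on the real tokenizer; it is worth doing only after measuring.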

