Singleton Python generator? Or, pickle a Python generator?

Question

I am using the following code, with nested generators, to iterate over a text document and return training examples using get_train_minibatch(). I would like to persist (pickle) the generators, so I can get back to the same place in the text document. However, you cannot pickle generators.
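To make the limitation concrete, here is a minimal sketch of what happens when a generator object is passed to pickle on CPython. The generator count_up() is a hypothetical example, not part of the original code.

```python
import pickle

def count_up():
    # Hypothetical generator used only for this demonstration.
    n = 0
    while True:
        yield n
        n += 1

gen = count_up()
next(gen)  # advance once so the generator has live state

try:
    pickle.dumps(gen)
    pickled_ok = True
except (TypeError, pickle.PicklingError) as exc:
    # CPython refuses to serialize the suspended frame of a generator.
    pickled_ok = False
    error = exc
```

The state that needs saving lives in the generator's suspended frame, which is why the workarounds discussed here move that state into ordinary attributes.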

  • Is there a simple workaround, so that I can save my position and start back where I stopped? Perhaps I can make get_train_example() a singleton, so I don't have several generators lying around. Then, I could make a global variable in this module that keeps track of how far along get_train_example() is.

Do you have a better (cleaner) suggestion, to allow me to persist this generator?

[edit: Two more ideas:

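  • Can I add member variables/methods to the generator, so that I can call generator.tell() and find the file location? Because then, the next time I create the generator, I can ask it to seek to that location. This idea sounds simple.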

  • Can I create a class and have the file location be a member variable, and then have the generator created within the class and update the file location member variable each time it yields? Because then I can know how far into the file it is.

]
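The class-based idea could look roughly like the sketch below. LineReader, its offset attribute, and the temporary file are all hypothetical names chosen for illustration. The file is opened in binary mode on the assumption that plain byte offsets are wanted from tell() and seek().

```python
import os
import tempfile

class LineReader(object):
    """Hypothetical helper: iterates over lines and remembers the byte offset."""

    def __init__(self, path, offset=0):
        self.path = path
        self.offset = offset  # byte offset of the next unread line

    def tell(self):
        return self.offset

    def __iter__(self):
        # Binary mode keeps tell()/seek() values as plain byte offsets.
        with open(self.path, 'rb') as f:
            f.seek(self.offset)
            while True:
                line = f.readline()
                if not line:
                    break
                # Record the position before yielding, so a reader rebuilt
                # from tell() starts at the line after this one.
                self.offset = f.tell()
                yield line.decode('utf-8')

# Demonstration on a throwaway file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b"a b c\nd e f\ng h i\n")

reader = LineReader(path)
lines = iter(reader)
first = next(lines)      # reads the first line
saved = reader.tell()    # a plain int, which is trivially picklable
lines.close()            # closing the generator also closes the file

resumed = list(LineReader(path, saved))  # continues from the second line
os.remove(path)
```

Only the integer offset has to be persisted here. The generator itself is thrown away and rebuilt from that offset.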

Here is the code:

def get_train_example():
    for l in open(HYPERPARAMETERS["TRAIN_SENTENCES"]):
        prevwords = []
        for w in l.split():
            # str.split() with no arguments already strips whitespace,
            # so no separate strip() call is needed.
            prevwords.append(wordmap.id(w))
            if len(prevwords) >= HYPERPARAMETERS["WINDOW_SIZE"]:
                yield prevwords[-HYPERPARAMETERS["WINDOW_SIZE"]:]

def get_train_minibatch():
    minibatch = []
    for e in get_train_example():
        minibatch.append(e)
        if len(minibatch) >= HYPERPARAMETERS["MINIBATCH SIZE"]:
            assert len(minibatch) == HYPERPARAMETERS["MINIBATCH SIZE"]
            yield minibatch
            minibatch = []

Recommended answer

The following code should do more-or-less what you want. The first class defines something that acts like a file but can be pickled. (When you unpickle it, it re-opens the file, and seeks to the location where it was when you pickled it). The second class is an iterator that generates word windows.

class PickleableFile(object):
    def __init__(self, filename, mode='rb'):
        self.filename = filename
        self.mode = mode
        self.file = open(filename, mode)
    def __getstate__(self):
        state = dict(filename=self.filename, mode=self.mode,
                     closed=self.file.closed)
        if not self.file.closed:
            state['filepos'] = self.file.tell()
        return state
    def __setstate__(self, state):
        self.filename = state['filename']
        self.mode = state['mode']
        self.file = open(self.filename, self.mode)
        if state['closed']: self.file.close()
        else: self.file.seek(state['filepos'])
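    def __getattr__(self, attr):
        # Delegate everything else to the wrapped file.  Guard 'file'
        # itself so a half-initialized instance cannot recurse forever.
        if attr == 'file':
            raise AttributeError(attr)
        return getattr(self.file, attr)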

class WordWindowReader(object):
    def __init__(self, filenames, window_size):
        self.filenames = filenames
        self.window_size = window_size
        self.filenum = 0
        self.stream = None
        self.filepos = 0
        self.prevwords = []
        self.current_line = []

    def __iter__(self):
        return self

    def next(self):
        # Read through files until we have a non-empty current line.
        while not self.current_line:
            if self.stream is None:
                if self.filenum >= len(self.filenames):
                    raise StopIteration
                else:
                    self.stream = PickleableFile(self.filenames[self.filenum])
                    self.stream.seek(self.filepos)
                    self.prevwords = []
            line = self.stream.readline()
            self.filepos = self.stream.tell()
            if not line:
                # End of file ('' in Python 2, b'' in Python 3 binary mode).
                self.stream.close()
                self.stream = None
                self.filenum += 1
                self.filepos = 0
            else:
                # Reverse line so we can pop off words.
                self.current_line = line.split()[::-1]

        # Get the first word of the current line, and add it to
        # prevwords.  Truncate prevwords when necessary.
        word = self.current_line.pop()
        self.prevwords.append(word)
        if len(self.prevwords) > self.window_size:
            self.prevwords = self.prevwords[-self.window_size:]

        # If we have enough words, then return a word window;
        # otherwise, go on to the next word.
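        if len(self.prevwords) == self.window_size:
            # Return a copy, so that the next append does not mutate a
            # window the caller is still holding.
            return list(self.prevwords)
        else:
            return self.next()

    # Python 3 looks up __next__; the alias lets the class iterate there too.
    __next__ = next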
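
To connect this back to get_train_minibatch(): once the examples come from an iterator object with explicit state, batching can be a plain function instead of a second generator, so there is no extra generator state to persist. The sketch below is only an illustration. ListReader is a hypothetical stand-in for WordWindowReader, and it checkpoints a plain dict holding the position, which is a simplification of pickling the whole reader object.

```python
import pickle
from itertools import islice

class ListReader(object):
    """Hypothetical stand-in for WordWindowReader: an iterator with explicit state."""

    def __init__(self, items, pos=0):
        self.items = items
        self.pos = pos

    def __iter__(self):
        return self

    def __next__(self):
        if self.pos >= len(self.items):
            raise StopIteration
        item = self.items[self.pos]
        self.pos += 1
        return item

    next = __next__  # Python 2 spelling of the same method


def next_minibatch(reader, size):
    # islice pulls exactly `size` items (fewer only at the end of the data),
    # so no example is consumed and then lost between calls.
    return list(islice(reader, size))


examples = list(range(10))
reader = ListReader(examples)
first = next_minibatch(reader, 4)

# Checkpoint: only the position, stored as plain picklable data.
checkpoint = pickle.dumps({'pos': reader.pos})

# Later, possibly in a new process: rebuild the reader and continue.
restored = ListReader(examples, **pickle.loads(checkpoint))
second = next_minibatch(restored, 4)
```

Unlike the generator in the question, this helper returns a short final batch instead of dropping it. A caller that needs fixed-size batches can discard any batch whose length is below size.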
