How to lazy load a data structure (Python)


Question

I have some way of building a data structure (out of some file contents, say):

def loadfile(FILE):
    return # some data structure created from the contents of FILE

So I can do things like

puppies = loadfile("puppies.csv") # wait for loadfile to work
kitties = loadfile("kitties.csv") # wait some more
print len(puppies)
print puppies[32]

In the above example, I wasted a bunch of time actually reading kitties.csv and creating a data structure that I never used. I'd like to avoid that waste without constantly checking if not kitties whenever I want to do something. I'd like to be able to do

puppies = lazyload("puppies.csv") # instant
kitties = lazyload("kitties.csv") # instant
print len(puppies)                # wait for loadfile
print puppies[32]

So if I never try to do anything with kitties, loadfile("kitties.csv") never gets called.

Is there some standard way to do this?

After playing around with it for a bit, I produced the following solution, which appears to work correctly and is quite brief. Are there any alternatives? Are there drawbacks to this approach that I should keep in mind?

class lazyload:
    def __init__(self,FILE):
        self.FILE = FILE
        self.F = None
    def __getattr__(self,name):
        if not self.F: 
            print "loading %s" % self.FILE
            self.F = loadfile(self.FILE)
        return object.__getattribute__(self.F, name)

What might be even better is if something like this worked:

class lazyload:
    def __init__(self,FILE):
        self.FILE = FILE
    def __getattr__(self,name):
        self = loadfile(self.FILE) # this never gets called again
                                   # since self is no longer a
                                   # lazyload instance
        return object.__getattribute__(self, name)

But this doesn't work because self is just a local name: rebinding it inside the method does not change the object the caller holds. The instance stays a lazyload, so it actually ends up calling loadfile every time you do anything.

Recommended answer

The csv module in the Python standard library does not load the data until you start iterating over it, so it is in fact already lazy.
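
A minimal sketch of that laziness, assuming Python 3 and a small throwaway file created just for the demonstration (the sample rows and the helper name first_rows are made up for illustration). csv.reader returns an iterator, so only the rows that are actually requested get parsed:

```python
import csv
import itertools
import os
import tempfile


def first_rows(path, count):
    """Return the first `count` rows of a CSV file without parsing the rest."""
    with open(path, newline="") as f:
        reader = csv.reader(f)  # no rows have been parsed yet
        # islice pulls only `count` rows from the reader, then stops.
        return list(itertools.islice(reader, count))


# Hypothetical sample data, written to a temporary file for the demo.
with tempfile.NamedTemporaryFile("w", suffix=".csv", newline="", delete=False) as tmp:
    csv.writer(tmp).writerows([["name", "age"], ["Rex", "3"], ["Froufrou", "5"]])

rows = first_rows(tmp.name, 2)
os.remove(tmp.name)
print(rows)  # [['name', 'age'], ['Rex', '3']]
```

The third row is never turned into a Python list, although the operating system may still have buffered its bytes.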

If you need to read through the whole file to build the data structure, a complex lazy-load object that proxies everything is overkill. Just do this:

class Lazywrapper(object):
    def __init__(self, filename):
        self.filename = filename
        self._data = None

    def get_data(self):
        # Build the data structure on first access only.
        if self._data is None:
            self._build_data()
        return self._data

    def _build_data(self):
        # Open and iterate over the file to build a data structure, and
        # store that data structure in self._data, for example with the
        # loadfile function from the question:
        self._data = loadfile(self.filename)

With the above class you can do this:

puppies = Lazywrapper("puppies.csv") # Instant
kitties = Lazywrapper("kitties.csv") # Instant

print(len(puppies.get_data())) # Wait
print(puppies.get_data()[32]) # Instant

allkitties = kitties.get_data() # Wait
print(len(allkitties))
print(allkitties[32])
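
If calling get_data() everywhere feels noisy, the same idea can also be written as a cached attribute. This is only a sketch, assuming Python 3.8 or later (for functools.cached_property); the class name LazyFile, the load_calls list, and the stand-in loadfile below are hypothetical and not part of the answer above:

```python
import functools

load_calls = []  # records which files were actually loaded


def loadfile(filename):
    # Stand-in for the real, slow loader from the question.
    load_calls.append(filename)
    return ["row %d of %s" % (i, filename) for i in range(100)]


class LazyFile:
    def __init__(self, filename):
        self.filename = filename

    @functools.cached_property
    def data(self):
        # Runs on first access only; the result is then stored on the instance.
        return loadfile(self.filename)


puppies = LazyFile("puppies.csv")  # instant, nothing loaded yet
kitties = LazyFile("kitties.csv")  # instant, nothing loaded yet
print(len(puppies.data))           # loads puppies.csv
print(puppies.data[32])            # reuses the cached result
print(load_calls)                  # ['puppies.csv']
```

Because kitties.data is never touched, kitties.csv is never loaded.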

If you have a lot of data and don't really need to load all of it, you could also implement a class that reads the file only until it finds, say, the doggie called "Froufrou" and then stops. At that point, though, it's likely better to put the data in a database once and for all and access it from there.
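
A rough sketch of that early-stopping idea, assuming Python 3 and a CSV whose first column holds the name (the column layout, the sample rows, and the function name find_row are assumptions made for illustration):

```python
import csv
import io


def find_row(lines, name):
    """Return the first row whose first column equals `name`, or None."""
    for row in csv.reader(lines):
        if row and row[0] == name:
            return row  # stop here; later lines are never parsed
    return None


# io.StringIO stands in for an open file so the example is self-contained.
sample = io.StringIO("Rex,3\nFroufrou,5\nBella,2\n")
print(find_row(sample, "Froufrou"))  # ['Froufrou', '5']
```

Every lookup still scans from the top of the file, which is why a database with an index tends to be the better fit once lookups become frequent.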
