Cython: How to move large objects without copying them?


Problem description

I use Cython to wrap C++ code and expose it to Python for interactive work. My problem is that I need to read large graphs (several gigabytes) from a file, and they end up in memory twice. Can anyone help me diagnose and solve this problem?

My Cython wrapper for the graph class looks like this:

cdef extern from "../src/graph/Graph.h":
    cdef cppclass _Graph "Graph":
        _Graph() except +
        _Graph(count) except +
        count numberOfNodes() except +
        count numberOfEdges() except +


cdef class Graph:
    """An undirected, optionally weighted graph"""
    cdef _Graph _this

    def __cinit__(self, n=None):
        if n is not None:
            self._this = _Graph(n)

    # any object which appears as a return type needs to implement setThis
    cdef setThis(self, _Graph other):
        #del self._this
        self._this = other
        return self

    def numberOfNodes(self):
        return self._this.numberOfNodes()

    def numberOfEdges(self):
        return self._this.numberOfEdges()

If a Python Graph needs to be returned, it needs to be created empty and then the setThis method is used to set the native _Graph instance. This happens, for example, when a Graph is read from file. This is the job of this class:

from libcpp.string cimport string  # std::string, used by read() below

cdef extern from "../src/io/METISGraphReader.h":
    cdef cppclass _METISGraphReader "METISGraphReader":
        _METISGraphReader() except +
        _Graph read(string path) except +

cdef class METISGraphReader:
    """ Reads the METIS adjacency file format [1]
        [1]: http://people.sc.fsu.edu/~jburkardt/data/metis_graph/metis_graph.html
    """
    cdef _METISGraphReader _this

    def read(self, path):
        pathbytes = path.encode("utf-8") # string needs to be converted to bytes, which are coerced to std::string
        return Graph(0).setThis(self._this.read(pathbytes))

Interactive usage looks like this:

 >>> G = graphio.METISGraphReader().read("giant.metis.graph")

Once reading from the file is done, X GB of memory are in use. Then there is a phase in which copying evidently happens, and after that 2X GB of memory are in use. The entire memory is freed when del G is called.

Where is my error which leads to the graph being copied and existing twice in memory?

Solution

I don't have a definitive answer for you, but I have a theory.

The Cython wrappers that you wrote are unusual, in that they wrap the C++ object directly instead of a pointer to it.

The following code is particularly inefficient:

cdef setThis(self, _Graph other):
    self._this = other
    return self 

The reason is that your _Graph class contains several STL vectors, and those will have to be copied over. So, when your other object is assigned to self._this the memory usage is effectively doubled (or worse, since the STL allocators can overallocate for performance reasons).
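
To make the difference concrete, here is a minimal C++ sketch. `ToyGraph` is a hypothetical stand-in for the real `Graph` class (which is not shown here), and both function names are made up for illustration. It contrasts copy assignment of an object that owns a vector with handing over a pointer to it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the wrapped Graph class: it owns one large vector.
struct ToyGraph {
    std::vector<double> weights;
    explicit ToyGraph(std::size_t n = 0) : weights(n, 1.0) {}
};

// Mirrors `self._this = other` on a by-value member: copy assignment
// allocates a second buffer and copies every element into it.
bool assignment_duplicates_buffer(std::size_t n) {
    assert(n > 0);  // empty vectors may both report a null data() pointer
    ToyGraph other(n);
    ToyGraph held;
    held = other;  // deep copy: both buffers are alive at this point
    return held.weights.data() != other.weights.data()
        && held.weights.size() == other.weights.size();
}

// Mirrors a pointer-based wrapper: only the address is copied.
bool pointer_handoff_shares_buffer(std::size_t n) {
    assert(n > 0);
    ToyGraph* other = new ToyGraph(n);
    const double* before = other->weights.data();
    ToyGraph* held = other;  // copies 4 or 8 bytes, not the graph
    bool same = (held->weights.data() == before);
    delete held;
    return same;
}
```

Both functions return true for any positive `n`: the assigned copy lives in a different buffer of the same size, while the pointer handoff keeps using the original buffer.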

I wrote a simple test that matches yours and added logging everywhere to see how objects are created, copied or destroyed. I can't find any issues there. The copies do happen, but after the assignment is complete I see that only one object remains.
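
This kind of instrumentation can be sketched in plain C++ as follows. It is not the test described above, only an illustration under assumptions: `Tracked`, `Holder`, `Counts` and `run_experiment` are hypothetical names, and counters take the place of log output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Event counters for a hypothetical instrumented class; not the real Graph.
struct Counts {
    int constructed = 0;
    int copy_constructed = 0;
    int copy_assigned = 0;
    int destroyed = 0;
};

struct Tracked {
    static Counts counts;
    std::vector<int> data;

    explicit Tracked(std::size_t n = 0) : data(n) { ++counts.constructed; }
    Tracked(const Tracked& o) : data(o.data) { ++counts.copy_constructed; }
    Tracked& operator=(const Tracked& o) {
        data = o.data;
        ++counts.copy_assigned;
        return *this;
    }
    ~Tracked() { ++counts.destroyed; }
};

Counts Tracked::counts;

// Same shape as the Cython setThis: the argument arrives by value and is
// then assigned to the member.
struct Holder {
    Tracked held;
    void setThis(Tracked other) { held = other; }
};

Counts run_experiment(std::size_t n) {
    assert(n > 0);
    Tracked::counts = Counts{};
    {
        Tracked source(n);
        Holder h;           // default-constructs h.held
        h.setThis(source);  // copy into the parameter, then copy-assign
    }                       // h and source are destroyed here
    return Tracked::counts;
}
```

In this sketch `run_experiment` records two ordinary constructions, one copy construction, one copy assignment and three destructions. The by-value parameter is a full temporary copy that is destroyed as soon as `setThis` returns, so for a moment the data exists in the source, in the parameter and in the member.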

So my theory is that the extra memory that you see is related to STL allocator logic in the vectors. All that extra memory must be attached to the final object after the copies.

My recommendation is that you switch to the more standard pointer-based wrapping. Your Graph wrapper should then be defined more or less as follows:

cdef class Graph:
    """An undirected, optionally weighted graph"""
    cdef _Graph* _this

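    def __cinit__(self, n=None):
        if n is not None:
            self._this = new _Graph(n)
        else:
            self._this = NULL  # no graph yet; setThis() installs one later

    cdef setThis(self, _Graph* other):
        del self._this  # free the previously held graph (no-op for NULL)
        self._this = other  # take ownership of the pointer; nothing is copied
        return self

    def __dealloc__(self):
        del self._this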

Note that I need to delete _this because it is a pointer.

You will then need to modify your METISGraphReader::read() method to return a heap-allocated Graph. The prototype of this method should be changed to:

Graph* METISGraphReader::read(std::string path);

Then the Cython wrapper for it can be written as:

    def read(self, path):
        pathbytes = path.encode("utf-8") # string needs to be converted to bytes, which are coerced to std::string
        return Graph().setThis(self._this.read(pathbytes))

If you do it this way there is only one object, the one that is created on the heap by read(). A pointer to that object is returned to the read() Cython wrapper, which then installs it in a brand new Graph() instance. The only thing that gets copied is the 4 or 8 bytes of the pointer.
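
The ownership handoff can be sketched on the C++ side as well. Everything here is an assumption made for illustration: `ToyGraph`, `read_graph` and `GraphHandle` are hypothetical stand-ins for `Graph`, `METISGraphReader::read()` and the Cython `Graph` class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; the real Graph and METISGraphReader are not shown.
struct ToyGraph {
    std::vector<int> adjacency;
    explicit ToyGraph(std::size_t n) : adjacency(n) {}
};

// Plays the role of METISGraphReader::read(): builds the graph on the heap
// and returns ownership to the caller as a raw pointer.
ToyGraph* read_graph(std::size_t n) {
    return new ToyGraph(n);
}

// Plays the role of the Cython Graph class: owns the pointer, frees it once.
class GraphHandle {
public:
    GraphHandle() : this_(nullptr) {}
    ~GraphHandle() { delete this_; }           // deleting nullptr is a no-op
    GraphHandle(const GraphHandle&) = delete;  // avoid a double delete
    GraphHandle& operator=(const GraphHandle&) = delete;

    GraphHandle& setThis(ToyGraph* other) {
        delete this_;   // release any previously held graph
        this_ = other;  // copy the address only
        return *this;
    }
    const ToyGraph* get() const { return this_; }

private:
    ToyGraph* this_;
};

bool handoff_keeps_same_buffer(std::size_t n) {
    assert(n > 0);
    ToyGraph* g = read_graph(n);
    const int* buffer = g->adjacency.data();
    GraphHandle handle;
    handle.setThis(g);  // handle now owns g and deletes it on destruction
    return handle.get() == g && handle.get()->adjacency.data() == buffer;
}
```

The handle ends up pointing at the very object that `read_graph` created, and the vector's buffer address is unchanged. Copying is disabled on `GraphHandle` so that two handles can never delete the same graph.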

I hope this helps!
