Enormous Increase in the Use of Memory


Question

I have code in which I need to create a map whose keys are doubles (the value of the F-test between two clusters, which requires calculating the residual sum of squares) and whose mapped values are of type cluspair, a pair of objects of the Cluster class that I created. The map is meant to store the F-test values between all clusters so that I do not need to repeat the calculation at every step. BTW, a cluster is a tree structure in which every cluster contains two subclusters, and the stored values are 70-dimensional vectors.

The problem is that, in order to calculate the RSS, I need recursive code that finds the distance between every element of the cluster and the mean of the cluster, and this seems to consume an enormous amount of memory. When I create the same map with the keys being the simple distance between the means of two clusters, the program uses minimal memory, so I think the increase in memory use is caused by the calls to the recursive function rss(). What should I do to manage the memory use in the code below? In its current implementation the system runs out of memory and Windows closes the application, saying that the system ran out of virtual memory.

Main code:

    map<double,cluspair> createRSSMap( list<Cluster*> cluslist )
    {
            list<Cluster*>::iterator it1;
            list<Cluster*>::iterator it2;

            map<double,cluspair> rtrnmap;


            for(it1=cluslist.begin(); it1!= --cluslist.end() ;it1++)
            {
                it2=it1;
                ++it2;
                cout << ".";

                list<Cluster*>::iterator itc;
                double cFvalue=1e19;  // floating literal; the integer literal 10000000000000000000 does not fit in long long
                double rIt1 = (*it1)->rss();

                for(int kk=0 ; it2!=cluslist.end(); it2++)
                {

                    Cluster tclustr ((*it1) , (*it2));
                    double r1 = tclustr.rss();
                    double r2= rIt1 + (*it2)->rss();
                    int df2 = tclustr.getNumOfVecs() - 2;

                    double fvalue = (r1 - r2) / (r2 / df2);

                    if(fvalue<cFvalue)
                    {
                        cFvalue=fvalue;
                        itc=it2;
                    }
                }


                cluspair clp;
                clp.c1 = *it1;
                clp.c2 = *itc;


                // Nudge the key until it is unique, because map keys cannot repeat.
                bool doesexists = (rtrnmap.find(cFvalue) != rtrnmap.end());

                while(doesexists)
                {
                    cFvalue+= 0.000000001;
                    doesexists = (rtrnmap.find(cFvalue) != rtrnmap.end());
                }

                rtrnmap[cFvalue] = clp;


            }

            return rtrnmap;
    }
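Written out, the quantity that the inner loop above computes for each candidate pair is the following. The symbols are notation introduced here for readability and do not appear in the code: RSS_{ab} corresponds to `r1` (the RSS of the temporary merged cluster `tclustr`), RSS_a + RSS_b corresponds to `r2`, and n_{ab} is `tclustr.getNumOfVecs()`.

```latex
F = \frac{\mathrm{RSS}_{ab} - \left(\mathrm{RSS}_{a} + \mathrm{RSS}_{b}\right)}
         {\left(\mathrm{RSS}_{a} + \mathrm{RSS}_{b}\right) / \left(n_{ab} - 2\right)}
```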

And the implementation of the function rss():

double Cluster::rss()
{
    return rss(cnode->mean);
}

double Cluster::rss(vector<double> &cmean)
{
    if(cnode->numOfVecs==1)
    {
        return vectorDist(cmean,cnode->mean);
    }
    else
    {
        return ( ec1->rss(cmean) + ec2->rss(cmean) );       
    }
}

Many thanks in advance. I really don't know what to do at this point.

Below is the code that I use to create a map whose keys are the simple Euclidean distance between two cluster means. As I said above, it is quite similar and uses minimal memory. It differs only in the calculation of the key: instead of the recursive calculation, it computes the simple distance between the means of two clusters. I hope it helps to identify the problem.

map<double,cluspair> createDistMap( list<Cluster*> cluslist )
{
        list<Cluster*>::iterator it1;
        list<Cluster*>::iterator it2;

        map<double,cluspair> rtrnmap;


        for(it1=cluslist.begin(); it1!= --cluslist.end() ;it1++)
        {
            it2=it1;
            ++it2;
            cout << ".";

            list<Cluster*>::iterator itc;
            double cDist=1000000000000000;

            for(int kk=0 ; it2!=cluslist.end(); it2++)
            {
                double nDist = vectorDist( (*it1)->getMean(),(*it2)->getMean());
                if (nDist<cDist)
                {
                    cDist = nDist;
                    itc=it2;
                }
            }   

            cluspair clp;
            clp.c1 = *it1;
            clp.c2 = *itc;



            bool doesexists = (rtrnmap.find(cDist) != rtrnmap.end());

            while(doesexists)
            {
                cDist+= 0.000000001;
                doesexists  = (rtrnmap.find(cDist) != rtrnmap.end());
            }

            rtrnmap[cDist] = clp;

        }

        return rtrnmap;
}

Implementation of vectorDist():

double vectorDist(vector<double> vec1, vector<double> vec2)
{

    double sqrsum=0;
    double tempd=0;

    int vs = vec1.size();

    for ( int i=0;i<vs;i++)
    {
        tempd = vec1[i] - vec2[i];
        sqrsum += tempd*tempd;
    }

    return sqrsum;
}
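As written, vectorDist() takes both vectors by value, so every call copies two 70-dimensional vectors, and it returns the squared distance (no square root is taken). The following is a minimal sketch of the same computation taking its arguments by const reference. The name `vectorDistRef` is hypothetical, and whether the copies contribute to the memory problem described above is not established by the question.

```cpp
#include <cstddef>
#include <vector>

// Squared Euclidean distance between two equally sized vectors.
// Hypothetical variant of vectorDist(): const references avoid copying
// both vectors on every call. Like the original, it does not take a
// square root and assumes vec2 has at least vec1.size() elements.
double vectorDistRef(const std::vector<double>& vec1,
                     const std::vector<double>& vec2)
{
    double sqrsum = 0.0;
    for (std::size_t i = 0; i < vec1.size(); ++i)
    {
        const double diff = vec1[i] - vec2[i];
        sqrsum += diff * diff;
    }
    return sqrsum;
}
```

Callers would not need to change, because existing calls such as `vectorDist(cmean, cnode->mean)` already pass lvalues that bind to const references.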

Edit

BTW, I have also tried this alternative implementation, which still fails to control the memory usage:

double Cluster::rss()
{
    list<double> fvals;
    rss(cnode->mean , fvals);

    double sum=0;
    list<double>::iterator tpit;
    for(tpit=fvals.begin() ; tpit != fvals.end() ; ++tpit)
    {
        sum += *tpit;
    }
    return sum;
}

void Cluster::rss(vector<double> &cmean , list<double> &fvals)
{
    if(cnode->numOfVecs==1)
    {
        fvals.push_back( vectorDist(cmean,cnode->mean) );
    }
    else
    {
        ec1->rss(cmean , fvals);
        ec2->rss(cmean , fvals);        
    }
}
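For comparison, the same sum can be accumulated without recursion and without the intermediate list, using an explicit stack and a running total. This is only a sketch under assumptions: the Cluster class is not shown in the question, so `Node` below is a hypothetical stand-in with the fields the rss() code appears to rely on, and `sqDist` plays the role of vectorDist().

```cpp
#include <cstddef>
#include <stack>
#include <vector>

// Hypothetical stand-in for the Cluster class, which the question does not show.
struct Node
{
    std::vector<double> mean;  // mean of all vectors under this node
    int numOfVecs = 1;         // 1 marks a leaf holding a single vector
    const Node* c1 = nullptr;  // children; both null for a leaf
    const Node* c2 = nullptr;
};

// Squared Euclidean distance, mirroring what vectorDist() returns.
double sqDist(const std::vector<double>& a, const std::vector<double>& b)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Sum of squared distances from every leaf under root to root's mean,
// computed with an explicit stack instead of recursive calls.
double rssIterative(const Node& root)
{
    double sum = 0.0;
    std::stack<const Node*> pending;
    pending.push(&root);
    while (!pending.empty())
    {
        const Node* n = pending.top();
        pending.pop();
        if (n->numOfVecs == 1)
        {
            sum += sqDist(root.mean, n->mean);  // leaf: add its contribution
        }
        else
        {
            pending.push(n->c1);  // internal node: visit both children later
            pending.push(n->c2);
        }
    }
    return sum;
}
```

This removes the call-stack depth and the temporary list, but it still visits every leaf on every call, so it does not reduce the amount of repeated work.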


Recommended answer

If you're running out of memory, either you have a very deep tree, or your Cluster objects are large, or both. Try creating another tree data structure of doubles with the same topology as your Cluster tree, and call it the RSS tree, to hold the RSS values. Calculate the RSS values of the bottom nodes first, then recursively fill in the rest of the values in the RSS tree. This way you aren't holding the Cluster objects in memory while you do the RSS calculation.
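One way the suggestion above might be sketched is shown below. The answer does not say how a parent's RSS follows from its children's, so this sketch swaps in the standard sum-of-squares decomposition: RSS(parent) = RSS(c1) + RSS(c2) + n1·‖mean1 − mean‖² + n2·‖mean2 − mean‖². That identity assumes every node's mean is the true mean of the vectors beneath it. `ClusterNode`, `RssNode`, `sqDist`, and `buildRssTree` are hypothetical names, since the Cluster class is not shown in the question.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the Cluster class from the question.
struct ClusterNode
{
    std::vector<double> mean;         // assumed to be the true mean of the leaves below
    int numOfVecs = 1;                // 1 marks a leaf holding a single vector
    const ClusterNode* c1 = nullptr;  // children; both null for a leaf
    const ClusterNode* c2 = nullptr;
};

// The "RSS tree": same topology as the cluster tree, but it stores only doubles.
struct RssNode
{
    double rss = 0.0;
    std::unique_ptr<RssNode> c1;
    std::unique_ptr<RssNode> c2;
};

// Squared Euclidean distance, mirroring what vectorDist() returns.
double sqDist(const std::vector<double>& a, const std::vector<double>& b)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Fill the RSS tree bottom-up; each cluster node is visited exactly once.
std::unique_ptr<RssNode> buildRssTree(const ClusterNode& c)
{
    std::unique_ptr<RssNode> out(new RssNode());
    if (c.numOfVecs == 1)
    {
        return out;  // a single vector has zero RSS about its own mean
    }
    out->c1 = buildRssTree(*c.c1);
    out->c2 = buildRssTree(*c.c2);
    // Children's RSS plus the shift of each child's mean to the parent's mean.
    out->rss = out->c1->rss + out->c2->rss
             + c.c1->numOfVecs * sqDist(c.c1->mean, c.mean)
             + c.c2->numOfVecs * sqDist(c.c2->mean, c.mean);
    return out;
}
```

With cached values like these, the RSS of a candidate merge of two existing clusters could in principle be obtained from the same identity in constant time, rather than by walking every leaf for each pair.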
