Eigen library memory usage for dynamic vectors


Problem Description



I have a binary file storing float32 objects (9748422*5 of them). From such a collection (roughly 190MB in size), I'm creating a set of Eigen::VectorXd vectors (each with 5 components), thus 9748422 of them. The underlying type is double, hence roughly double the input size for storing them.

But, as luck would have it, the process requires a total of 2.5GB. This is a log of the PROCESS_MEMORY_COUNTERS:

        PageFaultCount: 0x000A3C40
        PeakWorkingSetSize: 0xA3C42000
        WorkingSetSize: 0xA3C42000
        QuotaPeakPagedPoolUsage: 0x00004ED8
        QuotaPagedPoolUsage: 0x00004ED8
        QuotaPeakNonPagedPoolUsage: 0x000057A8
        QuotaNonPagedPoolUsage: 0x000057A8
        PagefileUsage: 0xA3A9B000
        PeakPagefileUsage: 0xA3A9B000
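
A snapshot like this can be obtained with GetProcessMemoryInfo from psapi.h; the following is only a minimal sketch of how such a log might be produced, not code from the original program:

    #include <windows.h>
    #include <psapi.h>   // GetProcessMemoryInfo; link with Psapi.lib
    #include <cstdio>

    int main()
    {
        PROCESS_MEMORY_COUNTERS pmc;
        if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc)))
        {
            // The size fields are SIZE_T; the cast matches the 32-bit build discussed here.
            std::printf("PeakWorkingSetSize: 0x%08lX\n", (unsigned long)pmc.PeakWorkingSetSize);
            std::printf("WorkingSetSize:     0x%08lX\n", (unsigned long)pmc.WorkingSetSize);
            std::printf("PagefileUsage:      0x%08lX\n", (unsigned long)pmc.PagefileUsage);
        }
    }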
    

I've tracked Eigen's internal allocator, and it indeed seems to "allocate" exactly the size I compute on paper. However, Eigen uses aligned_alloc for most of its dynamic vectors. Could this be generating this amount of havoc? If nothing comes to mind, could you recommend another place to look to find out why this is happening?
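
For reference, the on-paper figure works out roughly as follows (a quick sketch using the counts from the question), which is why the observed 2.5 GB looks so far out of line:

    #include <cstdio>

    int main()
    {
        // Counts taken from the question: 9748422 points, 5 float32 components each.
        const unsigned long long points = 9748422ULL;
        const unsigned long long comps  = 5ULL;

        const unsigned long long inputBytes   = points * comps * sizeof(float);   // ~195 MB on disk
        const unsigned long long payloadBytes = points * comps * sizeof(double);  // ~390 MB as doubles

        std::printf("input:   %llu bytes\n", inputBytes);
        std::printf("payload: %llu bytes\n", payloadBytes);
    }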

I cannot provide a compilable (online) cpp example, but here's a skeleton of what I'm doing:

    #include <windows.h>
    #include <vector>
    #include <Eigen/Dense>

    struct SSCCE_struct
    {
        Eigen::VectorXd m_data;
    };

    typedef std::vector<SSCCE_struct*> TVector;

    int main(int argc, char* argv[])
    {
        TVector outputVertices;
        HANDLE bpcHandle;
        bpcHandle = CreateFileA("D:\\sample.bpc",
            GENERIC_READ,
            FILE_SHARE_READ,
            NULL,
            OPEN_EXISTING,
            FILE_ATTRIBUTE_NORMAL,
            NULL);

        LARGE_INTEGER len_li;
        GetFileSizeEx(bpcHandle, &len_li);
        INT64 len = len_li.QuadPart; //(len_li.u.HighPart << 32) | len_li.u.LowPart;

        // 20 bytes per point on disk: 5 float32 components.
        unsigned long long noPoints = len / 20;
        unsigned long noPointsRead = 0;
        unsigned long long currPointIdx = 0;

        outputVertices.resize(noPoints);

        // DebugTrace is the poster's own logging helper (declaration not shown).
        DebugTrace("No points %llu \n", noPoints);

        float buffer[5 * 1024];   // one chunk of 1024 points
        DWORD noBytesRead = 0;
        do
        {
            ReadFile(bpcHandle, buffer, sizeof(buffer), &noBytesRead, NULL);
            noPointsRead = noBytesRead / 20;
            for (unsigned long idx = 0; idx < noPointsRead; ++idx)
            {
                // One heap allocation for the struct...
                outputVertices[currPointIdx + idx] = new SSCCE_struct();

                // ...and a second, separate one for the VectorXd's 5 doubles.
                outputVertices[currPointIdx + idx]->m_data.resize(5);

                for (unsigned kdx = 0; kdx < 5; ++kdx)
                {
                    outputVertices[currPointIdx + idx]->m_data[kdx] = buffer[5 * idx + kdx];
                }
            }

            currPointIdx += noPointsRead;

        } while (noBytesRead);

        CloseHandle(bpcHandle);
    }
    

Later edit:

I performed the test indicated in David's answer, and the solution is to avoid dynamic allocations altogether. There are several combinations one can try out; here are the results for each of them:

1.

    struct SSCCE_struct
    {
        Eigen::Matrix<double,1,5> m_data;
    };
    
    typedef std::vector<SSCCE_struct*> TVector;
    

Yielding 1.4 GB (1.1 GB waste)

2.

     struct SSCCE_struct
     {
        Eigen::VectorXd m_data;
     };
    
     typedef std::vector< SSCCE_struct* > TVector;
    

Yielding 2.5 GB (2.2 GB waste)

3.

    struct SSCCE_struct
    {
        Eigen::Matrix<double,1,5> m_data;
    };
    
    typedef std::vector<SSCCE_struct> TVector;
    

Yielding 381 MB (with 40 MB of waste - totally reasonable and, perhaps, predictable).

Solution

You've got a lot of pointers here, and each pointer has allocation overhead. The pointers refer to small objects, and so the overhead is significant.

On top of that, dynamically allocated objects necessarily have more overhead than fixed size objects. That's because fixed size objects do not need to store matrix dimensions.

Here are the sources of your pointer overhead (see the sketch after the list):

1. Eigen::VectorXd uses dynamically allocated storage. That means a pointer.
2. You store the objects in std::vector<SSCCE_struct*>. And that's another pointer, with overhead.
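
To make those two levels of indirection concrete, here is a small sketch that prints what each level actually stores per point (it assumes only that Eigen is on the include path; the exact numbers depend on the platform and compiler):

    #include <Eigen/Dense>
    #include <cstdio>

    struct SSCCE_struct { Eigen::VectorXd m_data; };

    int main()
    {
        // What the containers actually store per point with the original layout:
        std::printf("vector element (just a pointer):        %u bytes\n",
                    unsigned(sizeof(SSCCE_struct*)));
        std::printf("SSCCE_struct (data pointer + run-time size): %u bytes\n",
                    unsigned(sizeof(SSCCE_struct)));
        std::printf("payload, in yet another heap block:     %u bytes\n",
                    unsigned(5 * sizeof(double)));
        // The fixed-size alternative keeps the payload inline:
        std::printf("sizeof(Eigen::Matrix<double, 5, 1>):    %u bytes\n",
                    unsigned(sizeof(Eigen::Matrix<double, 5, 1>)));
    }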

The most efficient way to store these objects is to remove the indirection. You can do that by switching to:

1. Matrix<double, 5, 1>. This is a fixed size object and so has no indirection. What's more, as explained above, it does not need to store the matrix dimensions at runtime because they are known at compile time. For such a small object that is significant.
2. Store the objects in std::vector<SSCCE_struct>. Again, you lose one level of indirection.

With these changes, the memory usage of your program, when compiled with release settings, drops to 383MB on my machine. That's much more in line with your expectations.
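
Put together, the two changes amount to something like the following sketch (illustrative only; the read loop is trimmed to the per-point copy, with no new and no resize):

    #include <Eigen/Dense>
    #include <vector>

    struct SSCCE_struct
    {
        // Fixed size: the 40-byte payload lives inside the struct, no stored dimensions.
        Eigen::Matrix<double, 5, 1> m_data;
    };

    typedef std::vector<SSCCE_struct> TVector;   // by value: no per-point heap allocation

    int main()
    {
        float buffer[5 * 1024] = {};             // stand-in for one ReadFile chunk
        const unsigned long noPointsRead = 1024;

        TVector outputVertices(noPointsRead);
        for (unsigned long idx = 0; idx < noPointsRead; ++idx)
            for (unsigned kdx = 0; kdx < 5; ++kdx)
                outputVertices[idx].m_data[kdx] = buffer[5 * idx + kdx];
    }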

The big difference seems to be between Eigen::VectorXd and the fixed size object. If I use Eigen::VectorXd and std::vector<SSCCE_struct> then the memory usage jumps to 918MB. When I then go to std::vector<SSCCE_struct*> it makes a further jump to 1185MB.

These measurements will be highly dependent on the compiler. I've used VS2013 compiling 32 bit code.
