Reducing NbClust memory usage

Question

I need some help with massive usage of memory by the NbClust function. On my data, memory balloons to 56GB at which point R crashes with a fatal error. Using debug(), I was able to trace the error to these lines:

            if (any(indice == 23) || (indice == 32)) {
                res[nc - min_nc + 1, 23] <- Index.sPlussMoins(cl1 = cl1, 
                    md = md)$gamma

Debugging of Index.sPlussMoins revealed that the crash happens during a for loop. The iteration that it crashes at varies, and during the loop memory usage varies between 41 and 57Gb (I have 64 total):

    for (k in 1:nwithin1) {
      s.plus <- s.plus + (colSums(outer(between.dist1, 
                                        within.dist1[k], ">")))
      s.moins <- s.moins + (colSums(outer(between.dist1, 
                                          within.dist1[k], "<")))
      print(s.moins)
    }

I'm guessing that the memory usage comes from the outer() function. Can I modify NbClust to be more memory efficient (perhaps using the bigmemory package)? At the very least, it would be nice to get R to exit the function with a "cannot allocate vector of size..." error instead of crashing. That way I would have an idea of just how much more memory I need to handle the matrix causing the crash.
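For instance, here is a rough sketch (my own, not NbClust's code) of an equivalent of that loop that never materialises the outer() matrices, assuming between.dist1 and within.dist1 are ordinary numeric vectors of pairwise distances and nwithin1 is length(within.dist1):

    # Sketch: for every within-cluster distance w, count how many between-cluster
    # distances are strictly greater (s.plus) or strictly smaller (s.moins) than w,
    # without allocating the temporary vectors that outer() creates.
    b_sorted <- sort(between.dist1)
    nb <- length(b_sorted)
    n_le <- findInterval(within.dist1, b_sorted)                    # #{b <= w} for each w
    n_lt <- findInterval(within.dist1, b_sorted, left.open = TRUE)  # #{b <  w} for each w
    s.plus  <- sum(nb - n_le)   # total pairs with between > within
    s.moins <- sum(n_lt)        # total pairs with between < within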

I created a minimal example with a matrix the approximate size of the one I am using, although now it crashes at a different point, when the hclust function is called:

set.seed(123)

# ten cluster means, with a 60000 x 60 block of data generated around each
cluster_means = sample(1:25, 10)
mlist = list()
for(cm in cluster_means){
  name = as.character(cm)
  m = data.frame(matrix(rnorm(60000*60, mean=cm, sd=runif(1, 0.5, 3.5)), 60000, 60))
  mlist[[name]] = m
}

# combine the blocks column-wise into one 60000 x 600 data frame
test_data = do.call(cbind, cbind(mlist))

library(NbClust)
debug(fun = "NbClust")
nbc = NbClust(data = test_data, diss = NULL, distance = "euclidean", min.nc = 2, max.nc = 30, 
              method = "ward.D2", index = "alllong", alphaBeale = 0.1)

debug: hc <- hclust(md, method = "ward.D2")

It seems to crash before using up the available memory (according to my system monitor, 34Gb is being used when it crashes, out of 64 total).

So is there any way I can do this without sub-sampling down to manageably sized matrices? And if I did, how would I know how much memory I need for a matrix of a given size? I would have thought my 64Gb would be enough.
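As a rough back-of-the-envelope estimate (assuming, on my part, that the dominant allocation is the dist object holding all pairwise row distances), the 60000-row example above gives:

    # one copy of the lower-triangular distance vector: n*(n-1)/2 doubles
    n <- 60000                               # rows being clustered in the example
    dist_gb <- n * (n - 1) / 2 * 8 / 1024^3  # bytes -> GiB
    dist_gb                                  # ~13.4 GiB per copy; hclust and the index
                                             # code appear to need several working
                                             # copies on top of that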

I tried altering NbClust to use fastcluster instead of the stats version. It didn't crash, but did exit with a memory error:

Browse[2]> 
exiting from: fastcluster::hclust(md, method = "ward.D2")
Error: cannot allocate vector of size 9.3 Gb
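For reference, fastcluster::hclust takes the same dist object and method argument as the stats version, so the substituted call looks roughly like this (a sketch assuming a plain Euclidean dist on the test matrix, which is what NbClust builds internally for distance = "euclidean"):

    library(fastcluster)
    md <- dist(test_data, method = "euclidean")        # ~13 GiB for 60000 rows
    hc <- fastcluster::hclust(md, method = "ward.D2")  # drop-in for stats::hclust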

Answer

If you check the source code of NbClust, you'll see that it is anything but optimized for speed or memory efficiency.

The crash you're reporting is not even during clustering - it's in the evaluation afterwards, specifically in the "Gamma, Gplus and Tau" index code. Disable these indices and you may get further, but most likely you'll just have the same problem again in another index. Maybe you can pick only a few indices to run, specifically indices that do not need a lot of memory?
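For example, something along these lines (a sketch of that suggestion; which indices are cheap enough for your data is an assumption you would have to check), running NbClust once per index instead of with index = "alllong" and skipping the Gamma/Gplus/Tau family entirely:

    library(NbClust)
    cheap_indices <- c("kl", "ch", "silhouette", "db")   # illustrative subset
    results <- lapply(cheap_indices, function(idx) {
      NbClust(data = test_data, diss = NULL, distance = "euclidean",
              min.nc = 2, max.nc = 30, method = "ward.D2", index = idx)
    })
    names(results) <- cheap_indices
    sapply(results, function(r) r$Best.nc)   # best k suggested by each index

Each call still builds the full distance matrix and dendrogram, so this only avoids the extra blow-up from the concordance-based indices.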
