How to calculate a massive dissimilarity matrix in R

Views: 144 | Published: 2020/10/3 2:19:44 | Tags: r, cluster-analysis


Question

I am currently clustering a fairly large dataset, about 30k rows, and the dissimilarity matrix is just too big for R to handle. I don't think this is purely a memory-size problem. Is there a smart way to do this?

Recommended answer

If your data is so large that base R can't easily cope, then you have several options:

Work on a machine with more RAM.

Use a commercial product, e.g. Revolution Analytics, which supports working with larger data in R.
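
For the first option, a rough size estimate helps decide whether more RAM would be enough. The sketch below assumes the matrix is stored the way base R's dist() stores it, as the lower triangle of doubles at 8 bytes each; the helper name dist_size_gib is made up for illustration, and peak memory during clustering will likely be higher than this figure because of intermediate copies.

```r
# Hypothetical helper: approximate size, in GiB, of a dist object for n rows.
# A dist object holds n * (n - 1) / 2 pairwise values as 8-byte doubles.
dist_size_gib <- function(n) {
  n <- as.numeric(n)  # avoid integer overflow if n is passed as an integer
  n * (n - 1) / 2 * 8 / 1024^3
}

dist_size_gib(30000)  # roughly 3.35 GiB for the 30k rows in the question
```

The size grows quadratically, so doubling the row count roughly quadruples the memory needed.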

Here is an example using RevoScaleR, the commercial package by Revolution. I use the diamonds dataset, part of ggplot2, since it contains about 53K rows, i.e. a bit more than your data. The example uses k-means, which works directly on the rows and so does not need the full pairwise dissimilarity matrix. It doesn't make much analytic sense, since I naively convert factors into numerics, but it illustrates the computation on a laptop:

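library(ggplot2)     # provides the diamonds dataset
library(RevoScaleR)  # commercial package by Revolution; provides rxKmeans
# Naively coerce every column, including the factors, to numeric
artificial <- as.data.frame(sapply(diamonds, as.numeric))
# k-means with 6 clusters on five of the columns
clusters <- rxKmeans(~carat + cut + color + clarity + price, 
                     data=artificial, numClusters=6)
clusters$centers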

This results in:
      carat      cut    color  clarity      price
1 0.3873094 4.073170 3.294146 4.553910   932.6134
2 1.9338503 3.873151 4.285970 3.623935 16171.7006
3 1.0529018 3.655348 3.866056 3.135403  4897.1073
4 0.7298475 3.794888 3.486457 3.899821  2653.7674
5 1.2653675 3.879387 4.025984 4.065154  7777.0613
6 1.5808225 3.904489 4.066285 4.066285 11562.5788
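
For readers without access to RevoScaleR, the same idea can be sketched with kmeans() from base R's stats package. This is a swapped-in alternative, not the method used in the answer above. The data here is synthetic stand-in data generated for illustration (30,000 rows, 5 numeric columns), and the choice of 6 clusters simply mirrors the example above.

```r
# Synthetic stand-in data (an assumption, not the asker's data):
# 30,000 rows scattered around 6 made-up centers in 5 dimensions.
set.seed(42)
n <- 30000
true_centers <- matrix(rnorm(6 * 5, sd = 10), nrow = 6)
x <- true_centers[sample(6, n, replace = TRUE), ] +
  matrix(rnorm(n * 5), ncol = 5)

# Standardize columns so no single variable dominates the Euclidean distance.
x_scaled <- scale(x)

# kmeans() iterates over the rows and cluster centers only, so the
# n x n dissimilarity matrix is never built.
fit <- kmeans(x_scaled, centers = 6, iter.max = 100, nstart = 3)

dim(fit$centers)     # one row per cluster, one column per variable
length(fit$cluster)  # one cluster label per input row
```

Like the RevoScaleR example, this only applies when k-means is an acceptable substitute for a method that requires the full dissimilarity matrix, such as hierarchical clustering.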

