Partition into classes: jenks vs kmeans

Published: 2020/11/30 4:33:57 · Tags: r, intervals


Problem description

I want to partition a vector (length around 10^5) into five classes. With the function classIntervals from package classInt I wanted to use style = "jenks" natural breaks, but this takes an inordinate amount of time even for a much smaller vector of only 500 values. Setting style = "kmeans" executes almost instantaneously.

library(classInt)

my_n <- 100
set.seed(1)
x <- mapply(rnorm, n = my_n, mean = (1:5) * 5)

R> system.time(classIntervals(x, n = 5, style = "jenks"))
   user  system elapsed 
  13.46    0.00   13.45 

R> system.time(classIntervals(x, n = 5, style = "kmeans"))
   user  system elapsed 
   0.02    0.00    0.02

What makes the Jenks algorithm so slow, and is there a faster way to run it?

If need be, I will move the last two parts of the question to stats.stackexchange.com:

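Under what circumstances is kmeans a reasonable substitute for Jenks?
Is it reasonable to define classes by running classInt on a random 1% subset of the data points?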

Recommended answer

To answer your original question:

What makes the Jenks algorithm so slow, and is there a faster way to run it?

Indeed, there is now a faster way to apply the Jenks algorithm: the getJenksBreaks function in the BAMMtools package.

However, be aware that the number of breaks has to be set differently: if you set n = 5 in the classIntervals function of the classInt package, you have to pass 6 to the getJenksBreaks function of the BAMMtools package to get the same results.
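One plausible reading of the 5-versus-6 difference is that classIntervals counts classes while getJenksBreaks counts break points, and n classes need n + 1 boundaries. The answer does not state this explicitly, so treat it as an assumption. The base-R sketch below uses quantile breaks (not Jenks, and neither package) purely to show the counting; the names n_classes, brks and classes are illustrative.

```r
# Same example data as in the question, flattened to a plain vector
set.seed(1)
x <- as.vector(mapply(rnorm, n = 100, mean = (1:5) * 5))

n_classes <- 5

# n_classes + 1 boundaries (including minimum and maximum) delimit n_classes intervals.
# Quantile breaks are used here only as a stand-in; they are NOT Jenks breaks.
brks <- quantile(x, probs = seq(0, 1, length.out = n_classes + 1))

# Assign every value to one of the intervals
classes <- cut(x, breaks = brks, include.lowest = TRUE)

length(brks)     # number of boundaries
nlevels(classes) # number of classes
```

Under that assumption, asking one function for 5 classes and the other for 6 break points describes the same partition.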

# Install and load library
install.packages("BAMMtools")
library(BAMMtools)

# Set up example data
my_n <- 100
set.seed(1)
x <- mapply(rnorm, n = my_n, mean = (1:5) * 5)

# Apply function
getJenksBreaks(x, 6)

The speed-up is massive:

> microbenchmark( getJenksBreaks(x, 6, subset = NULL),  classIntervals(x, n = 5, style = "jenks"), unit="s", times=10)
Unit: seconds
                                      expr         min          lq        mean      median          uq         max neval cld
       getJenksBreaks(x, 6, subset = NULL) 0.002824861 0.003038748 0.003270575 0.003145692 0.003464058 0.004263771    10  a 
 classIntervals(x, n = 5, style = "jenks") 2.008109622 2.033353970 2.094278189 2.103680325 2.111840853 2.231148846    10
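The question also asks whether classes could be defined from a random 1% subset of the data. The answer does not address this, so the following is only a rough base-R sketch of that idea, not a recommendation. It assumes a hypothetical vector of 10^5 values, uses stats::kmeans on the subset, and takes the midpoints between neighbouring cluster centres as boundaries. That midpoint rule is an illustrative choice and not necessarily how classInt derives its kmeans breaks.

```r
set.seed(1)
# Hypothetical large vector: five normal clusters, 1e5 values in total
x <- as.vector(mapply(rnorm, n = 2e4, mean = (1:5) * 5))

# Draw a random 1% subset
x_sub <- sample(x, size = length(x) %/% 100)

# 1-D k-means on the subset; several random starts reduce the risk of a poor solution
km <- kmeans(x_sub, centers = 5, nstart = 10)
centres <- sort(as.vector(km$centers))

# Interior boundaries: midpoints between neighbouring centres (illustrative choice)
inner <- (head(centres, -1) + tail(centres, -1)) / 2
brks <- c(min(x), inner, max(x))

# Apply the subset-derived breaks to the full vector
classes <- cut(x, breaks = brks, include.lowest = TRUE)
table(classes)
```

Whether breaks obtained this way are close enough to those from the full data depends on the distribution, so it is worth comparing them across a few random subsets before relying on them.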
