Partition into classes: jenks vs kmeans

Question

I want to partition a vector (length around 10^5) into five classes. With the function classIntervals from package classInt I wanted to use style = "jenks" (natural breaks), but this takes an inordinate amount of time even for a much smaller vector of only 500 values. Setting style = "kmeans" executes almost instantaneously.

library(classInt)

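# 5 groups of my_n normal draws each, with means 5, 10, ..., 25
# (mapply returns a 100 x 5 matrix, i.e. 500 values in total)
my_n <- 100
set.seed(1)
x <- mapply(rnorm, n = my_n, mean = (1:5) * 5)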

R> system.time(classIntervals(x, n = 5, style = "jenks"))
   user  system elapsed 
  13.46    0.00   13.45 

R> system.time(classIntervals(x, n = 5, style = "kmeans"))
   user  system elapsed 
   0.02    0.00    0.02

What makes the Jenks algorithm so slow, and is there a faster way to run it?

If need be, I will move the last two parts of the question to stats.stackexchange.com:

  • Under what circumstances is kmeans a reasonable substitute for Jenks?
  • Is it reasonable to define the classes by running classInt on a random 1% subset of the data points?

Recommended answer

To answer your original question:

What makes the Jenks algorithm so slow, and is there a faster way to run it?

Indeed, there is now a faster way to apply the Jenks algorithm: the getJenksBreaks function in the BAMMtools package.

However, be aware that you have to set the number of breaks differently: if you set n to 5 in the classIntervals function of the classInt package, you have to pass 6 to the getJenksBreaks function of the BAMMtools package to get the same results.

# Install and load library
install.packages("BAMMtools")
library(BAMMtools)

# Set up example data
my_n <- 100
set.seed(1)
x <- mapply(rnorm, n = my_n, mean = (1:5) * 5)

# Apply function
getJenksBreaks(x, 6)
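
The difference of one in the break count presumably reflects that k classes are delimited by k + 1 break points (the minimum, k - 1 interior cuts, and the maximum). The following base R sketch illustrates only that counting relationship; quantile() is used as a stand-in for a Jenks routine so the example runs without extra packages, and the names n_classes, brks and cls are illustrative.

```r
# Illustrative sketch: quantile() stands in for a Jenks routine here,
# so the break values are NOT Jenks breaks; only the counting is the point.
set.seed(1)
x <- as.vector(mapply(rnorm, n = 100, mean = (1:5) * 5))

n_classes <- 5

# k classes need k + 1 break points: minimum, k - 1 interior cuts, maximum.
brks <- quantile(x, probs = seq(0, 1, length.out = n_classes + 1), names = FALSE)

# Assign each value to a class; include.lowest keeps the minimum in class 1.
cls <- cut(x, breaks = brks, include.lowest = TRUE, labels = FALSE)

length(brks)         # number of break points
length(unique(cls))  # number of classes
```

With break points returned by either package, the same cut() call could be used to turn them into class labels.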

The speed gain is huge:

> microbenchmark( getJenksBreaks(x, 6, subset = NULL),  classIntervals(x, n = 5, style = "jenks"), unit="s", times=10)
Unit: seconds
                                      expr         min          lq        mean      median          uq         max neval cld
       getJenksBreaks(x, 6, subset = NULL) 0.002824861 0.003038748 0.003270575 0.003145692 0.003464058 0.004263771    10  a 
 classIntervals(x, n = 5, style = "jenks") 2.008109622 2.033353970 2.094278189 2.103680325 2.111840853 2.231148846    10   
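
For the subset idea raised in the question, one possible approach is to fit the classes on a random sample and then apply the resulting cuts to the full vector. The sketch below is a base R illustration under stated assumptions: stats::kmeans stands in for the classing routine, the 1% fraction is taken from the question, and the names x_big, x_sub, centres, cuts and cls_big are hypothetical. It shows the mechanics only and does not settle whether the subsample is statistically adequate for a given data set.

```r
# Sketch under assumptions: kmeans (stats) is the stand-in classing routine,
# and the 1% sampling fraction comes from the question.
set.seed(1)
x_big <- as.vector(mapply(rnorm, n = 20000, mean = (1:5) * 5))  # 1e5 values

# Fit class centres on a random 1% subset only.
x_sub <- sample(x_big, size = length(x_big) %/% 100)
fit <- kmeans(x_sub, centers = 5, nstart = 10)
centres <- sort(as.vector(fit$centers))

# In one dimension each value belongs to its nearest centre, so the
# interior cuts are the midpoints between neighbouring centres.
cuts <- (head(centres, -1) + tail(centres, -1)) / 2

# Classify the full vector; findInterval returns 0..4, so add 1 for classes 1..5.
cls_big <- findInterval(x_big, cuts) + 1L

table(cls_big)
```

Repeating the fit with different seeds and comparing the cuts would give a rough sense of how stable the subsample-based classes are.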
