Partition into classes: jenks vs kmeans
Question
I want to partition a vector (length around 10^5) into five classes. With the function classIntervals from the package classInt I wanted to use style = "jenks" natural breaks, but this takes an inordinate amount of time even for a much smaller vector of only 500 values. Setting style = "kmeans" executes almost instantaneously.
library(classInt)
my_n <- 100
set.seed(1)
x <- mapply(rnorm, n = my_n, mean = (1:5) * 5)
R> system.time(classIntervals(x, n = 5, style = "jenks"))
   user  system elapsed
  13.46    0.00   13.45
R> system.time(classIntervals(x, n = 5, style = "kmeans"))
   user  system elapsed
   0.02    0.00    0.02
What makes the Jenks algorithm so slow, and is there a faster way to run it?
If need be, I will move the last two parts of the question to stats.stackexchange.com:
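- Under what circumstances is kmeans a reasonable substitute for Jenks?
- Is it reasonable to define the classes by running classInt on a random 1% subset of the data points?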
Recommended answer
To answer your original question:
What makes the Jenks algorithm so slow, and is there a faster way to run it?
Indeed, there is by now a faster way to apply the Jenks algorithm: the getJenksBreaks function in the BAMMtools package.
However, be aware that you have to set the number of breaks differently: if you set n = 5 in the classIntervals function of the classInt package, you have to set the number of breaks to 6 in the getJenksBreaks function of the BAMMtools package to get the same results.
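The off-by-one is easier to remember if one reads the second argument of getJenksBreaks as a count of break points (both endpoints included) and n in classIntervals as a count of classes. That reading is an assumption here, not something verified against the BAMMtools source. A minimal base R sketch with made-up break values (not the output of any Jenks run) shows that six break points bound five classes:

```r
# Six hypothetical break points, endpoints included (illustrative values only)
brks <- c(0, 5, 10, 15, 20, 25)

# Some values that fall inside the outer break points
set.seed(1)
vals <- runif(500, min = 0, max = 25)

# cut() turns k break points into k - 1 classes
classes <- cut(vals, breaks = brks, include.lowest = TRUE)
n_classes <- nlevels(classes)
n_classes
```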
# Install and load library
install.packages("BAMMtools")
library(BAMMtools)
# Set up example data
my_n <- 100
set.seed(1)
x <- mapply(rnorm, n = my_n, mean = (1:5) * 5)
# Apply function
getJenksBreaks(x, 6)
The speed-up is huge, i.e.
> microbenchmark(getJenksBreaks(x, 6, subset = NULL), classIntervals(x, n = 5, style = "jenks"), unit = "s", times = 10)
Unit: seconds
                                      expr         min          lq        mean      median          uq         max neval cld
       getJenksBreaks(x, 6, subset = NULL) 0.002824861 0.003038748 0.003270575 0.003145692 0.003464058 0.004263771    10   a
 classIntervals(x, n = 5, style = "jenks") 2.008109622 2.033353970 2.094278189 2.103680325 2.111840853 2.231148846    10
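On the side question about defining the classes from a random 1% subset: the mechanics can be sketched in base R. This is only a sketch under assumptions. It swaps in stats::kmeans for Jenks so that it runs without extra packages, the names big_x, smp, brks and cls are made up for the illustration, and it says nothing about whether the subsample is statistically adequate for a given data set.

```r
# Simulated data in the spirit of the question: 5 groups, 10^5 values in total
set.seed(1)
big_x <- as.vector(mapply(rnorm, n = 2e4, mean = (1:5) * 5))

# Draw a random 1% subset
smp <- sample(big_x, size = length(big_x) / 100)

# 1-D k-means on the subset; quantile-based starting centres keep the run deterministic
init <- unname(quantile(smp, probs = c(0.1, 0.3, 0.5, 0.7, 0.9)))
km <- kmeans(smp, centers = init)
ctr <- sort(as.vector(km$centers))

# In one dimension, class boundaries sit halfway between adjacent centres;
# the outer breaks are taken from the full vector so that every value is covered
brks <- c(min(big_x), (head(ctr, -1) + tail(ctr, -1)) / 2, max(big_x))

# Apply the breaks learned on the subset to the full vector
cls <- findInterval(big_x, brks, rightmost.closed = TRUE)
table(cls)
```

Comparing brks with the breaks computed on the full vector, for example over several random subsets, gives a rough feel for how stable the subsampled classes are.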