kmeans: Quick-TRANSfer stage steps exceeded maximum

Question

I am running k-means clustering in R on a dataset with 636,688 rows and 7 columns, using the standard stats package: kmeans(dataset, centers = 100, nstart = 25, iter.max = 20).

I get the following error: Quick-TRANSfer stage steps exceeded maximum (= 31834400). Although the code can be viewed at http://svn.r-project.org/R/trunk/src/library/stats/R/kmeans.R, I am unsure what is going wrong. I assume the problem has to do with the size of my dataset, but I would be grateful if someone could clarify what I can do to mitigate the issue.

Recommended answer

I just had the same issue.

See the documentation of kmeans in R via ?kmeans:

The Hartigan-Wong algorithm generally does a better job than either of those, but trying several random starts (‘nstart’ > 1) is often recommended. In rare cases, when some of the points (rows of ‘x’) are extremely close, the algorithm may not converge in the "Quick-Transfer" stage, signalling a warning (and returning ‘ifault = 4’). Slight rounding of the data may be advisable in that case.

In these cases, you may need to switch to the Lloyd or MacQueen algorithms.
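
Both suggestions (slight rounding, or a different algorithm) can be sketched as follows. This is only an illustration on synthetic data: the matrix X, the number of clusters, and the rounding precision are assumptions, not values from the question. The algorithm argument of kmeans selects the method; "Hartigan-Wong" is the default.

```r
set.seed(42)

# Synthetic stand-in for the real data (assumption): three well-separated
# groups of 200 points each, in two dimensions.
X <- rbind(
  matrix(rnorm(400, mean = 0), ncol = 2),
  matrix(rnorm(400, mean = 5), ncol = 2),
  matrix(rnorm(400, mean = 10), ncol = 2)
)

# Option 1: round the data slightly, then run the default Hartigan-Wong
# algorithm. The precision (3 digits) is an arbitrary illustrative choice.
X_rounded <- round(X, digits = 3)
km_hw <- kmeans(X_rounded, centers = 3, nstart = 5, iter.max = 50)

# Option 2: keep the data as-is and select another algorithm explicitly.
km_lloyd <- kmeans(X, centers = 3, nstart = 5, iter.max = 50,
                   algorithm = "Lloyd")
km_macqueen <- kmeans(X, centers = 3, nstart = 5, iter.max = 50,
                      algorithm = "MacQueen")
```

Which option is preferable depends on the data; rounding changes the input slightly, while switching algorithms changes how the solution is searched for.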

The nasty thing about R here is that it continues with only a warning, which may go unnoticed. For my benchmarking purposes, I consider this a failed run, so I use:

if (kms$ifault==4) { stop("Failed in Quick-Transfer"); }

Depending on your use case, you may want to do something like

if (kms$ifault==4) { kms = kmeans(X, kms$centers, algorithm="MacQueen"); }

instead, to continue with a different algorithm.
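
The fallback can also be wrapped in a small function. The sketch below is one possible way to do it; kmeans_with_fallback is a hypothetical helper name, and the synthetic data and parameter defaults are assumptions for illustration only.

```r
# Hypothetical helper (not part of the stats package): run the default
# Hartigan-Wong algorithm first and, if the Quick-Transfer stage failed
# (ifault == 4), continue from the centers found so far with MacQueen.
kmeans_with_fallback <- function(X, k, iter.max = 100, nstart = 1) {
  kms <- kmeans(X, centers = k, iter.max = iter.max, nstart = nstart)
  # isTRUE() guards against a missing or NULL ifault component.
  if (isTRUE(kms$ifault == 4L)) {
    kms <- kmeans(X, centers = kms$centers, iter.max = iter.max,
                  algorithm = "MacQueen")
  }
  kms
}

# Usage on synthetic data (assumption: three well-separated groups).
set.seed(7)
X <- rbind(
  matrix(rnorm(400, mean = 0), ncol = 2),
  matrix(rnorm(400, mean = 5), ncol = 2),
  matrix(rnorm(400, mean = 10), ncol = 2)
)
res <- kmeans_with_fallback(X, k = 3, nstart = 5)
```

On easy data like this the fallback branch will normally not be taken; the helper only matters when Hartigan-Wong reports ifault == 4.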

If you are benchmarking K-means, note that R uses iter.max=10 by default. Convergence may take many more than 10 iterations.
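
For benchmarking, one way to make sure such warnings do not go unnoticed is to raise iter.max explicitly and record every warning that kmeans emits. The following is a rough sketch on synthetic data; the variable names (msgs, iter_max) and the chosen limit are assumptions, not recommendations.

```r
set.seed(1)

# Synthetic data (assumption): two well-separated groups of 200 points.
X <- rbind(
  matrix(rnorm(400, mean = 0), ncol = 2),
  matrix(rnorm(400, mean = 6), ncol = 2)
)

iter_max <- 100L        # explicit limit instead of the default of 10
msgs <- character(0)    # collects the text of any warnings raised

kms <- withCallingHandlers(
  kmeans(X, centers = 2, iter.max = iter_max),
  warning = function(w) {
    # Record the warning, then suppress its normal printing.
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Number of iterations actually used, and whether anything was reported.
cat("iterations used:", kms$iter, "\n")
cat("warnings recorded:", length(msgs), "\n")
```

A benchmark script could then treat any run with a non-empty msgs as failed, or rerun it with a larger iter.max.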
