Can I use an autoencoder for clustering?
Question
In the code at the link below, the autoencoder is used for what looks like supervised clustering or classification, because the data has labels. http://amunategui.github.io/anomaly-detection-h2o/ But can I use an autoencoder to cluster data if I do not have labels? Regards
Answer
A deep-learning autoencoder is always unsupervised learning. The "supervised" part of the article you link to is only there to evaluate how well it did.
The following example (taken from ch. 7 of my book, Practical Machine Learning with H2O, where I try all the H2O unsupervised algorithms on the same data set; please excuse the plug) takes 563 features and tries to encode them into just two hidden nodes.
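# Assumes library(h2o) is loaded, h2o.init() has run, and tfidf is an H2OFrame
# Train an autoencoder on the 563 feature columns (2 to 564) with 2 hidden nodes
m <- h2o.deeplearning(
  x = 2:564, training_frame = tfidf,
  hidden = c(2), autoencoder = TRUE, activation = "Tanh"
)
# Get the output of hidden layer 1 for every row of tfidf
f <- h2o.deepfeatures(m, tfidf, layer = 1)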
The h2o.deepfeatures call extracts the outputs of the hidden nodes (the encoded features, rather than the model weights themselves). f is a data frame with two numeric columns and one row for every row in the tfidf source data. I chose just two hidden nodes so that I could plot the clusters.
Results will change on each run. You can (maybe) get better results with stacked autoencoders, or by using more hidden nodes (but then you cannot plot them directly). Here I felt the results were limited by the data.
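To turn the two encoded columns into actual cluster assignments, one option is to run k-means on them. The sketch below is an assumption, not part of the original answer: it uses base R's kmeans, a synthetic stand-in matrix d in place of as.matrix(f) so that it runs without an H2O cluster, and an arbitrary choice of two clusters.

```r
# Hypothetical stand-in for as.matrix(f): two well-separated blobs of
# 2-D points, mimicking the two encoded columns from the autoencoder.
set.seed(42)
d <- rbind(
  matrix(rnorm(40, mean = -0.5, sd = 0.1), ncol = 2),
  matrix(rnorm(40, mean = 0.5, sd = 0.1), ncol = 2)
)

# Cluster the encoded points; centers = 2 is an arbitrary illustrative choice.
# nstart = 10 repeats the random initialisation and keeps the best fit.
km <- kmeans(d, centers = 2, nstart = 10)

# One cluster id per row, in the same order as the rows of d
print(table(km$cluster))

# Colour each point by its assigned cluster and mark the cluster centres
plot(d, col = km$cluster, pch = 17)
points(km$centers, pch = 8, cex = 2)
```

With the real data you would replace the synthetic d with as.matrix(f) and pick the number of clusters to suit the data. If the encoded frame is too large to pull into R, H2O's own h2o.kmeans can be run on f instead.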
BTW, I made the plot with this code:
d <- as.matrix(f[1:30,]) #Just first 30, to avoid over-cluttering
labels <- as.vector(tfidf[1:30, 1])
plot(d, pch = 17) #Triangle
text(d, labels, pos = 3) #pos=3 means above
(P.S. The original data came from Brandon Rose's excellent article on using NLTK.)