Can I use an autoencoder for clustering?


Question

In the article linked below, an autoencoder is used for what looks like supervised clustering or classification, because the data has labels: http://amunategui.github.io/anomaly-detection-h2o/ But can I use an autoencoder to cluster data if I do not have its labels? Regards

Recommended answer

A deep-learning autoencoder is always unsupervised learning. The "supervised" part of the article you link to is only there to evaluate how well it did.

The following example (taken from ch. 7 of my book, Practical Machine Learning with H2O, where I try all the H2O unsupervised algorithms on the same data set; please excuse the plug) takes 563 features and tries to encode them into just two hidden nodes.

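# Train an autoencoder on columns 2 to 564 of tfidf, with one hidden layer of two nodes
m <- h2o.deeplearning(
  2:564, training_frame = tfidf,
  hidden = c(2), autoencoder = TRUE, activation = "Tanh"
  )
# Extract the outputs of hidden layer 1 for every row
f <- h2o.deepfeatures(m, tfidf, layer = 1)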

The second command extracts the outputs of the hidden layer (the "deep features") for each row. f is a data frame with two numeric columns and one row for every row in the tfidf source data. I chose just two hidden nodes so that I could plot the clusters in two dimensions.
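If you want explicit cluster assignments instead of judging the plot by eye, one option is to run an ordinary clustering algorithm on the two encoded columns. The answer does not do this; the sketch below uses base R's kmeans on synthetic stand-in data so that it runs without an H2O cluster. The number of clusters k and the stand-in matrix d are assumptions for illustration only. With the real model you would start from d <- as.matrix(f).

```r
# Stand-in for the two deep-feature columns (assumption: synthetic data).
# With the H2O model above you would use: d <- as.matrix(f)
set.seed(1)
d <- rbind(
  matrix(rnorm(40, mean = -1, sd = 0.2), ncol = 2),  # 20 rows around (-1, -1)
  matrix(rnorm(40, mean =  1, sd = 0.2), ncol = 2)   # 20 rows around ( 1,  1)
)

k <- 2  # hypothetical number of clusters; choose it for your own data

# nstart = 10 tries several random starts and keeps the best solution
km <- kmeans(d, centers = k, nstart = 10)

# km$cluster holds one cluster id per row of d
print(table(km$cluster))

# Colour the points by cluster and mark the cluster centres
plot(d, col = km$cluster, pch = 17)
points(km$centers, pch = 8, cex = 2)
```

Because the encoding itself changes between runs, the cluster ids are only meaningful relative to the model that produced the features.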

Results will change on each run. You can (maybe) get better results with stacked autoencoders, or by using more hidden nodes (but then you cannot plot them directly). Here I felt the results were limited by the data.
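As a rough sketch of those two points: fixing seed and setting reproducible = TRUE should make runs repeatable (at the cost of speed, since it restricts training to a single thread), and a deeper network with a two-node bottleneck in the middle is a simple stand-in for stacked autoencoders that still gives two columns to plot. It is not layer-wise stacking in the strict sense. The synthetic data, the layer sizes, the seed, and the epoch count below are all assumptions, and the code needs the h2o package and Java to start a local cluster.

```r
library(h2o)
h2o.init()  # starts (or connects to) a local H2O cluster

# Synthetic stand-in for a real feature table (assumption: 100 rows, 20 columns)
set.seed(42)
df <- as.data.frame(matrix(rnorm(100 * 20), nrow = 100))
hf <- as.h2o(df)

# Deeper autoencoder: 20 inputs -> 8 -> 2 -> 8 -> 20 outputs
m2 <- h2o.deeplearning(
  x = 1:20, training_frame = hf,
  hidden = c(8, 2, 8), autoencoder = TRUE, activation = "Tanh",
  epochs = 10,
  seed = 42, reproducible = TRUE  # repeatable, but single-threaded and slower
)

# Hidden layer 2 is the two-node bottleneck
f2 <- h2o.deepfeatures(m2, hf, layer = 2)
print(dim(f2))
```

With more than two nodes in the bottleneck you can no longer plot the features directly, but you can still pass them to a clustering algorithm.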

BTW, I made the plot of the two hidden nodes with this code:

d <- as.matrix(f[1:30,]) #Just first 30, to avoid over-cluttering
labels <- as.vector(tfidf[1:30, 1])
plot(d, pch = 17) #Triangle
text(d, labels, pos = 3) #pos=3 means above

(P.S. The original data came from Brandon Rose's excellent article on using NLTK.)
