Can I use autoencoder for clustering?

Question

In the code linked below, they use an autoencoder for supervised clustering or classification because they have data labels: http://amunategui.github.io/anomaly-detection-h2o/ But can I use an autoencoder to cluster data if I do not have its labels? Regards

Solution

The deep-learning autoencoder is always unsupervised learning. The "supervised" part of the article you link to is to evaluate how well it did.

The following example (taken from ch.7 of my book, Practical Machine Learning with H2O, where I try all the H2O unsupervised algorithms on the same data set - please excuse the plug) takes 563 features, and tries to encode them into just two hidden nodes.

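# Assumes library(h2o) is loaded, h2o.init() has been run, and tfidf is an H2OFrame
m <- h2o.deeplearning(
  2:564, training_frame = tfidf,  # columns 2 to 564 are the 563 features
  hidden = c(2), autoencoder = TRUE, activation = "Tanh"
  )
# Hidden-layer outputs (two values per row of tfidf)
f <- h2o.deepfeatures(m, tfidf, layer = 1)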

The second command there extracts the hidden node outputs (the "deep features"), i.e. the two-number encoding of each row rather than the network weights themselves. f is a data frame, with two numeric columns, and one row for every row in the tfidf source data. I chose just two hidden nodes so that I could plot the clusters:

Results will change on each run. You can (maybe) get better results with stacked auto-encoders, or using more hidden nodes (but then you cannot plot them). Here I felt the results were limited by the data.

BTW, I made the above plot with this code:

d <- as.matrix(f[1:30,]) #Just first 30, to avoid over-cluttering
labels <- as.vector(tfidf[1:30, 1])
plot(d, pch = 17) #Triangle
text(d, labels, pos = 3) #pos=3 means above
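
If explicit cluster assignments are wanted rather than just a plot, one option is to run an ordinary clustering algorithm on the two encoded columns. The following is only a minimal sketch in base R, not part of the original answer: `features` is simulated stand-in data for the deep features (in practice it might come from something like `as.matrix(f)`), the column names are made up, and `k = 2` is an assumption chosen to match the simulated data.

```r
# Hypothetical stand-in for the two-column deep-feature matrix:
# two well-separated blobs of 20 points each.
set.seed(42)  # fix the RNG so the assignment is repeatable between runs
features <- rbind(
  matrix(rnorm(40, mean = -0.5, sd = 0.1), ncol = 2),
  matrix(rnorm(40, mean =  0.5, sd = 0.1), ncol = 2)
)
colnames(features) <- c("node1", "node2")  # illustrative names only

# k-means on the 2-D encoding; centers = 2 is an assumption, not a value
# taken from the answer. nstart = 10 tries several random starts.
km <- kmeans(features, centers = 2, nstart = 10)

# Cluster sizes, then the same kind of plot coloured by assigned cluster
print(table(km$cluster))
plot(features, col = km$cluster, pch = 17)
points(km$centers, pch = 8, cex = 2)  # mark the cluster centres
```

On real deep features the number of clusters is not known in advance, so k would need to be chosen by inspection of the plot or by comparing several values.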

(P.S. The original data came from Brandon Rose's excellent article on using NLTK.)
