How to abstract bigram topics instead of unigrams using Latent Dirichlet Allocation (LDA) in Python gensim?

Published: 2021/5/10 19:05:58 | Tags: nlp, text-mining, lda, gensim


Current LDA output (unigrams):

topic2 - dioxide, plants, green, carbon

Desired output (bigrams):

topic2 - green plants, carbon dioxide

Any ideas?

You can use word2vec to get the most similar terms for the top n topics abstracted using LDA.

LDA output

Create a dictionary of bigrams from the abstracted topics (for example, san_francisco).

Then run word2vec to get the most similar words (unigrams, bigrams, etc.).

Word and cosine distance

los_angeles (0.666175)
golden_gate (0.571522)
oakland (0.557521)
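The scores listed above appear to be cosine similarities (higher means closer), which is what gensim's `most_similar` returns. As a rough illustration of how such a ranking is computed, here is a standard-library sketch using made-up 3-dimensional vectors; real word2vec vectors have many more dimensions, and the words and values here are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, for illustration only.
vectors = {
    "san_francisco": [1.0, 0.0, 0.0],
    "oakland": [1.0, 1.0, 0.0],
    "tokyo": [0.0, 0.0, 1.0],
}

query = "san_francisco"

# Score every other word against the query and sort from most to least similar.
ranked = sorted(
    ((word, cosine_similarity(vectors[query], vec))
     for word, vec in vectors.items() if word != query),
    key=lambda pair: pair[1],
    reverse=True,
)

for word, score in ranked:
    print(f"{word} ({score:.6f})")
```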
