Using a topic modeling Java toolkit

Views: 83 | Published: 2020/6/30 18:31:55 | Tags: topic-modeling, mallet, lingpipe


Problem description

I'm working on text classification and I want to use topic models (LDA). My corpus consists of at least 24,000 Persian news documents. Each document in the corpus is a set of (keyword, weight) pairs extracted from the news article.

I have looked at two Java toolkits: MALLET and LingPipe. I've read the MALLET tutorial on importing data, and it takes data as plain text, not the format that I have. Is there any way to change that?

I also read a little about LingPipe; the example in its tutorial uses arrays of integers. Is that convenient for large data?

Which implementation of LDA would be better for me? Are there any other implementations (in Java) that suit my data?
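To make the data-format question concrete, here is a minimal sketch of one possible way to turn (keyword, weight) pairs into the kind of integer arrays mentioned above. It uses only the Java standard library and calls no MALLET or LingPipe API. The class name `KeywordWeightEncoder`, the `scale` parameter, and the idea of approximating a weight by repeating the keyword's id are all assumptions made for illustration, not something either toolkit prescribes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: encodes (keyword, weight) documents as int arrays.
class KeywordWeightEncoder {
    // Maps each distinct keyword to a dense integer id, in order of first appearance.
    private final Map<String, Integer> vocabulary = new LinkedHashMap<>();

    // Returns the id for a keyword, assigning the next free id if it is new.
    int idFor(String keyword) {
        Integer id = vocabulary.get(keyword);
        if (id == null) {
            id = vocabulary.size();
            vocabulary.put(keyword, id);
        }
        return id;
    }

    // Assumption: a weight is approximated by repeating the keyword id
    // round(weight * scale) times, with at least one occurrence per keyword.
    int[] encode(Map<String, Double> doc, double scale) {
        List<Integer> tokens = new ArrayList<>();
        for (Map.Entry<String, Double> entry : doc.entrySet()) {
            int id = idFor(entry.getKey());
            int count = (int) Math.max(1, Math.round(entry.getValue() * scale));
            for (int i = 0; i < count; i++) {
                tokens.add(id);
            }
        }
        int[] result = new int[tokens.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = tokens.get(i);
        }
        return result;
    }

    int vocabularySize() {
        return vocabulary.size();
    }
}
```

Calling `encode` once per document with the same encoder instance gives one `int[]` per document over a shared vocabulary. Memory grows with the total number of repeated tokens, so the choice of `scale` matters for a corpus of this size. Whether repeated ids are an acceptable stand-in for weights depends on how the weights were computed.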
