Algorithms or libraries for textual analysis, specifically: dominant words, phrases across text, and collection of text

Question

I'm working on a project where I need to analyze a page of text, and collections of pages of text, to determine the dominant words. I'd like to know if there is a library (preferably C# or Java) that will handle the heavy lifting for me. If not, is there an algorithm, or several, that would achieve my goals below?

What I want to do is similar to the word clouds built from a URL or RSS feed that you find on the web, except I don't want the visualization. They are used all the time for analyzing presidential candidates' speeches to see what the theme or most-used words are.

The complication is that I need to do this on thousands of short documents, and then on collections or categories of these documents.

My initial plan was to parse each document, then filter out common words ("of", "the", "he", "she", etc.), and then count the number of times the remaining words appear in the text (and in the overall collection/category).
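As a rough illustration of that plan, here is a minimal Java sketch (one of the asker's preferred languages). The regex-split tokenizer and the tiny stop list are illustrative placeholders, not a production setup:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class WordCounter {

    // A tiny illustrative stop list; a real one would be much larger.
    private static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("of", "the", "he", "she", "a", "an", "and", "to", "in", "is"));

    /** Lowercases, splits on non-letters, drops stop words, and counts the rest. */
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Counts "speech" twice and "theme" once; stop words are dropped.
        System.out.println(countWords("The speech and the theme of the speech"));
    }
}
```

Aggregating per-category counts is then just merging these maps across the documents in a category.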

The problem is that in the future I would like to handle stemming, plural forms, etc. I would also like to see if there is a way to identify important phrases (instead of a count of a single word, the count of a phrase of 2-3 words appearing together).

Any guidance on strategies, libraries, or algorithms that would help is appreciated.

Answer

One option for what you're doing is term frequency-inverse document frequency, or tf-idf. The strongest terms will have the highest weighting under this calculation. Check it out here: http://en.wikipedia.org/wiki/Tf-idf

Another option is to use something like a naive Bayes classifier, using words as features, and find the strongest features in the text to determine the class of the document. This would work similarly with a maximum entropy classifier.

As far as tools to do this, the best tool to start with would be NLTK, a Python library with extensive documentation and tutorials: http://nltk.sourceforge.net/

For Java, try OpenNLP: http://opennlp.sourceforge.net/

For the phrase part, consider the second option I offered, using bigrams and trigrams as features, or even as terms in tf-idf.

Good luck!
