Algorithm to find related words in a text

Views: 95 | Published: 2020/9/7 19:02:30 | Tags: artificial-intelligence, similarity


Question

I would like to take a word (e.g. "Apple") and process a text (or maybe several). I'd like to come up with related terms. For example: process a document about Apple and find that iPod, iPhone, and Mac are terms related to "Apple".

Any ideas on how to solve this?

Recommended answer

As a starting point: your question relates to text mining.

There are two ways: a statistical approach, and one from natural language processing (NLP).

I do not know much about NLP, but I can say something about the statistical approach:

You need some vector space representation of your documents, see:
http://en.wikipedia.org/wiki/Vector_space_model
http://en.wikipedia.org/wiki/Document-term_matrix
http://en.wikipedia.org/wiki/Tf%E2%80%93idf
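
To make the vector space idea concrete, here is a minimal sketch in plain Python. It builds a tf-idf weighted document-term matrix and then compares term columns by cosine similarity, so that terms appearing in the same documents as "apple" rank highest. The toy corpus and the helper names (`tfidf_matrix`, `related_terms`) are made up for illustration; a real corpus would also need proper tokenization and stop-word handling.

```python
import math
from collections import Counter

# Toy corpus, invented for this example; each string is one document.
docs = [
    "apple unveils new iphone and ipod",
    "apple mac sales grow as iphone demand rises",
    "the ipod and the mac are apple products",
    "banana and orange prices rise at the market",
    "orange juice and banana bread recipes",
]

def tfidf_matrix(documents):
    """Return (vocab, rows); rows[d][t] is the tf-idf weight of term t in document d."""
    tokenized = [d.lower().split() for d in documents]
    vocab = sorted({w for doc in tokenized for w in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(w for doc in tokenized for w in set(doc))
    idf = {w: math.log(n_docs / df[w]) for w in vocab}
    rows = []
    for doc in tokenized:
        tf = Counter(doc)
        rows.append([tf[w] / len(doc) * idf[w] for w in vocab])
    return vocab, rows

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def related_terms(word, documents, top_n=3):
    """Rank other terms by cosine similarity of their columns in the matrix."""
    vocab, rows = tfidf_matrix(documents)
    index = {w: j for j, w in enumerate(vocab)}

    def column(j):
        return [row[j] for row in rows]

    target = column(index[word])
    scores = [(cosine(target, column(j)), w) for w, j in index.items() if w != word]
    scores.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties alphabetical
    return [w for _, w in scores[:top_n]]

print(related_terms("apple", docs))
```

One design note: idf rescales a whole term column by a constant, so it does not change column-to-column cosines. It matters when you compare documents (rows) instead, where it downweights words that occur everywhere.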

In order to learn semantics, that is, that different words can mean the same thing or that one word can have different meanings, you need a large text corpus for learning. As I said, this is a statistical approach, so you need lots of samples. See http://www.daviddlewis.com/resources/testcollections/

Maybe you have lots of documents from the context in which you are going to use this. That is the best situation.

You have to retrieve latent factors from this corpus. The most common methods are:

LSA (http://en.wikipedia.org/wiki/Latent_semantic_analysis)
PLSA (http://en.wikipedia.org/wiki/Probabilistic_latent_semantic_analysis)
Non-negative matrix factorization (http://en.wikipedia.org/wiki/Non-negative_matrix_factorization)
Latent Dirichlet allocation (http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation)
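
As an illustration of the first method in the list, here is a rough sketch of LSA: build a term-document count matrix, take a truncated SVD, and compare terms in the reduced latent space. It assumes NumPy is installed. The corpus is a deliberately tidy toy (two topics with disjoint vocabularies), and the function names and the choice of `k=2` are assumptions for the example, not part of any library.

```python
import numpy as np

# Toy corpus, invented for this example: two topics with disjoint vocabularies.
docs = [
    "apple iphone ipod",
    "apple mac iphone",
    "apple ipod mac",
    "banana orange juice",
    "banana bread orange",
]

def lsa_term_vectors(documents, k=2):
    """Return (vocab, vectors); vectors[i] is term i in the k-dimensional latent space."""
    tokenized = [d.lower().split() for d in documents]
    vocab = sorted({w for doc in tokenized for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-document matrix of raw counts (terms are rows, documents are columns).
    counts = np.zeros((len(vocab), len(tokenized)))
    for j, doc in enumerate(tokenized):
        for w in doc:
            counts[index[w], j] += 1
    # Keep only the k largest singular values and their left singular vectors.
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return vocab, u[:, :k] * s[:k]

def related_terms(word, documents, k=2, top_n=3):
    """Rank other terms by cosine similarity to `word` in the latent space."""
    vocab, vectors = lsa_term_vectors(documents, k)
    target = vectors[vocab.index(word)]
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(target)
    sims = vectors @ target / np.where(norms == 0, 1.0, norms)  # avoid dividing by zero
    ranked = [vocab[i] for i in np.argsort(-sims) if vocab[i] != word]
    return ranked[:top_n]

print(related_terms("apple", docs))
```

On real data the matrix would usually be tf-idf weighted and `k` would be in the tens or hundreds; the right value depends on the corpus and has to be tuned.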

These methods involve lots of math. Either you dig into it, or you have to find good libraries.

I can recommend the following books:

http://www.oreilly.de/catalog/9780596529321/toc.html
http://www.oreilly.de/catalog/9780596516499/index.html
