Categorizing Words and Category Values

Problem description

We were set an algorithm problem in class today, framed as "if you figure out a solution, you don't have to do this subject". So of course, we all thought we would give it a go.

Basically, we were provided with a DB of 100 words and 10 categories. There is no mapping between the words and the categories, so it's basically a list of 100 words and a list of 10 categories.

We have to "place" the words into the correct category: that is, we have to "figure out" how to put the words into the correct category. Thus, we must "understand" each word and then put it in the most appropriate category algorithmically.

For example, one of the words is "fishing" and one of the categories is "sport", so the word would go into that category. There is some overlap between words and categories, such that some words could go into more than one category.

If we figure it out, we have to increase the sample size, and the person with the "best" matching % wins.

Does anyone have any idea how to start something like this? Or any resources, preferably in C#?

Even a keyword DB or something similar might be helpful. Does anyone know of any free ones?

Recommended answer

First of all, you need sample text to analyze in order to get the relationships between words. Categorization with latent semantic analysis is described in "Latent Semantic Analysis approaches to categorization".
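To make the "relationship of words" idea concrete, here is a minimal sketch of the raw input that latent semantic analysis starts from: rows of a term-document matrix, compared by cosine similarity. It deliberately skips the SVD dimensionality-reduction step that full LSA performs, so treat it as a simplified stand-in rather than LSA itself. It is written in Python for brevity rather than the C# the question asks for; the corpus `SAMPLE_TEXTS` and the helper names `term_row`, `cosine`, and `best_category` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real attempt needs far more sample text.
SAMPLE_TEXTS = [
    "fishing is a popular outdoor sport",
    "football is a team sport",
    "rice is a staple food",
    "bread is a baked food",
]


def term_row(term, texts):
    """Row of the term-document matrix: how often `term` occurs in each text."""
    row = Counter()
    for doc_id, text in enumerate(texts):
        n = text.lower().split().count(term)
        if n:
            row[doc_id] = n
    return row


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def best_category(word, categories, texts):
    """Pick the category name whose document profile is closest to the word's."""
    word_row = term_row(word, texts)
    return max(categories, key=lambda c: cosine(word_row, term_row(c, texts)))


if __name__ == "__main__":
    for w in ("fishing", "bread"):
        print(w, "->", best_category(w, ["sport", "food"], SAMPLE_TEXTS))
```

With only exact co-occurrence, a word that never shares a text with a category name scores zero against it; smoothing over that sparsity is the gap the SVD step in LSA is meant to fill.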

A different approach would be naive Bayes text categorization. It needs sample texts with assigned categories. In a learning step, the program learns the different categories and the likelihood that a word occurs in a text assigned to each category; see Bayesian spam filtering. I don't know how well that works with single words.
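The learning step described above could look roughly like the following sketch: a multinomial naive Bayes classifier with add-one (Laplace) smoothing. It is in Python rather than C#, and the class name `NaiveBayes`, its methods, and the training sentences are illustrative assumptions, not part of the original assignment.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative sketch)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of texts
        self.vocab = set()

    def train(self, text, category):
        """Learning step: record which words occur in texts of a category."""
        tokens = text.lower().split()
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        """Return the category with the highest log posterior for `text`."""
        tokens = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total = sum(counts.values())
            score = math.log(docs / total_docs)  # log prior
            for t in tokens:
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[t] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = category, score
        return best


if __name__ == "__main__":
    nb = NaiveBayes()
    # Hypothetical labelled sample texts.
    nb.train("fishing is a relaxing outdoor sport", "sport")
    nb.train("football is a team sport", "sport")
    nb.train("bread is a baked food", "food")
    nb.train("rice is a staple food", "food")
    print(nb.classify("fishing"))
    print(nb.classify("bread"))
```

As the answer cautions, a single-word query gives the classifier only one likelihood term to work with, so the result depends heavily on whether that exact word appeared in the training texts.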
