Algorithm to classify a list of products? Take 2


Question

I asked a question similar to this one a couple of weeks ago, but I did not ask it correctly. So I am re-asking the question here with more details, and I would like to get a more AI-oriented answer.

I have a list representing products which are more or less the same. For instance, in the list below, they are all Seagate hard drives.

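  1. Seagate Hard Drive 500Go
  2. Seagate Hard Drive 120Go for laptop
  3. Seagate Barracuda 7200.12 ST3500418AS 500GB 7200 RPM SATA 3.0Gb/s Hard Drive
  4. New and shiny 500Go hard drive from Seagate
  5. Seagate Barracuda 7200.12
  6. Seagate FreeAgent Desk 500GB External Hard Drive Silver 7200RPM USB2.0 Retail
  7. GE Spacemaker Laudry
  8. Mazda3 2010
  9. Mazda3 2009 2.3L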

For a human being, hard drives 3 and 5 are the same. We could go a little further and suppose that products 1, 3, 4 and 5 are the same, and put products 2 and 6 in other categories.

In my previous question, someone suggested that I use feature extraction. It works very well when we have a small dataset of predefined descriptions (all hard drives), but what about all the other kinds of descriptions? I don't want to start writing regex-based feature extractors for every description my application could face; it doesn't scale. Is there any machine learning algorithm that could help me achieve this? The range of descriptions I can get is very wide: line 1 could be a fridge, and the next line a hard drive. Should I try the neural network path? What should my inputs be?

Thanks for your help!

Recommended answer

I would look at some Bayesian classification methods. This involves training a classifier to recognize particular words as indicating the probability that a product belongs to one of your classes. For example, after being trained, it could recognize that if a product description has "Seagate" in it, there's a 99% chance that it's a hard drive, whereas if it has "Mazda", there's a 97% chance it's a car. A word like "new" would probably end up contributing little to any classification, which is the way you want it to work.
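
To make the idea concrete, here is a minimal sketch of a naive Bayes text classifier in plain Python. It is only one way to set this up: the class name `NaiveBayes`, the category labels ("hard drive", "car", "appliance"), the tiny training set, and the add-one smoothing are all assumptions made for illustration, not something taken from the question.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a description and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Hypothetical minimal multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of examples
        self.vocab = set()                       # every word seen in training

    def train(self, text, label):
        self.label_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def scores(self, text):
        """Return a log-probability score per label (higher is more likely)."""
        total = sum(self.label_counts.values())
        words = [w for w in tokenize(text) if w in self.vocab]  # skip unseen words
        result = {}
        for label, n in self.label_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)  # prior: how common the label is
            for word in words:
                score += math.log((self.word_counts[label][word] + 1) / denom)
            result[label] = score
        return result

    def classify(self, text):
        s = self.scores(text)
        return max(s, key=s.get)


# Assumed, hand-labelled training examples (labels are illustrative only).
examples = [
    ("Seagate Hard Drive 500Go", "hard drive"),
    ("Seagate Barracuda 7200.12", "hard drive"),
    ("New Seagate FreeAgent Desk 500GB External Hard Drive", "hard drive"),
    ("Mazda3 2010", "car"),
    ("New Mazda3 2009 2.3L", "car"),
    ("GE Spacemaker Laudry", "appliance"),
]

clf = NaiveBayes()
for description, label in examples:
    clf.train(description, label)

print(clf.classify("Seagate Barracuda 500GB"))
print(clf.classify("Mazda3 2011"))
```

In this toy data the word "new" appears under more than one label, so it shifts the scores far less than "seagate" or "mazda3" do. With a realistic corpus you would need many more labelled descriptions per category.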

The downside is that it typically requires a fairly large corpus of training data before it starts to work well, but you can set it up so that it keeps adjusting its percentages while in production (for example, when you notice that it classified something incorrectly), and it will eventually become very effective.
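
Because a count-based Bayesian classifier only stores word and label counts, correcting it in production can be as simple as feeding the corrected example back in. The sketch below assumes that setup; the function names `learn` and `classify`, the labels, and the sample data are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

word_counts = defaultdict(Counter)  # label -> word -> count
label_counts = Counter()            # label -> number of examples seen


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def learn(text, label):
    """Add one labelled description to the running counts."""
    label_counts[label] += 1
    word_counts[label].update(tokenize(text))


def classify(text):
    """Pick the label with the highest smoothed naive Bayes score."""
    vocab = set().union(*word_counts.values())
    known = [w for w in tokenize(text) if w in vocab]  # ignore unseen words
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        score += sum(math.log((word_counts[label][w] + 1) / denom) for w in known)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Assumed initial training data: too small to know what a "Barracuda" is.
learn("Seagate Hard Drive 500Go", "hard drive")
learn("Mazda3 2010", "car")
learn("Mazda3 2009 2.3L", "car")

before = classify("Barracuda 7200.12")   # no known words, so the prior decides
learn("Barracuda 7200.12", "hard drive")  # a human corrects the mistake
after = classify("Barracuda 7200.12")

print(before, "->", after)
```

Each correction only increments a few counters, so this kind of feedback loop is cheap to run alongside the live classifier.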

Bayesian techniques have been used quite heavily in recent years for spam-filtering applications, so it might be worth reading up on how they have been used there.
