Algorithm to classify a list of products?

Problem description

I have a list of products that are more or less the same. For instance, the items in the list below are all Seagate hard drives.

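  1. Seagate Hard Drive 500Go
  2. Seagate Hard Drive 120Go for laptop
  3. Seagate Barracuda 7200.12 ST3500418AS 500GB 7200 RPM SATA 3.0Gb/s Hard Drive
  4. New and shiny 500Go hard drive from Seagate
  5. Seagate Barracuda 7200.12
  6. Seagate FreeAgent Desk 500GB External Hard Drive Silver 7200RPM USB2.0 Retail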

To a human being, hard drives 3 and 5 are the same. We could go a little further and suppose that products 1, 3, 4 and 5 are the same, and put products 2 and 6 in other categories.

We have a huge list of products that I would like to classify. Does anybody have an idea of what the best algorithm for this would be? Any suggestions?

I thought of a Bayesian classifier, but I am not sure it is the best choice. Any help would be appreciated!

Thanks.

Recommended answer

You need at least two components:

First, you need something that does "feature" extraction, i.e. that takes your items and extracts the relevant information. For example, "new and shiny" is not as relevant as "500Go hard drive" and "Seagate". A (very) simple approach would be a heuristic that extracts manufacturers, technology names like "USB2.0", and patterns like "GB" and "RPM" from each item.
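A minimal sketch of such a heuristic is shown below. The vocabularies (`MANUFACTURERS`, `TECHNOLOGIES`), the regular expressions, and the `extract_features` helper are all assumptions made for illustration; a real catalogue would need much longer lists and more patterns.

```python
import re

# Hypothetical vocabularies, for illustration only.
MANUFACTURERS = {"seagate", "western digital", "samsung"}
TECHNOLOGIES = {"usb2.0", "sata", "external", "laptop"}

# Capacity such as "500GB" or "500Go". The lookbehind and lookahead skip
# transfer rates such as "3.0Gb/s", which are not capacities.
CAPACITY_RE = re.compile(r"(?<![\d.])(\d+)\s*(?:gb|go)\b(?!/s)", re.IGNORECASE)
# Rotation speed such as "7200RPM" or "7200 RPM".
RPM_RE = re.compile(r"(\d+)\s*rpm\b", re.IGNORECASE)


def extract_features(title):
    """Return a set of normalized feature strings found in a product title."""
    text = title.lower()
    features = set()
    for name in MANUFACTURERS:
        if name in text:
            features.add(f"brand:{name}")
    for tech in TECHNOLOGIES:
        if tech in text:
            features.add(f"tech:{tech}")
    for match in CAPACITY_RE.finditer(text):
        # "Go" and "GB" are normalized to the same unit label.
        features.add(f"capacity:{match.group(1)}gb")
    for match in RPM_RE.finditer(text):
        features.add(f"rpm:{match.group(1)}")
    return features


print(extract_features("Seagate Hard Drive 120Go for laptop"))
```

Filler words such as "new and shiny" simply produce no feature, which is how the heuristic ignores them.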

You then end up with a set of features for each item. Some machine learning people like to put this into a "feature vector", i.e. a vector with one entry per feature, set to 0 or 1 depending on whether the feature is present. This is your data representation. On these vectors you can then do a distance comparison.
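As a rough sketch of this representation, the code below turns feature sets into 0/1 vectors and compares them. The sample feature sets are made up for illustration, and the Jaccard distance is only one possible choice; the answer does not name a specific distance measure.

```python
def build_vocabulary(feature_sets):
    """Collect every distinct feature and fix an order for the vector entries."""
    return sorted(set().union(*feature_sets))


def to_vector(features, vocabulary):
    """One entry per vocabulary feature: 1 if the item has it, otherwise 0."""
    return [1 if feature in features else 0 for feature in vocabulary]


def jaccard_distance(u, v):
    """1 - (features shared by both) / (features present in either)."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 1.0 - both / either if either else 0.0


# Hypothetical feature sets, as a feature extractor might produce them.
items = [
    {"brand:seagate", "capacity:500gb"},
    {"brand:seagate", "capacity:120gb", "tech:laptop"},
    {"brand:seagate", "capacity:500gb", "rpm:7200", "tech:sata"},
]

vocabulary = build_vocabulary(items)
vectors = [to_vector(item, vocabulary) for item in items]

print(jaccard_distance(vectors[0], vectors[2]))  # 0.5
print(jaccard_distance(vectors[0], vectors[1]))  # 0.75
```

The two 500 GB drives end up closer to each other than either is to the 120 GB laptop drive, which is the kind of signal a later grouping step can use.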

Note that you might end up with vectors of thousands of entries. Even then, you still have to cluster your results.
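The answer does not say which clustering method to use. As one simple possibility, the sketch below does a greedy threshold clustering: each item joins the first cluster whose first member is close enough, otherwise it starts a new cluster. The `greedy_cluster` function, the threshold of 0.5, and the sample feature sets are assumptions for illustration; a production system would more likely use an established clustering library.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two feature sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def greedy_cluster(feature_sets, threshold=0.5):
    """Group item indices; the result depends on the order of the input."""
    clusters = []  # each cluster is a list of item indices
    for index, features in enumerate(feature_sets):
        for cluster in clusters:
            # The first member of a cluster acts as its representative.
            representative = feature_sets[cluster[0]]
            if jaccard_distance(features, representative) <= threshold:
                cluster.append(index)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([index])
    return clusters


# Hypothetical feature sets for four product titles.
items = [
    {"brand:seagate", "capacity:500gb"},
    {"brand:seagate", "capacity:120gb", "tech:laptop"},
    {"brand:seagate", "capacity:500gb", "rpm:7200", "tech:sata"},
    {"brand:seagate", "capacity:500gb", "rpm:7200", "tech:external", "tech:usb2.0"},
]

print(greedy_cluster(items))  # [[0, 2], [1], [3]]
```

Here the two internal 500 GB drives are grouped together, while the laptop drive and the external drive each get their own cluster. The threshold controls how coarse the categories are and would need tuning on real data.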

Possibly useful Wikipedia articles:

  • Feature Extraction
  • Nearest Neighbour Search
