Information Retrieval (IR) vs Data Mining vs Machine Learning (ML)

Question

People often throw around the terms IR, ML, and data mining, but I have noticed a lot of overlap between them.

For people with experience in these fields: what exactly draws the line between them?

Recommended answer

This is just the view of one person (formally trained in ML); others might see things quite differently.

Machine Learning is probably the most homogeneous of these three terms, and the most consistently applied--it's limited to the pattern-extraction (or pattern-matching) algorithms themselves.

Of the terms you mentioned, "Machine Learning" is the one most used by academic departments to describe their curricula and their research programs, as well as the term most used in academic journals and conference proceedings. ML is clearly the least context-dependent of the terms you mentioned.

Information Retrieval and Data Mining are much closer to describing complete commercial processes--i.e., from user query to retrieval/delivery of relevant results. ML algorithms might sit somewhere in that process flow, and in the more sophisticated applications often do, but that's not a formal requirement. In addition, the term Data Mining usually seems to refer to applying some process flow to big data (i.e., > 2 GB) and therefore usually includes a distributed processing (map-reduce) component near the front of that workflow.
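To make the map-reduce idea concrete, here is a toy, single-process sketch in Python. It is not a distributed implementation; the function names (`map_chunk`, `combine`, `mean_and_std`) and the sample data are made up for illustration. Each chunk stands in for the slice of data one worker would see, and the partial summaries are merged in a reduce step.

```python
from functools import reduce


def map_chunk(chunk):
    """Map step: summarize one chunk as (count, sum, sum of squares)."""
    return (len(chunk), sum(chunk), sum(x * x for x in chunk))


def combine(a, b):
    """Reduce step: merge two partial summaries element-wise."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def mean_and_std(chunks):
    """Population mean and standard deviation across all chunks."""
    n, total, total_sq = reduce(combine, map(map_chunk, chunks), (0, 0.0, 0.0))
    if n == 0:
        raise ValueError("no data")
    mean = total / n
    # Clamp at zero to guard against tiny negative values from rounding.
    variance = max(total_sq / n - mean * mean, 0.0)
    return mean, variance ** 0.5


# Hypothetical data, split into chunks as if spread over three workers.
chunks = [[2.0, 4.0], [4.0, 4.0, 5.0], [5.0, 7.0, 9.0]]
mean, std = mean_and_std(chunks)
print(mean, std)
```

Because `combine` is associative, the partial summaries could in principle be merged in any order, which is what lets a real framework spread the map step over many machines.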

So Information Retrieval (IR) and Data Mining (DM) are related to Machine Learning (ML) in an Infrastructure-Algorithm kind of way. In other words, Machine Learning is one source of tools used to solve problems in Information Retrieval, but it's only one source, and IR doesn't depend on ML. For instance, a particular IR project might be the storage and rapid retrieval of fully indexed data in response to a user's search query, the crux of which is optimizing performance of the data flow, i.e., the round trip from query to delivering the search results to the user. Prediction or pattern matching might not be useful here. Likewise, a DM project might use an ML algorithm for the predictive engine, yet a DM project is more likely to also be concerned with the entire processing flow--for instance, parallel computation techniques for efficient input of an enormous data volume (TB perhaps), which delivers a proto-result to a processing engine for computation of descriptive statistics (mean, standard deviation, distribution, etc.) on the variables (columns).
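As an illustration of an IR task with no ML in it, here is a minimal sketch of an inverted index with boolean AND search. The tokenization (lowercase, split on whitespace), the function names, and the sample documents are all assumptions made for this example; a production search system does far more.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every query token (boolean AND)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical three-document corpus.
docs = {
    1: "machine learning extracts patterns",
    2: "information retrieval answers a search query",
    3: "data mining may use machine learning",
}
index = build_index(docs)
hits = search(index, "machine learning")
print(sorted(hits))
```

Nothing here is learned from data: the work is in building the index once and answering lookups quickly, which is the kind of performance problem described above.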

Lastly consider the Netflix Prize. This competition was directed solely to Machine Learning--the focus was on the prediction algorithm, as evidenced by the fact that there was a single success criterion: accuracy of the predictions returned by the algorithm. Imagine if the 'Netflix Prize' were rebranded as a Data Mining competition. The success criteria would almost certainly be expanded to more accurately assess the algorithm's performance in the actual commercial setting--so for instance overall execution speed (how quickly the recommendations are delivered to the user) would probably be considered along with accuracy.
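The difference between the two kinds of success criteria can be sketched in a few lines. The Netflix Prize scored submissions by root mean squared error (RMSE) on held-out ratings; the timing measurement below is my own addition to stand in for the hypothetical "Data Mining" version of the contest, and `constant_predictor` plus the sample ratings are made up for illustration.

```python
import math
import time


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def evaluate(predict, users, actual):
    """Return (rmse, seconds spent producing the predictions)."""
    start = time.perf_counter()
    predicted = [predict(u) for u in users]
    elapsed = time.perf_counter() - start
    return rmse(predicted, actual), elapsed


def constant_predictor(user_id):
    """Hypothetical baseline: always predict a rating of 3.0."""
    return 3.0


score, seconds = evaluate(constant_predictor, [1, 2, 3, 4], [3.0, 4.0, 2.0, 3.0])
print(f"RMSE={score:.4f}, latency={seconds:.6f}s")
```

An ML-only contest ranks on `score` alone; a contest that cared about the whole pipeline would have to weigh `seconds` (and likely much else) as well.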

The terms "Information Retrieval" and "Data Mining" are now in mainstream use, though for a while I only saw these terms in my job description or in vendor literature (usually next to the word "solution.") At my employer, we recently hired a "Data Mining" analyst. I don't know what he does exactly, but he wears a tie to work every day.
