Hadoop Machine learning/Data mining project idea?


Question


I am a graduate CS student (data mining and machine learning) and have good exposure to core Java (>4 years). I have read up a bunch of stuff on Hadoop and Map/Reduce.

I would now like to do a project on this stuff (in my free time, of course) to get a better understanding.

Any good project ideas would be really appreciated. I just want to do this to learn, so I don't really mind re-inventing the wheel. Also, anything related to data mining/machine learning would be an added bonus (it fits with my research), but it is absolutely not necessary.

Answer

You haven't written anything about your interests. I know that graph mining algorithms have been implemented on top of the Hadoop framework. This software, http://www.cs.cmu.edu/~pegasus/, and the paper "PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations" may give you a starting point.

Further, this link discusses something similar to your question: http://atbrox.com/2010/02/08/parallel-machine-learning-for-hadoopmapreduce-a-python-example/ but it is in Python. There is also a very good paper by Andrew Ng, "Map-Reduce for Machine Learning on Multicore".

There was a NIPS 2009 workshop on a similar topic, "Large-Scale Machine Learning: Parallelism and Massive Datasets". You can browse some of the papers and get an idea.

Edit: There is also Apache Mahout, http://mahout.apache.org/ --> "Our core algorithms for clustering, classification and batch based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm"
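
To make the "clustering on map/reduce" idea a bit more concrete, here is a minimal sketch of a single k-means iteration split into a map step (assign each point to its nearest centroid) and a reduce step (average the points that share a key). It is plain Java with no Hadoop or Mahout API involved, and it is not meant to reflect how Mahout actually implements clustering; the class name `KMeansMapReduceSketch` and its method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One k-means iteration phrased as map + reduce (illustrative only, no Hadoop API). */
class KMeansMapReduceSketch {

    /** Map step: return the key (index of the nearest centroid) for one point. */
    static int mapToNearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < centroids.length; i++) {
            double dist = 0.0;
            for (int j = 0; j < point.length; j++) {
                double diff = point[j] - centroids[i][j];
                dist += diff * diff; // squared Euclidean distance
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    /** Reduce step: average all points that were emitted under the same key. */
    static double[] reduceToMean(List<double[]> points) {
        double[] mean = new double[points.get(0).length];
        for (double[] p : points) {
            for (int j = 0; j < mean.length; j++) {
                mean[j] += p[j];
            }
        }
        for (int j = 0; j < mean.length; j++) {
            mean[j] /= points.size();
        }
        return mean;
    }

    /** Runs map, an in-memory "shuffle" (group by key), then reduce. */
    static double[][] iterate(double[][] points, double[][] centroids) {
        Map<Integer, List<double[]>> groups = new HashMap<>();
        for (double[] p : points) {
            int key = mapToNearest(p, centroids);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(p);
        }
        double[][] updated = new double[centroids.length][];
        for (int i = 0; i < centroids.length; i++) {
            List<double[]> assigned = groups.get(i);
            // A centroid that received no points is kept unchanged.
            updated[i] = (assigned == null) ? centroids[i].clone() : reduceToMean(assigned);
        }
        return updated;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 2}, {10, 10}, {10, 12}};
        double[][] start = {{1, 1}, {9, 9}};
        // prints [[0.0, 1.0], [10.0, 11.0]]
        System.out.println(Arrays.deepToString(iterate(points, start)));
    }
}
```

On a real cluster the grouping in `iterate` would be done by the framework's shuffle phase, and the driver would rerun the job until the centroids stop moving. Porting a sketch like this to actual Hadoop mapper and reducer classes could itself be a reasonable first project.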
