Computational Linguistics project idea using Hadoop MapReduce


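Question

I need to do a project for a Computational Linguistics course. Is there an interesting linguistic problem that is data-intensive enough to be worth tackling with Hadoop MapReduce? The solution or algorithm should analyse the data and provide some insight into the linguistic domain, but it should also apply to large datasets so that I can use Hadoop for it. I know there is a Python natural language processing toolkit for Hadoop.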

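Answer

One computation-intensive problem in CL is inferring semantics from large corpora. The basic idea is to take a big collection of text and infer the semantic relationships between words (synonyms, antonyms, hyponyms, hypernyms, etc.) from their distributions, i.e. which words they occur with or close to.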
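
As a rough illustration of the distributional idea, co-occurrence counting can be phrased as a map step and a reduce step. The sketch below simulates both in memory with plain Python; it does not use Hadoop itself, and the function names, the window size, and the two-sentence corpus are all hypothetical choices made for the example.

```python
from collections import defaultdict

WINDOW = 2  # assumed context window: words on each side of the target


def map_cooccurrences(line, window=WINDOW):
    """Map step: emit ((target, context), 1) for each word pair within the window."""
    tokens = line.lower().split()
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield (target, tokens[j]), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each (target, context) key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def cooccurrence_counts(lines, window=WINDOW):
    """Run the map step over every line, then reduce all emitted pairs."""
    emitted = (pair for line in lines for pair in map_cooccurrences(line, window))
    return reduce_counts(emitted)


# Toy corpus, invented for illustration only.
corpus = ["the cat drinks milk", "the dog drinks water"]
counts = cooccurrence_counts(corpus)
```

On a real cluster the map and reduce functions would typically live in separate mapper and reducer programs (for example, scripts run through Hadoop Streaming), with the framework handling the grouping by key that `reduce_counts` does here with a dictionary.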



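This involves a lot of data pre-processing, and can then involve many nearest-neighbor searches and N x N comparisons, which are well suited to MapReduce-style parallelization.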
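
The N x N comparison step might look like the following sketch, which assumes each word is already represented by a sparse co-occurrence vector. Cosine similarity is one common choice of measure, not one the answer prescribes, and the vectors and function names are invented for illustration. The map step emits a similarity record for every word pair, and the reduce step keeps the nearest neighbor per word.

```python
import math
from itertools import combinations

# Hypothetical co-occurrence vectors: word -> {context word: count}
vectors = {
    "cat": {"drinks": 2, "milk": 2, "purrs": 1},
    "dog": {"drinks": 2, "water": 1, "barks": 1},
    "kitten": {"drinks": 1, "milk": 2, "purrs": 1},
}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def map_pairs(vecs):
    """Map step: emit (word, (other, similarity)) for both sides of each pair."""
    for a, b in combinations(sorted(vecs), 2):
        sim = cosine(vecs[a], vecs[b])
        yield a, (b, sim)
        yield b, (a, sim)


def reduce_nearest(records):
    """Reduce step: keep the most similar neighbor seen for each word."""
    best = {}
    for word, (other, sim) in records:
        if word not in best or sim > best[word][1]:
            best[word] = (other, sim)
    return best


nearest = reduce_nearest(map_pairs(vectors))
```

Because every pair is scored independently, the pair list can be split across many mappers. The quadratic number of pairs is what makes this step expensive on a large vocabulary.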



Have a look at this tutorial:



http://wordspace.collocations.de/doku.php/course:acl2010:start


