Computational Linguistics project idea using Hadoop MapReduce
Problem Description
I need to do a project for a Computational Linguistics course. Is there an interesting "linguistic" problem that is data-intensive enough to work on with Hadoop MapReduce? The solution or algorithm should analyse the data and provide some insight into the "linguistic" domain, but it should also be applicable to large datasets so that I can use Hadoop for it. I know there is a Python natural language processing toolkit for Hadoop.
Answer
One computation-intensive problem in CL is inferring semantics from large corpora. The basic idea is to take a big collection of text and infer the semantic relationships between words (synonyms, antonyms, hyponyms, hypernyms, etc.) from their distributions, i.e. which words they co-occur with or appear close to.
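As a rough illustration of the distributional idea, the sketch below counts, for each word, the words that appear within a fixed window around it, and then compares words by the cosine of their count vectors. It is a toy, single-machine sketch under stated assumptions: naive whitespace tokenization, raw counts, a window of 2 and a made-up four-sentence corpus; the function names are hypothetical. Practical distributional models usually reweight the raw counts (for example with pointwise mutual information) and use far larger corpora.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenization (assumption)
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[feature] for feature, count in u.items() if feature in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def nearest_neighbors(vectors, target, k=3):
    """Return the k words whose context vectors are most similar to `target`'s."""
    target_vec = vectors.get(target, Counter())
    scores = [(other, cosine(target_vec, vec))
              for other, vec in vectors.items() if other != target]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))  # best first, ties broken by name
    return scores[:k]

# Hypothetical toy corpus, for illustration only.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "a cat chases a mouse",
    "a dog chases a cat",
]
vecs = cooccurrence_vectors(corpus, window=2)
print(nearest_neighbors(vecs, "cat", k=2))
```

Words that share many contexts with "cat" score highest; with this little data the neighbor lists are noisy, which is exactly why the approach needs large corpora and, in turn, something like Hadoop.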
This involves a lot of data preprocessing, and can then involve many nearest-neighbor searches and N x N comparisons (every word against every other word), which are well suited to MapReduce-style parallelization.
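One way to make the N x N comparison step MapReduce-friendly is to invert the word vectors: a first job groups words by shared context feature and emits a partial product for every pair of words that share it, and a second job sums those partials into dot products. The sketch below simulates the two jobs locally in plain Python. `run_mapreduce` is a hypothetical stand-in for the framework's map, shuffle and reduce phases, not a Hadoop API, and the small input vectors are invented for illustration. On a real cluster each stage would be a separate Hadoop job (for example a mapper and a reducer script run through Hadoop Streaming).

```python
import math
from collections import defaultdict
from itertools import combinations

def run_mapreduce(records, mapper, reducer):
    """Hypothetical local stand-in for one MapReduce job: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):  # the framework would sort/partition keys for us
        output.extend(reducer(key, groups[key]))
    return output

# Job 1: invert word vectors into per-feature postings lists.
def postings_mapper(record):
    word, vector = record
    for feature, weight in vector.items():
        yield feature, (word, weight)

def pair_products_reducer(feature, postings):
    # Every pair of words sharing this feature contributes weight_a * weight_b.
    for (w1, x1), (w2, x2) in combinations(sorted(postings), 2):
        yield (w1, w2), x1 * x2

# Job 2: sum the partial products for each word pair into a dot product.
def identity_mapper(record):
    yield record

def sum_reducer(pair, partials):
    yield pair, sum(partials)

def all_pairs_cosine(vectors):
    """Cosine similarity for every word pair that shares at least one feature."""
    norms = {w: math.sqrt(sum(x * x for x in v.values())) for w, v in vectors.items()}
    partials = run_mapreduce(vectors.items(), postings_mapper, pair_products_reducer)
    dots = run_mapreduce(partials, identity_mapper, sum_reducer)
    return {pair: dot / (norms[pair[0]] * norms[pair[1]]) for pair, dot in dots}

# Hypothetical word -> {context feature: weight} vectors.
vectors = {
    "cat": {"drinks": 1, "chases": 2, "milk": 1},
    "dog": {"drinks": 1, "chases": 1, "water": 1},
    "milk": {"drinks": 1, "the": 2},
    "idea": {"abstract": 1},
}
sims = all_pairs_cosine(vectors)
for pair, score in sorted(sims.items()):
    print(pair, round(score, 3))
```

Pairs of words that share no context feature (such as "idea" here) are never generated, which is what makes this cheaper than a naive all-pairs loop on sparse data. A very frequent feature still produces a quadratic number of pairs in a single reducer, so a common mitigation is to drop the most frequent context features before the first job.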
Have a look at this tutorial:
http://wordspace.collocations.de/doku.php/course:acl2010:start