Computational Linguistics project idea using Hadoop MapReduce

Question

I need to do a project for a Computational Linguistics course. Is there an interesting "linguistic" problem that is data-intensive enough to work on using Hadoop MapReduce? The solution or algorithm should analyse the data and provide some insight into the "linguistic" domain, but it should also be applicable to large datasets so that I can use Hadoop for it. I know there is a Python natural language processing toolkit for Hadoop.

Recommended answer

One computation-intensive problem in CL is inferring semantics from large corpora. The basic idea is to take a big collection of text and infer the semantic relationships between words (synonyms, antonyms, hyponyms, hypernyms, etc.) from their distributions, i.e. which words they occur with or close to.
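As a rough illustration of the distributional idea, here is a minimal sketch that counts which words occur near each other, written in a map/reduce style. The function names (`map_cooccurrences`, `reduce_counts`, `run_job`), the window size, and the toy corpus are all assumptions made for this example. The in-memory sort only stands in for Hadoop's shuffle phase; in a real Hadoop Streaming job the mapper and reducer would be separate scripts reading from standard input.

```python
from itertools import groupby


def map_cooccurrences(line, window=2):
    """Mapper: emit ((target, context), 1) for each context word
    found within `window` tokens of the target word."""
    tokens = line.lower().split()
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield (target, tokens[j]), 1


def reduce_counts(key, values):
    """Reducer: sum all counts emitted for one (target, context) pair."""
    return key, sum(values)


def run_job(lines, window=2):
    """Local stand-in for a MapReduce run: map, sort by key, reduce."""
    pairs = sorted(kv for line in lines for kv in map_cooccurrences(line, window))
    result = {}
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        k, total = reduce_counts(key, (v for _, v in group))
        result[k] = total
    return result


# Toy corpus, purely illustrative.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat sleeps",
]
counts = run_job(corpus, window=1)
```

The resulting `counts` dictionary maps each (target, context) pair to its frequency, which is the raw material for the word vectors used in distributional semantics.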

This involves a lot of data pre-processing, and can then involve many nearest-neighbor searches and N x N comparisons, which are well suited to MapReduce-style parallelization.
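The nearest-neighbor step could look something like the sketch below, which compares sparse context vectors with cosine similarity. The helper names (`build_vectors`, `cosine`, `nearest_neighbors`) and the hand-written counts are assumptions for illustration, and cosine is only one of several similarity measures that could be used here. Each pairwise comparison is independent of the others, which is why the N x N step would parallelize naturally across map tasks.

```python
import math
from collections import defaultdict


def build_vectors(counts):
    """Group (target, context) -> count pairs into one sparse vector per target."""
    vectors = defaultdict(dict)
    for (target, context), n in counts.items():
        vectors[target][context] = n
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, vectors, k=3):
    """Return the k words whose context vectors are most similar to `word`."""
    scored = [
        (cosine(vectors[word], vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    # Highest similarity first; ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(other, score) for score, other in scored[:k]]


# Hand-written co-occurrence counts, purely illustrative.
example_counts = {
    ("cat", "drinks"): 2, ("cat", "the"): 3,
    ("dog", "drinks"): 2, ("dog", "the"): 3,
    ("milk", "drinks"): 1,
}
example_vectors = build_vectors(example_counts)
neighbors = nearest_neighbors("cat", example_vectors, k=2)
```

In this toy data "cat" and "dog" share identical contexts, so "dog" ranks as the closest neighbor of "cat", ahead of "milk".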

Take a look at this tutorial:

http://wordspace.collocations.de/doku.php/course:acl2010:start
