What are good starting points for someone interested in natural language processing?

Question

So I've recently come up with some new possible projects that would have to deal with deriving 'meaning' from text submitted and generated by users.

Natural language processing is the field that deals with these kinds of issues, and after some initial research I found the OpenNLP Hub and university collaborations like the attempto project. And stackoverflow has this.

If anyone could link me to some good resources, from research papers and introductory texts to APIs, I'd be happier than a 6-year-old kid opening his Christmas presents!

Through one of your recommendations I've found opencyc ('the world's largest and most complete general knowledge base and commonsense reasoning engine'). Even more amazing still, there's a project that is a distilled version of opencyc called UMBEL. It features semantic data in rdf/owl/skos n3 syntax.

I've also stumbled upon antlr, a parser generator for 'constructing recognizers, interpreters, compilers, and translators from grammatical descriptions'.

And there's a question on here by me, that lists tons of free and open data.

Thanks stackoverflow community!

Recommended answer

Tough call, NLP is a much wider field than most people think it is. Basically, language can be split up into several categories, which will require you to learn totally different things.

Before I start, let me tell you that I doubt you'll have any notable success (as a professional, at least) without having a degree in some (closely related) field. There is a lot of theory involved, most of it is dry stuff and hard to learn. You'll need a lot of endurance and most of all: time.

If you're interested in the meaning of text, well, that's the Next Big Thing. Semantic search engines are predicted to usher in Web 3.0, but we're far from 'there' yet. Extracting logic from a text depends on several steps:

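  • Tokenizing, chunking
  • Disambiguation on a lexical level (Time flies like an arrow, but fruit flies like a banana.)
  • Syntactic parsing
  • Morphological analysis (tense, aspect, case, number, whatnot)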

A small list, off the top of my head. There's more :-), and many more details to each point. For example, when I say "parsing", what is this? There are many different parsing algorithms, and there are just as many parsing formalisms. Among the most powerful are Tree-adjoining grammar and Head-driven phrase structure grammar. But both of them are hardly used in the field (for now). Usually, you'll be dealing with some half-baked generative approach, and will have to conduct morphological analysis yourself.
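To make the tokenizing, disambiguation, and parsing steps a bit more concrete, here is a minimal sketch of a CKY-style chart parser that counts how many parse trees a sentence has. The lexicon, the grammar rules, and the names `tokenize` and `count_parses` are all made up for this illustration; a real grammar would be far larger and would not hard-code word categories like this.

```python
from collections import defaultdict

# Toy lexicon (an assumption for this sketch): each word may belong to
# several categories, which is exactly where lexical ambiguity comes from.
LEXICON = {
    "time": {"N", "NP"},
    "fruit": {"N", "NP"},
    "flies": {"N", "V"},
    "like": {"V", "P"},
    "an": {"Det"},
    "a": {"Det"},
    "arrow": {"N"},
    "banana": {"N"},
}

# Toy binary rules, keyed by (left child, right child) -> possible parents.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("V", "PP"): {"VP"},
    ("P", "NP"): {"PP"},
    ("Det", "N"): {"NP"},
    ("N", "N"): {"NP"},  # noun compound, e.g. "fruit flies"
}


def tokenize(text):
    """Lowercase, drop a trailing period, and split on whitespace."""
    return text.lower().rstrip(".").split()


def count_parses(tokens, goal="S"):
    """Count the distinct parse trees rooted in `goal` using a CKY chart."""
    n = len(tokens)
    # chart[(start, end)][category] = number of trees for that span
    chart = defaultdict(lambda: defaultdict(int))
    for i, token in enumerate(tokens):
        for category in LEXICON.get(token, ()):
            chart[(i, i + 1)][category] += 1
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for left, left_count in list(chart[(start, mid)].items()):
                    for right, right_count in list(chart[(mid, end)].items()):
                        for parent in BINARY_RULES.get((left, right), ()):
                            chart[(start, end)][parent] += left_count * right_count
    return chart[(0, n)][goal]


if __name__ == "__main__":
    for sentence in ("Time flies like an arrow.", "Fruit flies like a banana."):
        print(sentence, count_parses(tokenize(sentence)))
```

Under this toy grammar both sentences get two analyses each: one where "flies" is the verb and "like ..." is a prepositional phrase, and one where the first two words form a noun compound and "like" is the verb. Choosing between them is the disambiguation problem mentioned in the list.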

Going from there to semantics is a big step. A syntax/semantics interface depends on both the syntactic and the semantic framework employed, and there is no single working solution yet. On the semantic side, there's classic generative semantics, then there is Discourse Representation Theory, dynamic semantics, and many more. Even the logical formalism everything is based on is still not well-defined. Some say one should use first-order logic, but that hardly seems sufficient; then there is intensional logic, as used by Montague, but that seems overly complex and computationally infeasible. There is also dynamic logic (Groenendijk and Stokhof have pioneered this stuff. Great stuff!) and very recently, this summer actually, Jeroen Groenendijk presented a new formalism, Inquisitive Semantics, also very interesting.
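To give a feel for what "using first-order logic" means computationally, here is a small sketch of a model checker: it decides whether a first-order formula is true in a finite model. The tuple encoding of formulas, the `evaluate` function, and the toy model are assumptions made up for this example and are not taken from any particular textbook or system.

```python
def evaluate(formula, model, assignment=None):
    """Return True if `formula` holds in `model` under `assignment`.

    Formulas are nested tuples (a hypothetical encoding for this sketch):
      ("pred", name, arg1, ...)   atomic formula; args are variables or constants
      ("not", f), ("and", f, g), ("implies", f, g)
      ("exists", var, f), ("forall", var, f)
    """
    g = assignment or {}
    op = formula[0]
    if op == "pred":
        _, name, *args = formula
        # A bound variable takes its value from the assignment,
        # anything else is looked up as a constant of the model.
        values = tuple(g[a] if a in g else model["constants"][a] for a in args)
        return values in model["relations"][name]
    if op == "not":
        return not evaluate(formula[1], model, g)
    if op == "and":
        return evaluate(formula[1], model, g) and evaluate(formula[2], model, g)
    if op == "implies":
        return (not evaluate(formula[1], model, g)) or evaluate(formula[2], model, g)
    if op == "exists":
        _, var, body = formula
        return any(evaluate(body, model, {**g, var: d}) for d in model["domain"])
    if op == "forall":
        _, var, body = formula
        return all(evaluate(body, model, {**g, var: d}) for d in model["domain"])
    raise ValueError(f"unknown operator: {op!r}")


# Toy model (invented data): three entities, two named constants.
MODEL = {
    "domain": {"d1", "d2", "d3"},
    "constants": {"vincent": "d1", "mia": "d2"},
    "relations": {
        "customer": {("d1",), ("d2",)},
        "love": {("d1", "d2")},
    },
}

# "Vincent loves someone."
SOMEONE_LOVED = ("exists", "x", ("pred", "love", "vincent", "x"))
# "Every customer loves someone."
EVERY_CUSTOMER_LOVES = (
    "forall", "x",
    ("implies", ("pred", "customer", "x"),
     ("exists", "y", ("pred", "love", "x", "y"))),
)

if __name__ == "__main__":
    print(evaluate(SOMEONE_LOVED, MODEL))
    print(evaluate(EVERY_CUSTOMER_LOVES, MODEL))
```

In this model the first formula is true and the second is false, because `d2` is a customer who loves nobody. Checking a formula against a finite model like this is the easy part; the hard part described above is getting from a sentence to the formula in the first place.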

If you want to get started on a very simple level, read Blackburn and Bos (2005). It's great stuff, and the de facto introduction to computational semantics! I recently extended their system to cover the partition theory of questions (question answering is a beast!), as proposed by Groenendijk and Stokhof (1982), but unfortunately the theory has a complexity of O(n²) over the domain of individuals. While doing so, I found B&B's implementation to be a bit, erhm… hackish in places. Still, it is going to really, really help you dive into computational semantics, and it is still a very impressive showcase of what can be done. Also, they deserve extra cool points for implementing a grammar that is set in Pulp Fiction (the movie).
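As a rough illustration of the idea behind the partition theory of questions: a question splits the set of possible worlds into blocks, where two worlds land in the same block if they give the same complete answer. The sketch below is a deliberately simplified reading of that idea, not the extension described above; the world encoding, the `partition_by_question` helper, and the data are all invented for the example.

```python
from collections import defaultdict


def partition_by_question(worlds, predicate):
    """Group worlds by the extension of `predicate`.

    Worlds with the same set of individuals satisfying the predicate give
    the same complete answer to "Who <predicate>s?", so they share a block.
    """
    blocks = defaultdict(list)
    for name, facts in worlds.items():
        answer = frozenset(facts.get(predicate, ()))
        blocks[answer].append(name)
    return dict(blocks)


# Toy possible worlds (invented data): each maps a predicate name
# to the set of individuals it holds of in that world.
WORLDS = {
    "w1": {"walk": {"a"}},
    "w2": {"walk": {"a"}, "talk": {"b"}},
    "w3": {"walk": {"a", "b"}},
    "w4": {"walk": set()},
}

if __name__ == "__main__":
    for answer, block in partition_by_question(WORLDS, "walk").items():
        print(sorted(answer), block)
```

Here "Who walks?" yields three blocks: `w1` and `w2` agree that only `a` walks, while `w3` and `w4` each form a block of their own. The worlds differ on who talks, but that is irrelevant to this particular question.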

And while I'm at it, pick up Prolog. A lot of research in computational semantics is based on Prolog. Learn Prolog Now! is a good intro. I can also recommend "The Art of Prolog" and Covington's "Prolog Programming in Depth" and "Natural Language Processing for Prolog Programmers", the former of which is available for free online.
