Extracting 'useful' information out of sentences?

Question

I am currently trying to understand sentences of this form:

The problem was more with the set-top box than the television. Restarting the set-top box solved the problem.

I am totally new to Natural Language Processing and started using Python's NLTK package to get my hands dirty. However, I am wondering if someone could give me an overview of the high-level steps involved in achieving this.

What I am trying to do is identify what the problem was (in this case, the set-top box) and whether the action taken resolved it (in this case, yes, because restarting fixed the problem). If all the sentences were of this form, my life would be easier, but because this is natural language, the sentences could also take the following form:

I took a look at the car and found nothing wrong with it. However, I suspect there is something wrong with the engine.

So in this case, the problem was with the car, and the action taken did not resolve it, as the presence of the word "suspect" indicates. The potential underlying problem could be with the engine.
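
The cue-word reasoning above can be sketched as a naive keyword check. This is a minimal illustration, not a real solution: the cue lists `RESOLVED_CUES` and `UNRESOLVED_CUES` and the function `resolution_status` are hypothetical, and plain keyword matching will miss negation, paraphrase, and anything outside the lists.

```python
import re

# Hypothetical cue lists, chosen for illustration only.
RESOLVED_CUES = {"solved", "fixed", "resolved"}
UNRESOLVED_CUES = {"suspect", "suspected", "possibly", "might"}


def tokenize(text):
    """Lower-case the text and split it into words, keeping hyphenated words whole."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def resolution_status(text):
    """Return 'unresolved', 'resolved', or 'unknown' based on simple cue words.

    Uncertainty cues are checked first, so a report that mentions both a fix
    and a suspicion is treated as unresolved.
    """
    tokens = set(tokenize(text))
    if tokens & UNRESOLVED_CUES:
        return "unresolved"
    if tokens & RESOLVED_CUES:
        return "resolved"
    return "unknown"


print(resolution_status("Restarting the set-top box solved the problem."))  # → resolved
print(resolution_status("However, I suspect there is something wrong with the engine."))  # → unresolved
```

Checking the uncertainty cues before the resolution cues is a deliberate choice: a hedge such as "suspect" usually means the reporter is not confident the issue is gone.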

I am not looking for a definitive answer, as I suspect this is very complex. What I am looking for is more of a high-level overview that will point me in the right direction. If there is an easier or alternative way to do this, that is welcome as well.

Answer

Probably, if the sentences are well-formed, I would experiment with dependency parsing (http://nltk.googlecode.com/svn/trunk/doc/api/nltk.parse.malt.MaltParser-class.html#raw_parse). That gives you a graph of the words in a sentence, from which you can read off the grammatical relations between the lexical items. You can then extract phrases from the output of a dependency parser (http://nltk.googlecode.com/svn/trunk/doc/book/ch08.html#code-cfg2). That could help you extract the direct object of a sentence, or the verb phrase in a sentence.
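
To show what you can do with a dependency graph once you have one, here is a sketch that skips the parser entirely. The parse of "Restarting the set-top box solved the problem." is hand-written in Universal Dependencies style (an assumption; a real parser such as MaltParser may produce different labels or attachments), and the helpers `root`, `relation_phrase`, and `subtree_text` are hypothetical names, not NLTK functions.

```python
from typing import NamedTuple


class Token(NamedTuple):
    index: int   # 1-based position in the sentence
    word: str
    head: int    # index of the governing token; 0 marks the root
    deprel: str  # dependency relation to the head


# Hand-written parse (assumed, for illustration; not real parser output).
PARSE = [
    Token(1, "Restarting", 5, "csubj"),
    Token(2, "the", 4, "det"),
    Token(3, "set-top", 4, "amod"),
    Token(4, "box", 1, "obj"),
    Token(5, "solved", 0, "root"),
    Token(6, "the", 7, "det"),
    Token(7, "problem", 5, "obj"),
    Token(8, ".", 5, "punct"),
]


def children(parse, head_index):
    """Return the tokens directly governed by head_index."""
    return [t for t in parse if t.head == head_index]


def subtree_text(parse, index):
    """Join the words of the subtree rooted at index, in sentence order, skipping punctuation."""
    collected = []
    stack = [index]
    while stack:
        i = stack.pop()
        collected.append(i)
        stack.extend(t.index for t in children(parse, i) if t.deprel != "punct")
    words = {t.index: t.word for t in parse}
    return " ".join(words[i] for i in sorted(collected))


def root(parse):
    """Return the root token (usually the main verb)."""
    return next(t for t in parse if t.deprel == "root")


def relation_phrase(parse, head_index, deprel):
    """Return the phrase of the first dependent of head_index with relation deprel, or None."""
    for t in children(parse, head_index):
        if t.deprel == deprel:
            return subtree_text(parse, t.index)
    return None


verb = root(PARSE)
print(verb.word)                                    # → solved
print(relation_phrase(PARSE, verb.index, "obj"))    # → the problem
print(relation_phrase(PARSE, verb.index, "csubj"))  # → Restarting the set-top box
```

Here the clausal subject recovers the action that was taken, the direct object recovers what it acted on, and the root verb ("solved") is what you would feed into a check for whether the action worked.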

If you just want to get phrases or "chunks" from a sentence, you can try a chunk parser (http://nltk.googlecode.com/svn/trunk/doc/api/nltk.chunk-module.html). You can also carry out named entity recognition (http://streamhacker.com/2009/02/23/chunk-extraction-with-nltk/). It is usually used to extract instances of places, organizations, or people's names, but it could work in your case as well.
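
A chunk parser groups part-of-speech-tagged words into flat phrases. In NLTK this is typically done with `nltk.RegexpParser` and a grammar such as `NP: {<DT>?<JJ>*<NN.*>+}`; the dependency-free sketch below hand-codes roughly that one rule so it runs without NLTK installed. The Penn Treebank tags are hand-assigned (an assumption; a real tagger may tag "set-top" differently), and `np_chunks` is a hypothetical helper.

```python
# Hand-tagged sentence (assumed tags, for illustration).
TAGGED = [
    ("The", "DT"), ("problem", "NN"), ("was", "VBD"), ("more", "RBR"),
    ("with", "IN"), ("the", "DT"), ("set-top", "JJ"), ("box", "NN"),
    ("than", "IN"), ("the", "DT"), ("television", "NN"), (".", "."),
]


def np_chunks(tagged):
    """Greedily match DT? JJ* NN.*+ left to right over (word, tag) pairs."""
    chunks = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if j < n and tagged[j][1] == "DT":        # optional determiner
            j += 1
        while j < n and tagged[j][1] == "JJ":     # any number of adjectives
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):  # one or more nouns
            k += 1
        if k > j:                                 # found at least one noun
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:                                     # no chunk starts here
            i += 1
    return chunks


print(np_chunks(TAGGED))  # → ['The problem', 'the set-top box', 'the television']
```

The resulting noun phrases are the candidates for "what the problem was"; a named entity recognizer trained on your own device names would play a similar role with better recall.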

Assuming that you solve the problem of extracting noun/verb phrases from a sentence, you may need to filter them to ease the job of your domain expert (too many phrases could overwhelm a judge). You could run a frequency analysis on your phrases and remove very frequent ones that are not usually related to the problem domain, or compile a whitelist and keep only the phrases that contain words from a predefined set.
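
The two filtering strategies can be sketched in a few lines of Python. The phrase list, the `max_share` threshold, and the whitelist are hypothetical values chosen for illustration; a frequency cut-off is only a proxy for "unrelated to the domain", so in practice you would tune it on your own data.

```python
from collections import Counter


def drop_frequent(phrases, max_share):
    """Return the distinct phrases whose share of all occurrences is at most max_share.

    Generic phrases such as "the problem" tend to appear in almost every
    report and carry little domain information.
    """
    counts = Counter(phrases)
    total = len(phrases)
    return [p for p, c in counts.items() if c / total <= max_share]


def keep_whitelisted(phrases, whitelist):
    """Keep phrases containing at least one whitelisted word (case-insensitive)."""
    wanted = {w.lower() for w in whitelist}
    return [p for p in phrases if wanted & set(p.lower().split())]


# Hypothetical phrases extracted from a handful of reports.
phrases = ["the problem"] * 4 + ["the set-top box", "the car", "the engine", "the television"]

candidates = drop_frequent(phrases, max_share=0.3)
print(candidates)  # → ['the set-top box', 'the car', 'the engine', 'the television']
print(keep_whitelisted(candidates, {"engine", "box"}))  # → ['the set-top box', 'the engine']
```

With this sample data, "the problem" accounts for half of all occurrences and is dropped, while the whitelist keeps only the phrases that mention a known component.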
