Extracting relations from text


Question

I want to extract relations from unstructured text in the form of (SUBJECT,OBJECT,ACTION) triples.

For example,

"The boy is sitting on the table eating the chicken"

would give me,

(boy,chicken,eat)
(boy,table,LOCATION)

etc.

A Python program with NLTK could process a simple sentence like the one above, but I'd like to know whether any of you have used tools or libraries, preferably open source, to extract relations from a much wider domain, such as a large collection of text documents or the web.

Recommended answer

If your sentences do not get much more complicated than the example you have shown (for instance, with respect to anaphora), the Stanford parser will give good results. It is based on a probabilistic context-free grammar, and its output can easily be converted into the format you want. There is a demo available online. For your example, it will give something like

nsubj(sitting, boy)
prep_on(sitting, table)
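As a rough illustration of the conversion step, here is a minimal Python sketch that reads dependency lines in the textual form shown above and pairs each verb's subject with its objects. The function names, the regular expression, and the pairing rules (nsubj combined with dobj or prep_*) are assumptions made for this example; they are not part of the Stanford parser, and real output will need more rules than these.

```python
import re
from collections import defaultdict

# Matches lines such as "nsubj(sitting, boy)" or "prep_on(sitting, table)".
DEP_PATTERN = re.compile(r"^\s*(\w+)\(\s*([^,\s]+)\s*,\s*([^)\s]+)\s*\)\s*$")


def parse_dependencies(lines):
    """Turn textual dependencies into (relation, governor, dependent) tuples."""
    deps = []
    for line in lines:
        match = DEP_PATTERN.match(line)
        if match:
            deps.append(match.groups())
    return deps


def dependencies_to_triples(deps):
    """Pair each governor's subject with its objects (hand-written rules)."""
    subjects = defaultdict(list)
    objects = defaultdict(list)
    for relation, governor, dependent in deps:
        if relation == "nsubj":
            subjects[governor].append(dependent)
        elif relation == "dobj":
            # Direct object: the governor itself is the action.
            objects[governor].append((dependent, governor))
        elif relation.startswith("prep_"):
            # Collapsed preposition: action is verb plus preposition.
            objects[governor].append((dependent, f"{governor}_{relation[5:]}"))
    triples = []
    for governor, subs in subjects.items():
        for subject in subs:
            for obj, action in objects[governor]:
                triples.append((subject, obj, action))
    return triples


dependency_lines = ["nsubj(sitting, boy)", "prep_on(sitting, table)"]
triples = dependencies_to_triples(parse_dependencies(dependency_lines))
print(triples)  # [('boy', 'table', 'sitting_on')]
```

Note that the action label here is just the verb joined with the preposition; the sketch makes no attempt to map it to an abstract label such as LOCATION.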

If your sentences do get more complicated, you might be interested in trying Boxer, which builds discourse representation structures from C&C parses, based on probabilistic combinatory categorial grammars. Those structures may prove more difficult to adapt to the format you want, but will allow you much more flexibility. There is, again, a demo available online. For your example, it will look something like

sit(x)
boy(y)
table(z)
agent(x,y)
on(x,z)
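To give an idea of how such conditions might be adapted, the following Python sketch works on the flat, simplified conditions listed above, not on Boxer's actual output format. The function name and the rules are assumptions for illustration: one-place conditions name a variable, agent(e, v) marks the subject of event e, and any other two-place condition on that event yields a triple.

```python
import re

# Matches conditions such as "sit(x)" or "agent(x,y)".
CONDITION = re.compile(r"^\s*(\w+)\(\s*(\w+)\s*(?:,\s*(\w+)\s*)?\)\s*$")


def conditions_to_triples(lines):
    """Build (subject, object, action) triples from flat DRS-like conditions."""
    names = {}   # variable -> predicate name, from one-place conditions
    agents = {}  # event variable -> agent variable
    roles = []   # other two-place conditions: (relation, event, entity)
    for line in lines:
        match = CONDITION.match(line)
        if not match:
            continue
        predicate, first, second = match.groups()
        if second is None:
            names[first] = predicate
        elif predicate == "agent":
            agents[first] = second
        else:
            roles.append((predicate, first, second))
    triples = []
    for relation, event, entity in roles:
        if event in agents:
            subject = names.get(agents[event], agents[event])
            obj = names.get(entity, entity)
            action = f"{names.get(event, event)}_{relation}"
            triples.append((subject, obj, action))
    return triples


conditions = ["sit(x)", "boy(y)", "table(z)", "agent(x,y)", "on(x,z)"]
drs_triples = conditions_to_triples(conditions)
print(drs_triples)  # [('boy', 'table', 'sit_on')]
```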

The Stanford parser is written in Java and is available under the GPL. C&C is written in C++ and Boxer in SWI Prolog. C&C and Boxer are not released under a genuinely free licence, but you can obtain the source code, modify it, and use it for any non-commercial project.

Neither will give you a characterisation for the relation between "boy" and "table" in your example. You will need much more powerful semantic reasoning tools for this, and I am not sure whether something like this exists.

Edit

The source code for C&C and Boxer is now available again, along with a range of models.
