Open-source rule-based pattern matching / information extraction frameworks?
Question
I'm shopping for an open-source framework for writing natural language grammar rules for pattern matching over annotations. You could think of it like regexps but matching at the token rather than character level. Such a framework should enable the match criteria to reference other attributes attached to the input tokens or spans, as well as modify such attributes in an action.
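To make the "regexps over tokens" idea concrete, here is a minimal, hypothetical Python sketch. It is not the API of JAPE, TokensRegex, Ruta or GExp. It assumes tokens are plain dicts of attributes, and the `text`/`pos`/`entity` keys and the `Elem`, `match_at` and `apply_rule` names are made up for illustration. Each pattern element is a predicate over one token's attributes with a quantifier, and a rule's action modifies attributes on the matched span.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Token = Dict[str, str]              # hypothetical: a token is just a bag of attributes
Predicate = Callable[[Token], bool]


@dataclass
class Elem:
    """One pattern element: a test on a single token plus a quantifier."""
    test: Predicate
    quant: str = "1"                # "1" = exactly one, "+" = one or more, "*" = zero or more


def match_at(pattern: List[Elem], tokens: List[Token], start: int) -> Optional[int]:
    """Return the end index (exclusive) of a greedy match starting at `start`, or None."""
    def rec(pi: int, ti: int) -> Optional[int]:
        if pi == len(pattern):
            return ti
        elem = pattern[pi]
        if elem.quant == "1":
            if ti < len(tokens) and elem.test(tokens[ti]):
                return rec(pi + 1, ti + 1)
            return None
        # Greedy quantifier: consume as many matching tokens as possible, then back off.
        end = ti
        while end < len(tokens) and elem.test(tokens[end]):
            end += 1
        lowest = ti + 1 if elem.quant == "+" else ti
        for k in range(end, lowest - 1, -1):
            result = rec(pi + 1, k)
            if result is not None:
                return result
        return None
    return rec(0, start)


def apply_rule(pattern: List[Elem], tokens: List[Token],
               action: Callable[[List[Token]], None]) -> List[Tuple[int, int]]:
    """Scan left to right; on each non-empty match, run `action` on the span and skip past it."""
    spans = []
    i = 0
    while i < len(tokens):
        end = match_at(pattern, tokens, i)
        if end is not None and end > i:
            action(tokens[i:end])
            spans.append((i, end))
            i = end
        else:
            i += 1
    return spans


# Hypothetical rule: one or more proper nouns ending in "Inc" or "Corp" -> entity=ORG.
pattern = [
    Elem(lambda t: t["pos"] == "NNP", "+"),
    Elem(lambda t: t["text"] in {"Inc", "Corp"}),
]


def tag_org(span: List[Token]) -> None:
    # The action writes a new attribute onto every token in the matched span.
    for t in span:
        t["entity"] = "ORG"


tokens = [
    {"text": "Shares", "pos": "NNS"},
    {"text": "of", "pos": "IN"},
    {"text": "Acme", "pos": "NNP"},
    {"text": "Widget", "pos": "NNP"},
    {"text": "Corp", "pos": "NNP"},
    {"text": "rose", "pos": "VBD"},
]

spans = apply_rule(pattern, tokens, tag_org)
print(spans)                                                     # [(2, 5)]
print([t["text"] for t in tokens if t.get("entity") == "ORG"])   # ['Acme', 'Widget', 'Corp']
```

Because "Corp" is itself tagged NNP, the greedy `+` first swallows it and then backs off one token so the final element can match. That is the same backtracking behaviour you would expect from a character-level regex engine, only applied to token attributes.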
There are four options I know of that fit this description:
- GATE Java Annotation Patterns Engine (JAPE)
- Stanford CoreNLP's TokensRegex
- UIMA Ruta (Tutorial)
- Graph Expression (GExp)*
Are there any other options currently available?
Related tools
- While I know that general parser generators like Antlr can also serve this purpose, I'm looking for something more specifically tailored to natural language processing or information extraction.
- UIMA includes a Regex Annotator plugin for declaring rules in XML, but it appears to operate at the character level rather than over higher-level objects such as tokens.
- I know that this kind of task is often performed with statistical models, but for narrow, structured domains there's benefit in hand-crafting rules.
* With GExp, 'rules' are actually implemented in code, but since there are so few options I chose to include it.
Recommended answer
The French academic software Unitex, from University Paris East, also matches your description (http://www-igm.univ-mlv.fr/~unitex/).
It is C++-based and includes many optional preprocessing rules and lexicons for 20+ languages.
The GUI is graph-based: you design automata, i.e. the 'grammars'.
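Graph-based grammars of this kind are, in effect, finite-state automata whose edges test tokens. The toy below is a hypothetical Python illustration of that idea, not Unitex's grammar format or matching engine. It assumes tokens carry a `pos` attribute, the `TokenAutomaton` class and the noun-phrase grammar are invented for the example, and the automaton is run as an NFA (tracking the set of live states) to find the longest accepted span at each position.

```python
from typing import Callable, Dict, List, Set, Tuple

Token = Dict[str, str]              # hypothetical token representation
Label = Callable[[Token], bool]     # an edge label is a test on one token


def pos(tag: str) -> Label:
    """Build an edge label that accepts tokens with the given (assumed) `pos` attribute."""
    return lambda t: t["pos"] == tag


class TokenAutomaton:
    """A nondeterministic finite-state automaton whose edges test token attributes."""

    def __init__(self, start: int, finals: Set[int]) -> None:
        self.start = start
        self.finals = finals
        self.edges: List[Tuple[int, Label, int]] = []

    def add_edge(self, src: int, label: Label, dst: int) -> None:
        self.edges.append((src, label, dst))

    def longest_match(self, tokens: List[Token], i: int) -> int:
        """Return the end index of the longest accepted span starting at i, or -1."""
        states = {self.start}
        best = i if self.start in self.finals else -1
        j = i
        while states and j < len(tokens):
            # Advance every live state over tokens[j] simultaneously.
            states = {dst for (src, label, dst) in self.edges
                      if src in states and label(tokens[j])}
            j += 1
            if states & self.finals:
                best = j
        return best

    def find_all(self, tokens: List[Token]) -> List[Tuple[int, int]]:
        """Left-to-right, longest-match, non-overlapping spans."""
        spans = []
        i = 0
        while i < len(tokens):
            end = self.longest_match(tokens, i)
            if end > i:
                spans.append((i, end))
                i = end
            else:
                i += 1
        return spans


# Hypothetical noun-phrase grammar: optional determiner, any adjectives, one or more nouns.
np_graph = TokenAutomaton(start=0, finals={2})
np_graph.add_edge(0, pos("DET"), 1)
np_graph.add_edge(0, pos("ADJ"), 1)
np_graph.add_edge(1, pos("ADJ"), 1)
np_graph.add_edge(0, pos("NOUN"), 2)
np_graph.add_edge(1, pos("NOUN"), 2)
np_graph.add_edge(2, pos("NOUN"), 2)

tagged = "the/DET big/ADJ red/ADJ dog/NOUN barked/VERB at/ADP cats/NOUN"
tokens = [{"text": w, "pos": p} for w, p in (x.split("/") for x in tagged.split())]

spans = np_graph.find_all(tokens)
phrases = [" ".join(t["text"] for t in tokens[a:b]) for a, b in spans]
print(spans)     # [(0, 4), (6, 7)]
print(phrases)   # ['the big red dog', 'cats']
```

Simulating the whole set of states avoids backtracking, which is one common reason rule engines compile graph grammars into automata.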