How to create a pipeline of Java NLP and Ruta scripts?

Question

I'm working on a Maven project that dynamically executes some Ruta scripts to annotate some tags and then processes the output in Java.

Now I want to run NLP components (mostly DKPro) first, then pass their output to the Ruta scripts in a pipeline and process it further. How can I achieve this?

Below is my new script:

    AnalysisEngineDescription pipeline = createEngineDescription(createEngineDescription(OpenNlpSegmenter.class),
            createEngineDescription(OpenNlpPosTagger.class),
            AnalysisEngineFactory.createEngineDescription(RutaEngine.class, RutaEngine.PARAM_MAIN_SCRIPT,
                    "com.textjuicer.ruta.date.Author_updated"),
            createEngineDescription(ConsoleWriter.class));

Error:

Not able to resolve type: Reference

    May 25, 2016 6:45:43 PM org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl processAndOutputNewCASes(273)
    SEVERE: Exception occurred
    org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.
        at org.apache.uima.ruta.engine.RutaEngine.process(RutaEngine.java:563)
        at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:378)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:298)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:568)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:410)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:343)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:568)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:410)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:343)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
        at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
        at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:170)
        at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:191)
        at com.textjuicer.ruta.date.ArtifactAnnotator.runNLP(ArtifactAnnotator.java:225)
        at com.textjuicer.ruta.date.ArtifactAnnotator.getAllAnnotations(ArtifactAnnotator.java:70)
        at com.textjuicer.ruta.date.ArtifactAnnotator.main(ArtifactAnnotator.java:38)
    Caused by: java.lang.IllegalArgumentException: Not able to resolve type: Reference
        at org.apache.uima.ruta.expression.type.SimpleTypeExpression.getType(SimpleTypeExpression.java:48)
        at org.apache.uima.ruta.rule.RegExpRule.getGroup2Types(RegExpRule.java:148)
        at org.apache.uima.ruta.rule.RegExpRule.apply(RegExpRule.java:80)
        at org.apache.uima.ruta.RutaScriptBlock.apply(RutaScriptBlock.java:63)
        at org.apache.uima.ruta.RutaModule.apply(RutaModule.java:48)
        at org.apache.uima.ruta.engine.RutaEngine.process(RutaEngine.java:561)
        ... 17 more

    Exception in thread "main" org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.
        at org.apache.uima.ruta.engine.RutaEngine.process(RutaEngine.java:563)
        at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:378)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:298)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:568)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:410)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:343)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:568)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:410)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:343)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
        at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
        at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:170)
        at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:191)
        at com.textjuicer.ruta.date.ArtifactAnnotator.runNLP(ArtifactAnnotator.java:225)
        at com.textjuicer.ruta.date.ArtifactAnnotator.getAllAnnotations(ArtifactAnnotator.java:70)
        at com.textjuicer.ruta.date.ArtifactAnnotator.main(ArtifactAnnotator.java:38)
    Caused by: java.lang.IllegalArgumentException: Not able to resolve type: Reference
        at org.apache.uima.ruta.expression.type.SimpleTypeExpression.getType(SimpleTypeExpression.java:48)
        at org.apache.uima.ruta.rule.RegExpRule.getGroup2Types(RegExpRule.java:148)
        at org.apache.uima.ruta.rule.RegExpRule.apply(RegExpRule.java:80)
        at org.apache.uima.ruta.RutaScriptBlock.apply(RutaScriptBlock.java:63)
        at org.apache.uima.ruta.RutaModule.apply(RutaModule.java:48)
        at org.apache.uima.ruta.engine.RutaEngine.process(RutaEngine.java:561)
        ... 17 more

Recommended answer

You can simply add the Ruta script as an analysis engine at the end of your DKPro pipeline. The exact code depends mainly on how you build and run your pipeline.

Adapted from the uimaFIT documentation:

// your collection reader
CollectionReaderDescription reader = 
  CollectionReaderFactory.createReaderDescription(
    TextReader.class, 
    TextReader.PARAM_INPUT, "/home/uimafit/documents");

// some DKPro Core component
AnalysisEngineDescription dkpro= 
  AnalysisEngineFactory.createEngineDescription(
    Tokenizer.class);

AnalysisEngineDescription ruta = 
  AnalysisEngineFactory.createEngineDescription(
    RutaEngine.class, 
    RutaEngine.PARAM_MAIN_SCRIPT, "Main.ruta");

// some writer
AnalysisEngineDescription writer= 
  AnalysisEngineFactory.createEngineDescription(
    XmiWriter.class, 
    XmiWriter.PARAM_OUTPUT, "/home/uimafit/output");

SimplePipeline.runPipeline(reader, dkpro, ruta, writer);

Using the uimaFIT factories, you can create an analysis engine for your Ruta script either by specifying the mainScript parameter or by configuring the rules directly with PARAM_RULES. You can also create the analysis engine from the XML descriptor of the Ruta script.

If the Ruta script declares new types, then either the XML descriptor has to be used to create the analysis engine, or uimaFIT's types.txt file needs to be extended with the type system generated for the script (or the type system needs to be included in some other way).
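
As a minimal sketch of the types.txt route: uimaFIT looks on the classpath for files named META-INF/org.apache.uima.fit/types.txt and loads the type system descriptors matching the patterns listed in them. The entry below assumes that the type system generated for the script from the question is called Author_updatedTypeSystem.xml and that it ends up on the classpath under the script's package path. Both the file name and the location are assumptions about the build, so adjust them to wherever your generated descriptor actually lives.

```text
# src/main/resources/META-INF/org.apache.uima.fit/types.txt
# Hypothetical location of the generated type system descriptor (assumption)
classpath*:com/textjuicer/ruta/date/Author_updatedTypeSystem.xml
```

With such an entry in place, engine descriptions created through the uimaFIT factories should pick up the types declared in the script (for example Reference), which is the kind of type the error in the question fails to resolve.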

If the Ruta script imports and calls other scripts, then either the generated descriptor needs to be used or the corresponding parameters, e.g., additionalScripts, need to be set correctly. The same is true for imported analysis engines.

If you import the NLP/DKPro type system in your Ruta script, you can simply write rules using the DKPro annotations.

(I am a developer of UIMA Ruta.)
