Parse GATE Document to get Co-Reference Text

Question

I'm creating a GATE application that finds co-reference text. It works fine, and I have created a zipped file of the application using the export option provided in GATE.

Now I'm trying to use the same application from my Java code:

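    import gate.*;
    import gate.util.persistence.PersistenceManager;
    import java.io.File;
    import java.net.URL;

    // Initialise GATE Embedded without touching user config files
    Gate.runInSandbox(true);
    Gate.setGateHome(new File(gateHome));
    Gate.setPluginsHome(new File(gateHome, "plugins"));
    Gate.init();

    // Load the application exported from GATE Developer
    URL applicationURL = new File(gateHome, "application.xgapp").toURI().toURL();
    CorpusController application = (CorpusController) PersistenceManager.loadObjectFromUrl(applicationURL);
    Corpus corpus = Factory.newCorpus("Megaki Corpus");
    application.setCorpus(corpus);

    // Wrap the input text in a document and run the pipeline over it
    Document document = Factory.newDocument(text);
    corpus.add(document);
    application.execute();
    corpus.clear();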

Now how can I parse this document and get co-reference text?

Answer

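I don't know about your application, but co-references created manually with the Co-reference Editor are stored in a document feature. The feature name seems to be "MatchesAnnots" and its type is Map<String, List<List<Integer>>>: each key is an annotation set name, and each inner list is one co-reference chain of annotation IDs.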
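Because each chain is just a list of annotation IDs, a common next step is to find the chain that a given annotation belongs to. Below is a minimal, self-contained sketch of such a reverse index in plain Java. The class name CorefIndex and the sample IDs are hypothetical, and the map is built by hand rather than read from a real GATE document.

```java
import java.util.*;

public class CorefIndex {
    // For one annotation set, map every annotation ID to the full chain
    // (list of annotation IDs) that contains it.
    static Map<Integer, List<Integer>> indexChains(List<List<Integer>> chains) {
        Map<Integer, List<Integer>> byId = new HashMap<>();
        for (List<Integer> chain : chains) {
            for (Integer id : chain) {
                byId.put(id, chain);
            }
        }
        return byId;
    }

    public static void main(String[] args) {
        // Hypothetical "MatchesAnnots" value: key null stands for the default
        // annotation set, which holds two chains of annotation IDs.
        Map<String, List<List<Integer>>> matches = new HashMap<>();
        matches.put(null, Arrays.asList(Arrays.asList(3, 7, 12), Arrays.asList(5, 9)));

        Map<Integer, List<Integer>> byId = indexChains(matches.get(null));
        System.out.println(byId.get(7)); // [3, 7, 12]
    }
}
```

Storing the same chain object for every member keeps the index small and makes "which mentions co-refer with annotation N?" a single map lookup.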

In my case, the following code prints "as name: null" (null being the default annotation set), followed by all co-reference chains in that set.

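import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

// Co-reference chains are kept in a document feature, keyed by annotation set name
Object obj = document.getFeatures().get("MatchesAnnots");

if (obj != null) { // the feature is absent if no chains were created
    @SuppressWarnings("unchecked")
    Map<String, List<List<Integer>>> map = (Map<String, List<List<Integer>>>) obj;

    for (Entry<String, List<List<Integer>>> e : map.entrySet()) {
        System.err.println("as name: " + e.getKey());   // null = default annotation set
        for (List<Integer> chain : e.getValue()) {
            System.err.println("chain : " + chain);     // annotation IDs in one chain
        }
    }
}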
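The code above prints annotation IDs, while the question asks for the co-referring text itself. Here is a minimal sketch of that last step, assuming each ID refers to an annotation in the set named by the map key. In a real GATE program the lookup would be something like gate.Utils.stringFor(document, annotationSet.get(id)), where annotationSet is the set for that key. To keep the example runnable without GATE, the hypothetical class CorefText takes the lookup as a Function and uses a hand-made ID-to-text table.

```java
import java.util.*;
import java.util.function.Function;

public class CorefText {
    // Convert chains of annotation IDs into chains of mention strings.
    // `textOf` stands in for a GATE lookup such as
    // id -> gate.Utils.stringFor(document, annotationSet.get(id)).
    static List<List<String>> chainsAsText(List<List<Integer>> chains,
                                           Function<Integer, String> textOf) {
        List<List<String>> result = new ArrayList<>();
        for (List<Integer> chain : chains) {
            List<String> mentions = new ArrayList<>();
            for (Integer id : chain) {
                mentions.add(textOf.apply(id));
            }
            result.add(mentions);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical ID -> covered-text table standing in for a real document
        Map<Integer, String> text = new HashMap<>();
        text.put(3, "Barack Obama");
        text.put(7, "Obama");
        text.put(12, "he");

        List<List<Integer>> chains = Collections.singletonList(Arrays.asList(3, 7, 12));
        System.out.println(chainsAsText(chains, text::get)); // [[Barack Obama, Obama, he]]
    }
}
```

Passing the lookup as a function keeps the chain-walking logic independent of GATE, so the same helper works whether the text comes from gate.Utils or from another source.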
