Parsing either font style or block of paragraph in GATE

Question

I have a Word document and need to match a particular table section or heading section of it using GATE. Is there a way to first check the font size or font style of a heading, and then match the rest of the content until the next heading with the same pattern appears?

Recommended answer

GATE has only limited support for MS Word documents, provided by the Apache Tika and Apache POI libraries. I do not know of any free alternative. At my company we developed our own plugin (gate.DocumentFormat) for this purpose, but it is not currently available outside the company.

You can try converting your Word documents to HTML with another tool (for example MS Word itself, OpenOffice, docx4j, or others; searching for "docx to html" turns up many options) and then process the HTML documents in GATE instead. The formatting will then be available in the Original markups annotation set.
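To illustrate why the HTML route helps with "heading, then everything up to the next heading", here is a minimal sketch in Python that works on the converted HTML directly, outside GATE. It assumes the converter emits real `h1` to `h6` elements for headings, which depends on the tool and on the Word styles used. The names `HeadingSectionParser` and `split_by_headings` are hypothetical helpers for this example, not part of GATE or of any converter.

```python
from html.parser import HTMLParser

# Assumption: the converter maps Word heading styles to h1..h6 elements.
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingSectionParser(HTMLParser):
    """Collect one record per heading: its level, its text, and the text that follows it."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.sections = []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            # A new heading ends the previous section and starts a new one.
            self.sections.append({"level": int(tag[1]), "heading": "", "body": ""})
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        if not self.sections:
            return  # text before the first heading belongs to no section
        key = "heading" if self._in_heading else "body"
        self.sections[-1][key] += data


def split_by_headings(html):
    """Return a list of dicts with 'level', 'heading' and 'body' (whitespace-normalised)."""
    parser = HeadingSectionParser()
    parser.feed(html)
    parser.close()
    return [
        {
            "level": s["level"],
            "heading": " ".join(s["heading"].split()),
            "body": " ".join(s["body"].split()),
        }
        for s in parser.sections
    ]
```

Calling `split_by_headings(open("converted.html", encoding="utf-8").read())` (the file name is a placeholder) returns one entry per heading. Inside GATE, the analogous information would come from the heading and table annotations in the Original markups set rather than from a separate parser.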
