Parsing font style or paragraph blocks in GATE
Question
I have a Word document, and I need to match a particular table section or heading section of it using GATE. Is there a way to first check the font size or font style of a heading, and then match the rest of the content until the next heading with the same pattern?
Recommended answer
GATE has only limited support for MS Word documents, provided by the Apache Tika and Apache POI libraries. I do not know of any free alternative. We have developed our own plugin (gate.DocumentFormat) for this purpose at my company, but it is not available externally for now.
You can try converting your Word documents to HTML with some other tool (e.g. MS Word itself, OpenOffice, docx4j, or others; searching for "docx to html" returns many results) and then process the HTML documents in GATE instead. All the formatting will then be available in the Original markups annotation set.
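Before loading the converted files into GATE, it may help to check that the converter actually kept the headings as markup. The following is a minimal sketch in Python (standard library only, not part of GATE) that splits an HTML string into heading/content pairs, which is roughly the "heading, then everything until the next heading" grouping asked about. It assumes the converter emits real h1 to h6 tags; some tools emit styled p or span elements instead, in which case the tag set would need adjusting. The names HeadingSectionParser and split_by_headings are made up for this illustration.

```python
from html.parser import HTMLParser

# Assumption: headings survive conversion as <h1>..<h6> elements.
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingSectionParser(HTMLParser):
    """Collects (heading_text, body_text) pairs from an HTML string."""

    def __init__(self):
        super().__init__()
        self.sections = []        # finished (heading, body) pairs
        self._in_heading = False  # True while inside a heading element
        self._started = False     # True once the first heading was seen
        self._heading = []        # text pieces of the current heading
        self._body = []           # text pieces since the current heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._flush()         # a new heading closes the previous section
            self._in_heading = True
            self._started = True

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading:
            self._heading.append(data)
        elif self._started:       # text before the first heading is ignored
            self._body.append(data)

    def _flush(self):
        if self._started:
            # Pieces are joined with a space and whitespace is collapsed,
            # so text split by inline tags ends up space-separated.
            heading = " ".join(" ".join(self._heading).split())
            body = " ".join(" ".join(self._body).split())
            self.sections.append((heading, body))
        self._heading, self._body = [], []

    def close(self):
        super().close()
        self._flush()             # emit the last open section
        self._started = False


def split_by_headings(html):
    """Return a list of (heading, content-until-next-heading) tuples."""
    parser = HeadingSectionParser()
    parser.feed(html)
    parser.close()
    return parser.sections
```

Calling split_by_headings on the text of a converted file returns one tuple per heading, with table cell text included in the content of the section it appears in. If the expected headings show up here, the same tags should be usable as annotations in the Original markups set once the HTML is loaded into GATE.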