How to extract the contents of an ODT file?
Hi friends,
I am trying to extract the contents of ODT files for indexing.
Let me elaborate.
The following are the steps I follow to extract the contents of the ODT file:
Steps
1 - Convert the ODT file into a temporary ZIP file.
2 - Loop through the files inside and retrieve the 'content.xml' file.
3 - The actual content of the ODT file resides in XML elements named <text:p>.
4 - Index the contents retrieved from <text:p>.
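Steps 1 and 2 can be sketched as follows. An ODT file is already a ZIP archive (an OpenDocument package), so it can be opened directly with java.util.zip instead of first being copied to a temporary .zip file. This is a minimal sketch that assumes Java 9+ (for InputStream.readAllBytes); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: steps 1-2, read content.xml straight out of an ODT package. */
class OdtContentReader {

    /**
     * Returns the raw bytes of content.xml, or null if the package has no such entry.
     * No temporary .zip copy is needed: the ODT stream is read as a ZIP directly.
     */
    static byte[] readContentXml(InputStream odt) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(odt)) {
            ZipEntry entry;
            // Loop through the entries inside the package until content.xml is found.
            while ((entry = zip.getNextEntry()) != null) {
                if ("content.xml".equals(entry.getName())) {
                    // ZipInputStream signals end-of-stream at the end of the current
                    // entry, so this reads only content.xml.
                    return zip.readAllBytes();
                }
            }
        }
        return null;
    }
}
```

For a file on disk, the stream could come from Files.newInputStream(Path.of("report.odt")), where "report.odt" is a placeholder path; java.util.zip.ZipFile with getEntry("content.xml") is an equivalent alternative when random access is preferred.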
I am having trouble with step 3.
I do not have the schema for content.xml, and without the schema I cannot generate the corresponding Java classes for its elements.
Please guide me.
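Step 3 does not strictly require a schema or generated classes: a namespace-aware DOM parser from the JDK can look up <text:p> elements by namespace URI and local name. This is a minimal sketch, not the approach used in the thread; the class name is hypothetical. It assumes the standard ODF text namespace URI and deliberately simplifies: headings (<text:h>) are not collected, and empty elements such as <text:s/> or <text:tab/> contribute no characters.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical helper: step 3 with DOM, no schema or JAXB classes needed. */
class OdtParagraphsDom {
    // Namespace URI that ODF binds to the "text:" prefix.
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    static List<String> paragraphs(InputStream contentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS to match
        Document doc = factory.newDocumentBuilder().parse(contentXml);

        // Every <text:p> in document order, regardless of nesting depth.
        NodeList nodes = doc.getElementsByTagNameNS(TEXT_NS, "p");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // getTextContent() concatenates the text of child spans, links, etc.
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }
}
```

DOM loads the whole content.xml into memory, which is usually acceptable for typical documents but is the main reason to consider a streaming parser for very large files.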
And which part of your program are you having trouble with?
koolshiva wrote: But it doesn't work.
Sorry, but that really does not help anyone guess what might be wrong. Take a look at this article for guidance on reading XML data.
Hey friends,
I have found an alternative: I am using SAX instead of JAXB now. I already had this option, but I personally preferred JAXB for performance reasons.
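The thread does not show the SAX code that was finally used. The following is a hypothetical handler, a minimal sketch assuming the standard ODF text namespace URI, that collects the text of each top-level <text:p> while streaming through content.xml. A depth counter merges any <text:p> nested inside another paragraph (for example inside a footnote) into its enclosing paragraph; that is a simplifying choice, not ODF-mandated behaviour.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical SAX handler: collects the text of each <text:p> in content.xml. */
class OdtParagraphsSax extends DefaultHandler {
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    private final List<String> paragraphs = new ArrayList<>();
    private final StringBuilder current = new StringBuilder();
    private int depth = 0; // > 0 while inside a <text:p>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (TEXT_NS.equals(uri) && "p".equals(localName)) {
            depth++;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per text node, so always append.
        if (depth > 0) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Emit a paragraph only when the outermost <text:p> closes.
        if (TEXT_NS.equals(uri) && "p".equals(localName) && --depth == 0) {
            paragraphs.add(current.toString());
            current.setLength(0);
        }
    }

    static List<String> parse(InputStream contentXml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // makes the parser fill in uri and localName
        OdtParagraphsSax handler = new OdtParagraphsSax();
        factory.newSAXParser().parse(contentXml, handler);
        return handler.paragraphs;
    }
}
```

Because SAX streams events instead of building a tree, memory use stays roughly constant with document size, and each collected paragraph can be handed to the indexer (step 4) as it is produced.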