Extract text between two links in HTML through Java

Views: 128 Published: 2016/2/23 11:18:54 Tags: java android xml parsing epub

Problem description

I am trying to retrieve the text data from an ePub file using Java. The text of the ePub file lies within an HTML file that is formatted something like this:

<h2 id="pgepubid00001">Chapter I</h2>

<p>Some text</p>
<p>Another line of Text</p>

<br/>

<h2 id="pgepubid00001">Chapter II</h2>

etc..

Before opening this file I already know the id of the chapter I need to extract, and I can find the id of the next chapter too. Because of this I thought a logical approach would be to parse the file with a SAX parser and extract the text of each paragraph until I reach the link of the next chapter. But this is proving quite a task.

Of course, everything is dynamic so there is no set link to go to etc. The HTML is semi-strictly formatted so I didn't expect parsing to be so much of a problem. Can anyone recommend a good way to extract the text needed?

The solution needs to be Java only; no other languages can be used. I am looking to implement this on an Android device.
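
For illustration, the approach described in the question (collect paragraph text from the heading with the known id until the heading with the next chapter's id) might be sketched with the standard SAX API as below. This is only a sketch under some assumptions: the chapter file is well-formed XHTML, the text of interest sits in `<p>` elements, and the names `ChapterExtractor`, `ChapterHandler`, and `extract` are hypothetical, not taken from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: names are illustrative only.
public class ChapterExtractor {

    // Collects the text of every <p> found between two elements identified by id.
    static class ChapterHandler extends DefaultHandler {
        private final String startId;
        private final String endId;
        private final StringBuilder result = new StringBuilder();
        private final StringBuilder current = new StringBuilder();
        private boolean inChapter = false;
        private boolean inParagraph = false;
        private boolean done = false;

        ChapterHandler(String startId, String endId) {
            this.startId = startId;
            this.endId = endId;
        }

        // Some parsers fill localName, others qName; accept either.
        private static String name(String localName, String qName) {
            return (localName == null || localName.isEmpty()) ? qName : localName;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (done) {
                return;
            }
            String id = atts.getValue("id");
            if (inChapter && id != null && id.equals(endId)) {
                // Reached the next chapter's heading: stop collecting.
                inChapter = false;
                done = true;
            } else if (!inChapter && id != null && id.equals(startId)) {
                // Found the heading of the wanted chapter.
                inChapter = true;
            } else if (inChapter && "p".equals(name(localName, qName))) {
                inParagraph = true;
                current.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inParagraph) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (inParagraph && "p".equals(name(localName, qName))) {
                if (result.length() > 0) {
                    result.append('\n');
                }
                result.append(current.toString().trim());
                inParagraph = false;
            }
        }
    }

    // Returns the paragraphs between startId and endId, one per line.
    public static String extract(InputStream xhtml, String startId, String endId) throws Exception {
        ChapterHandler handler = new ChapterHandler(startId, endId);
        SAXParserFactory.newInstance().newSAXParser().parse(xhtml, handler);
        return handler.result.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample with assumed ids "ch1" and "ch2".
        String xhtml = "<html><body>"
                + "<h2 id=\"ch1\">Chapter I</h2>"
                + "<p>Some text</p><p>Another line of Text</p><br/>"
                + "<h2 id=\"ch2\">Chapter II</h2><p>Not wanted</p>"
                + "</body></html>";
        InputStream in = new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8));
        System.out.println(extract(in, "ch1", "ch2"));
    }
}
```

The handler keeps parsing to the end of the file after the end marker; for large chapters one could stop early by throwing a `SAXException` from `startElement`. If the real files are not well-formed XML, a SAX parser will reject them and a lenient HTML parser would be needed instead.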
