Parsing XML with Jsoup

Published: 2019/1/2 22:44:29. Tags: java, xml, jsoup

Problem description

I get the following XML which represents a news article:

<content>
   Some text blalalala
   <h2>Small subtitle</h2>
   Some more text blbla
   <ul class="list">
      <li>List item 1</li>
      <li>List item 2</li>
   </ul>
   <br />
   Even more freakin text
</content>

I know the format isn't ideal but for now I have to take it.

The article should look like this:

Some text blalalala
Small subtitle
List with items
Even more freakin text

I parse this XML with Jsoup. I can get the text within the <content> tag with doc.ownText(), but then I have no idea where the other stuff (the subtitle) is placed; I only get one big String.

Would it be better to use an event-based parser for this (I hate them :( ), or is there a way to do something like doc.getTextUntilTagAppears("tagName")?
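
For comparison, an event-based (pull) parser turns out to be fairly short for this particular task. The sketch below is a swapped-in technique: it uses the JDK's built-in StAX API (`javax.xml.stream`), not jsoup, and the class name `TextRuns`, the method `split` and the `<tag>` marker strings are illustrative choices. It keeps a depth counter, buffers character data that sits directly inside the root element, and cuts the buffer each time a direct child element starts, so each text run stays in order next to the element that interrupted it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class TextRuns {

    /**
     * Pulls events from a StAX reader and returns the root element's own text,
     * split wherever a direct child element interrupts it. Each such child is
     * recorded as a "<tag>" marker (illustrative format) so its position is kept.
     */
    static List<String> split(String xml) {
        List<String> parts = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        int depth = 0; // 1 = directly inside the root element
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (depth == 2) { // a direct child of the root interrupts the text
                        flush(text, parts);
                        parts.add("<" + r.getLocalName() + ">");
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                } else if (event == XMLStreamConstants.CHARACTERS && depth == 1) {
                    // text may arrive in several CHARACTERS events, so append
                    text.append(r.getText());
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
        flush(text, parts); // text after the last child element
        return parts;
    }

    /** Adds the buffered text (trimmed) unless it is only whitespace, then clears the buffer. */
    private static void flush(StringBuilder text, List<String> parts) {
        String t = text.toString().trim();
        if (!t.isEmpty()) {
            parts.add(t);
        }
        text.setLength(0);
    }

    public static void main(String[] args) {
        String xml = "<content>\n"
                + "   Some text blalalala\n"
                + "   <h2>Small subtitle</h2>\n"
                + "   Some more text blbla\n"
                + "   <ul class=\"list\">\n"
                + "      <li>List item 1</li>\n"
                + "      <li>List item 2</li>\n"
                + "   </ul>\n"
                + "   <br />\n"
                + "   Even more freakin text\n"
                + "</content>";
        split(xml).forEach(System.out::println);
    }
}
```

Run against the XML above, it should produce the runs `Some text blalalala`, `<h2>`, `Some more text blbla`, `<ul>`, `<br>`, `Even more freakin text`. Text nested inside the children (the subtitle, the list items) is deliberately ignored here; it can be read separately once you know where each child sits.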

Edit: For clarification, I know how to get the elements under <content>. My problem is getting the text within <content>, broken up every time it is interrupted by an element.

I learned that I can get all the text within <content> with .textNodes(), which works great, but then again I don't know which text node belongs where in my article (one at the top before the h2, the other at the bottom).
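
The ordering problem goes away if, instead of collecting the text nodes on their own, you walk the children of <content> in document order and branch on each node's type. jsoup's `textNodes()` returns only the text children, which is why their position relative to the `h2` is lost; in jsoup the equivalent walk would iterate `element.childNodes()` and check `instanceof TextNode`. The sketch below does the same walk with the JDK's built-in `org.w3c.dom` parser so it runs without dependencies. The class `ArticleBlocks`, the `text:`/`subtitle:`/`list:` labels, and the assumption that `h2` is a subtitle and `ul` holds `li` items are illustrative, taken from the sample XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ArticleBlocks {

    /** Parses the XML and returns the root's children, in document order, as labelled blocks. */
    static List<String> blocks(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement(); // the <content> element
        } catch (Exception e) { // ParserConfigurationException, SAXException, IOException
            throw new IllegalArgumentException("cannot parse XML", e);
        }
        List<String> blocks = new ArrayList<>();
        NodeList children = root.getChildNodes(); // text nodes AND elements, in order
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getTextContent().trim();
                if (!text.isEmpty()) { // skip indentation-only text nodes
                    blocks.add("text: " + text);
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                Element el = (Element) child;
                String tag = el.getTagName();
                if (tag.equals("h2")) {
                    blocks.add("subtitle: " + el.getTextContent().trim());
                } else if (tag.equals("ul")) {
                    List<String> items = new ArrayList<>();
                    NodeList lis = el.getElementsByTagName("li");
                    for (int j = 0; j < lis.getLength(); j++) {
                        items.add(lis.item(j).getTextContent().trim());
                    }
                    blocks.add("list: " + String.join(", ", items));
                }
                // other elements such as <br /> carry no text and are skipped
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        String xml = "<content>\n"
                + "   Some text blalalala\n"
                + "   <h2>Small subtitle</h2>\n"
                + "   Some more text blbla\n"
                + "   <ul class=\"list\">\n"
                + "      <li>List item 1</li>\n"
                + "      <li>List item 2</li>\n"
                + "   </ul>\n"
                + "   <br />\n"
                + "   Even more freakin text\n"
                + "</content>";
        blocks(xml).forEach(System.out::println);
    }
}
```

For the sample XML it should yield `text: Some text blalalala`, `subtitle: Small subtitle`, `text: Some more text blbla`, `list: List item 1, List item 2`, `text: Even more freakin text`, which follows the intended article layout.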