Parsing XML with Jsoup

Question

I get the following XML which represents a news article:

<content>
   Some text blalalala
   <h2>Small subtitle</h2>
   Some more text blbla
   <ul class="list">
      <li>List item 1</li>
      <li>List item 2</li>
   </ul>
   <br />
   Even more freakin text
</content>

I know the format isn't ideal, but for now I have to work with it.

The article should look like this:

  • Some text blalalala
  • Small subtitle
  • List with items
  • Even more freakin text

I parse this XML with Jsoup. I can get the text within the <content> tag with doc.ownText(), but then I have no idea where the other parts (such as the subtitle) belong, because I only get one big String.

Would it be better to use an event-based parser for this (I hate them :(), or is it possible to do something like doc.getTextUntilTagAppears("tagName")?

Edit: For clarification, I know how to get the elements under <content>. My problem is getting the text within <content>, broken up every time it's interrupted by an element.

I learned that I can get all the text within <content> with .textNodes(), which works great, but then again I don't know where each text node belongs in my article (one at the top before the h2, the other at the bottom).

Recommended answer

The mistake I made was going through the XML by Elements, which do not include TextNodes. When I go through it Node by Node, I can check whether each Node is an Element or a TextNode and treat it accordingly.
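
The node-by-node walk described above might look roughly like the following. This is only a sketch, assuming Jsoup is on the classpath and the document is parsed with Jsoup's XML parser. The class name ContentWalker, the method articleParts, and the "text: " / tag-name labels are made up for illustration and are not part of Jsoup or of the original answer.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.parser.Parser;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for this example only.
class ContentWalker {

    // Returns the direct children of <content> in document order,
    // each labelled as free text or with its tag name.
    static List<String> articleParts(String xml) {
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        Element content = doc.selectFirst("content");
        List<String> parts = new ArrayList<>();
        if (content == null) {
            return parts; // no <content> element found
        }

        // childNodes() includes TextNodes; children() would return Elements only.
        for (Node node : content.childNodes()) {
            if (node instanceof TextNode) {
                String text = ((TextNode) node).text().trim();
                if (!text.isEmpty()) { // skip whitespace between tags
                    parts.add("text: " + text);
                }
            } else if (node instanceof Element) {
                Element el = (Element) node;
                if (el.tagName().equals("br")) {
                    continue; // assumption: a line break carries no article content
                }
                parts.add(el.tagName() + ": " + el.text());
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String xml = "<content>Some text blalalala<h2>Small subtitle</h2>"
                + "Some more text blbla<ul class=\"list\"><li>List item 1</li>"
                + "<li>List item 2</li></ul><br />Even more freakin text</content>";
        articleParts(xml).forEach(System.out::println);
    }
}
```

Because the text nodes and elements arrive in document order, each piece of free text can be placed relative to the h2 and the list without an event-based parser.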
