Which HTML DOM parser works best on Android?


Question

I need to process some HTML pages in my Android app, and I would prefer to use XPath to extract the relevant information. For regular J2SE there are many possible implementations for parsing ordinary HTML into an org.w3c.dom.Document (http://developer.android.com/reference/org/w3c/dom/package-summary.html):

  • jTidy
  • TagSoup
  • Jericho
  • NekoHTML
  • HTMLCleaner

(The list may be incomplete; it was extracted from "Recommend an alternative to jTidy?")

But it is very difficult to estimate whether and how well those libraries work on Android (library size, CPU and memory consumption).

From your experience, which library would you choose for Android?
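For context, the XPath step described in the question can be sketched with JDK classes only. This is a minimal illustration, not any of the listed libraries: the class name `XPathOverDom` and its helper methods are made up for this example, and the JDK XML parser is used purely as a stand-in for an HTML parser, so it only accepts well-formed markup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathOverDom {

    // Stand-in for an HTML parser (assumption): the JDK XML parser
    // rejects anything that is not well-formed, unlike jTidy and friends.
    static Document parseWellFormed(String markup) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
    }

    // Collects the href attribute value of every <a> element in the document.
    static List<String> linkTargets(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hrefs = (NodeList) xpath.evaluate("//a/@href", doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < hrefs.getLength(); i++) {
            result.add(hrefs.item(i).getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><a href=\"/a\">A</a>"
                + "<p><a href=\"/b\">B</a></p></body></html>";
        System.out.println(linkTargets(parseWellFormed(page)));
    }
}
```

Whichever HTML parser is chosen, only `parseWellFormed` would need to be replaced, as long as the parser can hand back an org.w3c.dom.Document.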

Recommended answer

OK, it looks like no one can answer this question, so I have to check it myself.

jTidy

I downloaded the latest jTidy sources, compiled them, and added the resulting jar file as a library to my Android app. There were no problems using jTidy in my app (emulator and real phone). At runtime jTidy also works fine, but it does not seem to be a good fit for the constrained Android environment: it is really slow. Judging by the Logcat output, even parsing a ~10 KB HTML file makes the garbage collector work heavily.
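A rough way to put numbers on an impression like this is to time repeated parses of a document of similar size. The sketch below is only an illustration under assumptions: `ParseBenchmark` and `averageMillis` are hypothetical names, the ~10 KB input is generated, and the JDK XML parser stands in for the library under test. Results from a desktop JVM will not carry over to a phone, so it would have to run on the device itself.

```java
import java.io.StringReader;
import java.util.concurrent.Callable;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class ParseBenchmark {

    // Runs the task once as a warm-up, then `runs` more times,
    // and returns the average wall-clock time per run in milliseconds.
    static double averageMillis(Callable<?> task, int runs) throws Exception {
        task.call(); // warm-up run, not measured
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.call();
        }
        return (System.nanoTime() - start) / 1_000_000.0 / runs;
    }

    public static void main(String[] args) throws Exception {
        // Build roughly 10 KB of well-formed markup as test input.
        StringBuilder sb = new StringBuilder("<html><body>");
        while (sb.length() < 10_000) {
            sb.append("<p>row ").append(sb.length()).append("</p>");
        }
        final String page = sb.append("</body></html>").toString();

        // Stand-in parser (assumption): swap in the library being evaluated.
        double ms = averageMillis(() -> DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(page))), 20);
        System.out.println("average parse time: " + ms + " ms");
    }
}
```

Wall-clock time alone does not capture garbage-collector pressure; on Android the Logcat output mentioned above remains the more direct signal for that.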

HTMLCleaner

In my experience HTMLCleaner also works well on Android; the library is relatively small (106 KB for v2.2). However, the DOM it produces is not as expected: HTMLCleaner inserts, for example, additional <span> elements into the DOM. This may be OK if you want to display the result as an HTML file, but for my use case (extracting information via XPath expressions) it is a no-go!
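To illustrate why an unexpected wrapper element matters for XPath, the sketch below evaluates the same expressions against two versions of a small document. The markup, the class name `ExtraElementDemo`, and the inserted <span> are made up for this example and are not actual HTMLCleaner output; the JDK XML parser again stands in for the HTML parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ExtraElementDemo {

    // Parses the markup and returns the string value of the first node
    // matched by the expression, or "" if nothing matches.
    static String firstMatch(String markup, String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
        return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
    }

    public static void main(String[] args) throws Exception {
        String asWritten =
                "<html><body><div id=\"price\"><b>42</b></div></body></html>";
        // Hypothetical parser output with an extra wrapper element.
        String withSpan =
                "<html><body><div id=\"price\"><span><b>42</b></span></div></body></html>";

        String strict = "//div[@id='price']/b";    // <b> must be a direct child
        String tolerant = "//div[@id='price']//b"; // <b> may be any descendant

        System.out.println("[" + firstMatch(asWritten, strict) + "]");   // prints [42]
        System.out.println("[" + firstMatch(withSpan, strict) + "]");    // prints []
        System.out.println("[" + firstMatch(withSpan, tolerant) + "]");  // prints [42]
    }
}
```

A descendant-axis expression tolerates the wrapper, but it is also less precise, so expressions written against the page as authored may silently return nothing or match more than intended.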

TagSoup

Not tested

Jericho

Not tested

NekoHTML

Not tested

JSoup

Not tested
