How to use JAXB with HTML?
Question
I would like to unmarshal some nasty HTML into a Java object using JAXB. (I'm on Java 7.)
TagSoup is a SAX-compliant parser that can handle nasty HTML.
How can I set up JAXB to use TagSoup for unmarshalling HTML?
I tried calling System.setProperty("org.xml.sax.driver", "org.ccil.cowan.tagsoup.Parser");
If I create an XMLReader myself, it uses TagSoup, but JAXB does not.
-
Does com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl use DOM or SAX for parsing XML?
How can I tell JAXB to use SAX?
How can I tell JAXB to use TagSoup as its SAX implementation?
Following Blaise's suggestion, I tried the code below, but I get a SAXParseException on the last line. The parse succeeds when done with the XMLReader alone:
JAXBContext jaxbContext = JAXBContext.newInstance(Thing.class);
Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
XMLReader xmlReader = new org.ccil.cowan.tagsoup.Parser();
xmlReader.parse("file:///c:/test.xml");
System.out.println("parse ok");
xmlReader.setContentHandler(unmarshaller.getUnmarshallerHandler());
//SAXParseException; systemId: file:/c:/test.xml; lineNumber: 5; columnNumber: 3; The element type "br" must be terminated by the matching end-tag "</br>".
Thing thing = (Thing) unmarshaller.unmarshal(new File("c:/test.xml"));
Recommended answer
You can get an UnmarshallerHandler from an Unmarshaller and set it as the ContentHandler on your SAX parser. After the SAX parse completes, obtain the object from the UnmarshallerHandler.
UnmarshallerHandler unmarshallerHandler = unmarshaller.getUnmarshallerHandler();
xmlReader.setContentHandler(unmarshallerHandler);
xmlReader.parse(...);
Thing thing = (Thing) unmarshallerHandler.getResult();
There is an example of this on my blog:
- http://blog.bdoughan.com/2011/05/jaxb-and-dtd.html
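For completeness, here is a minimal end-to-end sketch of the approach above. In the question's snippet, the final call to unmarshaller.unmarshal(new File(...)) makes JAXB read the file with its own default XML parser, so TagSoup is bypassed and the unclosed br tag fails. The sketch parses with TagSoup only and takes the result from the handler. The Page and Head classes and the sample HTML are hypothetical, made up for illustration. It also assumes TagSoup's default behavior of reporting elements in the XHTML namespace, so the annotations name that namespace explicitly. It needs the TagSoup jar on the classpath and a JDK that bundles javax.xml.bind (such as Java 7) or a separate JAXB dependency.

```java
import java.io.StringReader;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.UnmarshallerHandler;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class TagSoupJaxbDemo {

    // Assumption: TagSoup reports HTML elements in this namespace by default.
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Hypothetical mapping of the document root; only <head> is bound.
    @XmlRootElement(name = "html", namespace = XHTML)
    @XmlAccessorType(XmlAccessType.FIELD)
    public static class Page {
        @XmlElement(name = "head", namespace = XHTML)
        Head head;
    }

    // Hypothetical mapping of <head>; only <title> is bound.
    @XmlAccessorType(XmlAccessType.FIELD)
    public static class Head {
        @XmlElement(name = "title", namespace = XHTML)
        String title;
    }

    static Page parse(String html) throws Exception {
        JAXBContext context = JAXBContext.newInstance(Page.class);
        UnmarshallerHandler handler =
                context.createUnmarshaller().getUnmarshallerHandler();

        // TagSoup does the parsing; JAXB only consumes the SAX events.
        XMLReader reader = new org.ccil.cowan.tagsoup.Parser();
        reader.setContentHandler(handler); // must be set before parse()
        reader.parse(new InputSource(new StringReader(html)));

        // Do not call unmarshaller.unmarshal(file) here; read the result
        // from the handler instead.
        return (Page) handler.getResult();
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><head><title>Hello</title></head>"
                + "<body>line one<br>line two</body></html>";
        Page page = parse(html);
        System.out.println(page.head.title);
    }
}
```

Note that the ContentHandler is registered before parse() is called; the question's snippet registers it only after the parse has finished, so JAXB never receives any events from TagSoup.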