Obtaining DOCTYPE details using SAX (JDK 7)

Question

I'm using the SAX parser that comes with JDK7. I'm trying to get hold of the DOCTYPE declaration, but none of the methods in DefaultHandler seem to be fired for it. What am I missing?

import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class Problem {

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE HTML><html><head></head><body></body></html>";
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        InputSource in = new InputSource(new StringReader(xml));
        saxParser.parse(in, new DefaultHandler() {

            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) throws SAXException {
                System.out.println("Element: " + qName);
            }
        });
    }
}

This produces:

Element: html
Element: head
Element: body

Desired output:

DocType: HTML
Element: html
Element: head
Element: body

How do I get the DocType?

Update: Looks like there's a DefaultHandler2 class to extend. Can I use that as a drop-in replacement?

Recommended answer

Instead of DefaultHandler, extend org.xml.sax.ext.DefaultHandler2. It also implements LexicalHandler, which declares the startDTD() method. The Javadoc for startDTD() says:


Report the start of DTD declarations, if any. This method is intended to report the beginning of the DOCTYPE declaration; if the document has no DOCTYPE declaration, this method will not be invoked.

All declarations reported through DTDHandler or DeclHandler events must appear between the startDTD and endDTD events. Declarations are assumed to belong to the internal DTD subset unless they appear between startEntity and endEntity events. Comments and processing instructions from the DTD should also be reported between the startDTD and endDTD events, in their original order of (logical) occurrence; they are not required to appear in their correct locations relative to DTDHandler or DeclHandler events, however.

Note that the start/endDTD events will appear within the start/endDocument events from ContentHandler and before the first startElement event.
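
The ordering described in that note can be checked with a small trace. The following is only a sketch: the class name EventOrder and the method trace are made up for illustration, and the handler is registered through the standard SAX lexical-handler property so that the DTD events are delivered at all.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

class EventOrder {

    // Hypothetical helper: records the SAX events in the order they arrive.
    public static List<String> trace(String xml) throws Exception {
        final List<String> events = new ArrayList<String>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void startDTD(String name, String publicId, String systemId) {
                events.add("startDTD " + name);
            }

            @Override
            public void endDTD() {
                events.add("endDTD");
            }

            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) {
                events.add("startElement " + qName);
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }
        };
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        // Without this property the parser never calls startDTD/endDTD.
        saxParser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        saxParser.parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace("<!DOCTYPE html><html/>")) {
            System.out.println(event);
        }
    }
}
```

For the input in main, the DTD events should arrive after startDocument and before the first startElement.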

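However, startDTD() is only called if the handler is also registered as the LexicalHandler on the XML reader. Passing the handler to SAXParser.parse() does not do that; it has to be set through the http://xml.org/sax/properties/lexical-handler property: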

import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

public class Problem {

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE html><html><img/></html>";
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        InputSource in = new InputSource(new StringReader(xml));

        DefaultHandler2 myHandler = new DefaultHandler2(){
            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) throws SAXException {
                System.out.println("Element: " + qName);
            }

            @Override
            public void startDTD(String name, String publicId,
                    String systemId) throws SAXException {
                System.out.println("DocType: " + name);
            }
        };
        saxParser.setProperty("http://xml.org/sax/properties/lexical-handler",
                               myHandler);
        saxParser.parse(in, myHandler);
    }
}
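
Since the question asks for DOCTYPE details, here is a hedged variant that also captures the public and system identifiers passed to startDTD(). The class name DocTypeDetails, the method readDocType and the "name|publicId|systemId" result format are invented for this sketch. It also overrides resolveEntity() to hand the parser an empty external subset, on the assumption that you do not want it to fetch the DTD named in the DOCTYPE.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

class DocTypeDetails {

    // Hypothetical helper: returns "name|publicId|systemId" for the DOCTYPE,
    // or null if the document has no DOCTYPE declaration.
    public static String readDocType(String xml) throws Exception {
        final String[] details = new String[1];
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startDTD(String name, String publicId, String systemId) {
                details[0] = name + "|" + publicId + "|" + systemId;
            }

            // Assumption of this sketch: the external DTD should not be loaded,
            // so every external entity resolves to an empty stream.
            @Override
            public InputSource resolveEntity(String name, String publicId,
                    String baseURI, String systemId) {
                return new InputSource(new StringReader(""));
            }
        };
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        saxParser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        saxParser.parse(new InputSource(new StringReader(xml)), handler);
        return details[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\"><html/>";
        System.out.println(readDocType(xml));
    }
}
```

The identifiers are reported as declared in the document; the system identifier is not resolved against a base URI. For a bare <!DOCTYPE html>, both identifiers are null.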
