SAX parser: Ignoring special characters

Question

I'm using Xerces to parse my XML document. The issue is that escaped XML characters like '&nbsp;' appear in the characters() method as non-escaped ones. I need to get the escaped characters inside the characters() method as is.

Thanks.

UPD: I tried to override the resolveEntity method in my DefaultHandler subclass. I can see in the debugger that it is set as the entity resolver of the XML reader, but the code in the overridden method is never invoked.
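The behaviour described in the question can be reproduced in a few lines with the JDK's built-in SAX parser. This is a minimal sketch (the class name EntityRepro is hypothetical): by the time characters() is called, the parser has already replaced &amp; with its replacement text, so the handler only ever sees a plain &.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: shows that characters() gets expanded text.
public class EntityRepro {

    // Parses the XML and returns all character data concatenated.
    public static String collectText(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // The data may arrive in several chunks; just append them
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // The entity reference is gone: the handler sees "a&b"
        System.out.println(collectText("<p>a&amp;b</p>"));
    }
}
```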

Answer

I think your solution is not too bad: a few lines of code that do exactly what you want. The problem is that the startEntity and endEntity methods are not provided by the ContentHandler interface, so you have to write a LexicalHandler that works in combination with your ContentHandler. Usually, using an XMLFilter is more elegant, but since you have to work with entities, you still need to write a LexicalHandler. Take a look here for an introduction to the use of SAX filters.
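As a sketch of the "LexicalHandler combined with a ContentHandler" idea without a filter: org.xml.sax.ext.DefaultHandler2 implements both interfaces, so a single object can be registered as the content handler and, through the http://xml.org/sax/properties/lexical-handler property, as the lexical handler. This swaps in DefaultHandler2 in place of the answer's XMLFilter. It assumes the entity (nbsp, as in the question) is declared in the document's DTD; the class name EntityPreservingHandler is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler: writes "&name;" instead of an entity's replacement text.
public class EntityPreservingHandler extends DefaultHandler2 {

    private final StringBuilder text = new StringBuilder();
    // Nesting depth of general entities currently being expanded
    private int entityDepth = 0;

    // Parameter entities ("%name") and the external DTD subset ("[dtd]")
    // are reported through the same callbacks; ignore them.
    private static boolean isGeneralEntity(String name) {
        return !name.startsWith("%") && !name.equals("[dtd]");
    }

    @Override
    public void startEntity(String name) {
        if (!isGeneralEntity(name)) {
            return;
        }
        if (entityDepth == 0) {
            text.append('&').append(name).append(';');
        }
        entityDepth++;
    }

    @Override
    public void endEntity(String name) {
        if (isGeneralEntity(name)) {
            entityDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Skip the entity's replacement text, keep all other character data
        if (entityDepth == 0) {
            text.append(ch, start, length);
        }
    }

    public static String parse(String xml) throws Exception {
        EntityPreservingHandler handler = new EntityPreservingHandler();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE p [<!ENTITY nbsp \"&#160;\">]><p>a&nbsp;b</p>";
        System.out.println(parse(xml));
    }
}
```

Tracking a depth counter, rather than a single name, keeps the replacement text of nested entities from leaking into the output.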

I'd like to show you a way, very similar to yours, that lets you separate the filtering operations (turning & into &amp;, for instance) from the output operations (or anything else). I've written my own XMLFilter based on XMLFilterImpl, which also implements the LexicalHandler interface. This filter contains only the code related to escaping/unescaping entities.

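import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// SAX filter that passes entity references downstream as "&name;"
// instead of their replacement text.
public class XMLFilterEntityImpl extends XMLFilterImpl implements
        LexicalHandler {

    // Name of the entity whose replacement text is being reported, if any
    private String currentEntity = null;

    public XMLFilterEntityImpl(XMLReader reader)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        super(reader);
        // Register this filter as the lexical handler of the parent reader
        setProperty("http://xml.org/sax/properties/lexical-handler", this);
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        if (currentEntity == null) {
            super.characters(ch, start, length);
            return;
        }

        // Emit "&name;" in place of the entity's replacement text
        String entity = "&" + currentEntity + ";";
        super.characters(entity.toCharArray(), 0, entity.length());
        currentEntity = null;
    }

    @Override
    public void startEntity(String name) throws SAXException {
        currentEntity = name;
    }

    @Override
    public void endEntity(String name) throws SAXException {
        // Reset, so an entity that produced no character data (e.g. the
        // "[dtd]" pseudo-entity) does not leak into the next characters()
        currentEntity = null;
    }

    @Override
    public void startDTD(String name, String publicId, String systemId)
            throws SAXException {
    }

    @Override
    public void endDTD() throws SAXException {
    }

    @Override
    public void startCDATA() throws SAXException {
    }

    @Override
    public void endCDATA() throws SAXException {
    }

    @Override
    public void comment(char[] ch, int start, int length) throws SAXException {
    }
}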

And this is my main, with a DefaultHandler as the ContentHandler, which receives the entity as is, thanks to the filter:

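import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class Main {
    public static void main(String[] args) throws ParserConfigurationException,
            SAXException, IOException {
        DefaultHandler defaultHandler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length)
                    throws SAXException {
                // This method receives the entity as is
                System.out.println(new String(ch, start, length));
            }
        };

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Xerces-specific feature: report predefined entities such as &amp;
        // through startEntity()/endEntity() (off by default)
        reader.setFeature("http://apache.org/xml/features/scanner/notify-builtin-refs", true);
        XMLFilter xmlFilter = new XMLFilterEntityImpl(reader);
        xmlFilter.setContentHandler(defaultHandler);
        String xml = "<html><head><title>title</title></head><body>&amp;</body></html>";
        xmlFilter.parse(new InputSource(new StringReader(xml)));
    }
}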

This is my output:

title
&amp;

You may not like it, but it is an alternative solution.

I'm sorry, but with a SAX parser I don't think there is a more elegant way.

You should also consider switching to a StAX parser: it's very easy to do what you want with XMLInputFactory.IS_REPLACING_ENTITY_REFERENCE set to false. If you like this solution, you should take a look here.
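A minimal StAX sketch of that suggestion, assuming the entity is declared in an internal DTD (the class name StaxEntityDemo is hypothetical). With IS_REPLACING_ENTITY_REFERENCE set to false, a declared entity reference should be reported as an ENTITY_REFERENCE event whose getLocalName() is the entity name. Predefined entities such as &amp; may still be delivered as ordinary character data, depending on the StAX implementation.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class: rebuilds text, keeping declared entities as "&name;".
public class StaxEntityDemo {

    public static String readKeepingEntities(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Report entity references as events instead of expanding them
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCE, Boolean.FALSE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        StringBuilder text = new StringBuilder();
        try {
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.CHARACTERS:
                        text.append(reader.getText());
                        break;
                    case XMLStreamConstants.ENTITY_REFERENCE:
                        // getLocalName() returns the entity name, e.g. "nbsp"
                        text.append('&').append(reader.getLocalName()).append(';');
                        break;
                    default:
                        // Elements, DTD and other events are ignored here
                        break;
                }
            }
        } finally {
            reader.close();
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE p [<!ENTITY nbsp \"&#160;\">]><p>a&nbsp;b</p>";
        System.out.println(readKeepingEntities(xml));
    }
}
```

Unlike the SAX filter, no LexicalHandler is needed: the pull parser hands the entity reference to the caller as its own event.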