Which XML parser to use here?


Question

I am receiving an XML file as input, whose size can vary from a few KB to a lot more. I am getting this file over a network. I only need to extract a small number of nodes, so most of the document is useless to me. I have no memory constraints; I just need speed.

Considering all this, I concluded:

  1. Not using DOM here (due to the possibly huge size of the document, no CRUD requirement, and the source being a network).

  2. Not using SAX, as I only need a small subset of the data.

  3. StAX could be the way to go, but I am not sure it is the fastest option.

  4. JAXB came up as another option, but what sort of parser does it use? I read that it uses Xerces by default (which type is that, push or pull?), although I can configure it to use StAX or Woodstox as per this link.

I have been reading a lot and am still confused by so many options! Any help would be appreciated.

Thanks!

Edit: I want to add one more question here: what is wrong with using JAXB here?

Recommended answer

By far the fastest solution is a StAX parser, especially since you only need a specific subset of the XML file. With StAX you can easily skip whatever you don't need, whereas a SAX parser would deliver every event to you anyway.
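To make the pull style concrete, here is a minimal sketch (not the code from this answer). The class name `StaxSubsetExample`, the helper `firstTexts`, and the element names in the sample string are all made up for illustration. The loop asks the reader for one event at a time, reads text only from the elements it cares about, and simply stops pulling once it has enough.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxSubsetExample {

    // Hypothetical helper: collects the text of at most `limit` elements
    // named `wanted`, then stops reading the stream.
    // Assumes the wanted elements contain text only (no child elements).
    public static List<String> firstTexts(InputStream in, String wanted, int limit)
            throws XMLStreamException {
        List<String> found = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            // We drive the parser: no event is produced unless we ask for it.
            while (found.size() < limit && reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && wanted.equals(reader.getLocalName())) {
                    // getElementText() consumes the element up to its end tag.
                    found.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return found;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<doc><skip>a</skip><name>first</name>"
                + "<skip>b</skip><name>second</name><name>third</name></doc>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(firstTexts(in, "name", 2)); // prints [first, second]
    }
}
```

In real use the `InputStream` would be the network stream itself, so the rest of the document never has to be parsed once the wanted nodes have been found.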

But it is also a bit more complicated than using SAX or DOM. I recently had to write a StAX parser for the following XML:

<?xml version="1.0"?>
<table>
    <row>
        <column>1</column>
        <column>Nome</column>
        <column>Sobrenome</column>
        <column>email@gmail.com</column>
        <column></column>
        <column>2011-06-22 03:02:14.915</column>
        <column>2011-06-22 03:02:25.953</column>
        <column></column>
        <column></column>
    </row>
</table>    

Here is what the final parser code looks like:

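import java.io.ByteArrayInputStream;
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

// Apache Commons IO and Commons Lang. The package names below assume
// Commons Lang 2.x; adjust them if you use a different version.
import org.apache.commons.io.FileUtils;
import org.apache.commons.lang.StringEscapeUtils;
import org.apache.commons.lang.WordUtils;

public class Parser {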

private String[] files ;

public Parser(String ... files) {
    this.files = files;
}

private List<Inscrito> process() {

    List<Inscrito> inscritos = new ArrayList<Inscrito>();


    for ( String file : files ) {

        XMLInputFactory factory = XMLInputFactory.newFactory();

        try {

            String content = StringEscapeUtils.unescapeXml( FileUtils.readFileToString( new File(file) ) );

            XMLStreamReader parser = factory.createXMLStreamReader( new ByteArrayInputStream( content.getBytes() ) );

            String currentTag = null;
            int columnCount = 0;
            Inscrito inscrito = null;           

            while ( parser.hasNext() ) {

                int currentEvent = parser.next();

                switch ( currentEvent ) {
                case XMLStreamReader.START_ELEMENT: 

                    currentTag = parser.getLocalName();

                    if ( "row".equals( currentTag ) ) {
                        columnCount = 0;
                        inscrito = new Inscrito();                      
                    }

                    break;
                case XMLStreamReader.END_ELEMENT:

                    currentTag = parser.getLocalName();

                    if ( "row".equals( currentTag ) ) {
                        inscritos.add( inscrito );
                    }

                    if ( "column".equals( currentTag ) ) {
                        columnCount++;
                    }                   

                    break;
                case XMLStreamReader.CHARACTERS:

                    if ( "column".equals( currentTag ) ) {

                        String text = parser.getText().trim().replaceAll( "\n" , " "); 

                        switch( columnCount ) {
                        case 0:
                            inscrito.setId( Integer.valueOf( text ) );
                            break;
                        case 1:                         
                            inscrito.setFirstName( WordUtils.capitalizeFully( text ) );
                            break;
                        case 2:
                            inscrito.setLastName( WordUtils.capitalizeFully( text ) );
                            break;
                        case 3:
                            inscrito.setEmail( text );
                            break;
                        }

                    }

                    break;
                }

            }

            parser.close();

        } catch (Exception e) {
            throw new IllegalStateException(e);
        }           

    }

    Collections.sort(inscritos);

    return inscritos;

}

public Map<String,List<Inscrito>> parse() {

    List<Inscrito> inscritos = this.process();

    Map<String,List<Inscrito>> resultado = new LinkedHashMap<String, List<Inscrito>>();

    for ( Inscrito i : inscritos ) {

        List<Inscrito> lista = resultado.get( i.getInicial() );

        if ( lista == null ) {
            lista = new ArrayList<Inscrito>();
            resultado.put( i.getInicial(), lista );
        }

        lista.add( i );

    }

    return resultado;
}

}

The code itself is in Portuguese, but it should be straightforward to understand what it does. Here's the repo on GitHub.
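For comparison, here is a more compact sketch of the same row/column traversal. It is not taken from the linked repo: the class name `TableRows` and the method `readRows` are hypothetical, and it returns plain lists of strings instead of `Inscrito` objects. It assumes the input is already well-formed XML (so no unescape step) and reads straight from an `InputStream`, which could be a network stream. Using `getElementText()` removes the need to track the current tag and a column counter by hand.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class TableRows {

    // Hypothetical helper: returns one list of trimmed column values per <row>.
    public static List<List<String>> readRows(InputStream in) throws XMLStreamException {
        List<List<String>> rows = new ArrayList<>();
        List<String> current = null;
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("row".equals(name)) {
                        current = new ArrayList<>();
                    } else if ("column".equals(name) && current != null) {
                        // Empty columns come back as an empty string.
                        current.add(reader.getElementText().trim());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "row".equals(reader.getLocalName())) {
                    rows.add(current);
                    current = null;
                }
            }
        } finally {
            reader.close();
        }
        return rows;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<?xml version=\"1.0\"?><table><row>"
                + "<column>1</column><column>Nome</column><column></column>"
                + "</row></table>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        for (List<String> row : readRows(in)) {
            System.out.println(row.size() + " columns, first = " + row.get(0));
        }
    }
}
```

Mapping column positions onto object fields, as the parser above does with its `switch`, can then be done on each finished row rather than inside the event loop.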
