iOS: Combining SAX and DOM parsing

Question

I am currently working on an iPad project for which I need to process large XML files into an SQLite backend. I currently have this working using the TBXML parser.

So all the logic is in place, and in general the TBXML parser does the job it needs to do. The only problem I'm now encountering is that the XML files are getting too big and I am running out of memory. Because of this I am thinking of switching to a SAX parser like the core NSXMLParser or something like Alan Quatermain's AQXMLParser. However, this would require me to redo all of my current logic, which to some extent relies on functions provided by a DOM tree. This is something I'd rather not do.

So what I want to try is a hybrid approach. Given my XML structure this should be possible: it is basically a number of repeating "Record" elements, and within each record are various elements that can be repeated and nested. In my current approach I parse the document and pass each record element to a function that processes it into the database. Since that function already exists, I want to reuse it in my hybrid parsing approach.

This is what I want to achieve. Using a SAX parser I traverse my document. While traversing the document I build up a Record element. Whenever I complete a record element I pass it along to the existing function that uses TBXML to process it. The SAX parser then continues to build the next record element. Key goals are to:
- Fix the memory footprint (it doesn't need to be the smallest it can be, but it has to be constant or at least smaller than when using TBXML)
- Keep performance acceptable

The implementation I currently have in mind is as follows:

// `record` is an NSMutableString instance variable holding the XML text of the
// record currently being rebuilt. It is nil outside a Record element, so the
// append calls below are no-ops there (messaging nil does nothing).
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict{
    //Recreate record string each time a Record element is encountered
    if([elementName isEqualToString:@"Record"]) record = [[NSMutableString alloc] init];
    //Write opening XML tag with name (attributes in attributeDict are not copied)
    [record appendFormat:@"<%@>", elementName];
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string{
    //Write XML content (may be delivered in several chunks per element)
    [record appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName{
    //Write closing XML tag
    [record appendFormat:@"</%@>", elementName];
    if([elementName isEqualToString:@"Record"]){
        //Parse record string into TBXML object
        TBXML * tbxmlRecord = [TBXML tbxmlWithXMLString:record];
        //Send it to the TBXML record processor
        [self processElement:tbxmlRecord.rootXMLElement];
        //Drop the finished record so its memory can be reclaimed
        record = nil;
    }
}

I think this should work, but it feels dirty to use a string. Furthermore, I am concerned that the record string might get overwritten too soon when the parser reaches a new record element.
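
One thing to watch when rebuilding a record as a string: NSXMLParser hands decoded text to foundCharacters, so an `&amp;` in the source arrives as a plain `&`, and appending it unchanged produces XML that may not re-parse. Below is a minimal sketch of an escaping step, written in plain C so it can be dropped into an Objective-C file. The function name `xml_escape` is hypothetical and is not part of TBXML or NSXMLParser.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration, not a TBXML or
   NSXMLParser API): writes an XML-escaped copy of `in` into `out`, which
   has room for `cap` bytes including the terminating NUL.
   Returns the escaped length, or (size_t)-1 if the result does not fit. */
size_t xml_escape(const char *in, char *out, size_t cap)
{
    size_t n = 0;
    for (; *in != '\0'; ++in) {
        char single[2] = { *in, '\0' };   /* fallback: copy the byte as-is */
        const char *rep;
        switch (*in) {
        case '&': rep = "&amp;";  break;
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '"': rep = "&quot;"; break;
        default:  rep = single;   break;
        }
        size_t len = strlen(rep);
        if (n + len + 1 > cap) return (size_t)-1;  /* keep room for NUL */
        memcpy(out + n, rep, len);
        n += len;
    }
    if (n + 1 > cap) return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

In Objective-C the same effect can be had by running the text through `stringByReplacingOccurrencesOfString:withString:` for each special character (replacing `&` first) before appending it to the record string.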

So my question is whether this is a sound way to approach the problem, or whether there are better ways to achieve what I'm looking for.

I've implemented this approach and it looks to be working quite well. The only hiccup I've encountered is that if my source file isn't UTF-8 encoded I only get a partial result, but when I correct this all goes well. Memory usage isn't that much better though. But maybe it takes what it can. Need to run more tests...

Recommended answer

In general your approach sounds fine to me. If your solution is working for you without performance problems, then I wouldn't be too worried about the string handling. If you want to, you can profile your application to see how much CPU time it costs.

If you want to do something slightly more optimized, you could try to find a SAX parser that gives you the byte offsets into the original buffer and combine this with a DOM parser that lets you work with non-null-terminated C strings. I believe this means you have to switch to a C or maybe C++ library. I have used rapidxml for something vaguely similar to what you are trying (XML chunks embedded in a huge file).
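
To make the byte-offset idea concrete, here is a rough sketch in C. It is not a SAX parser and does not use rapidxml; it is a naive scanner that assumes records are delimited by a literal `<Record>` ... `</Record>` pair with no attributes, no nesting of Record inside Record, and no comments or CDATA containing those tags. The names `for_each_record`, `record_fn` and `add_length` are hypothetical. The point is only that each record reaches the callback as a pointer plus a length into the original buffer, with no copy and no null terminator.

```c
#include <stddef.h>
#include <string.h>

/* Callback receives a slice of the original buffer. The slice is NOT
   null-terminated, so the consumer must honour `len`. */
typedef void (*record_fn)(const char *start, size_t len, void *ctx);

/* Bounded substring search (memmem is not standard C). */
static const char *find(const char *hay, size_t hay_len, const char *needle)
{
    size_t n = strlen(needle);
    if (n == 0 || hay_len < n) return NULL;
    for (size_t i = 0; i + n <= hay_len; ++i)
        if (memcmp(hay + i, needle, n) == 0) return hay + i;
    return NULL;
}

/* Calls `fn` once per complete <Record>...</Record> span found in `buf`.
   Returns the number of records handed to the callback. */
size_t for_each_record(const char *buf, size_t len, record_fn fn, void *ctx)
{
    static const char open_tag[]  = "<Record>";   /* assumed tag form */
    static const char close_tag[] = "</Record>";
    const char *p = buf, *end = buf + len;
    size_t count = 0;
    while (p < end) {
        const char *s = find(p, (size_t)(end - p), open_tag);
        if (s == NULL) break;
        const char *e = find(s, (size_t)(end - s), close_tag);
        if (e == NULL) break;             /* incomplete trailing record */
        e += sizeof close_tag - 1;        /* include the closing tag */
        fn(s, (size_t)(e - s), ctx);
        ++count;
        p = e;
    }
    return count;
}

/* Example callback: adds each slice length to a size_t total in `ctx`. */
void add_length(const char *start, size_t len, void *ctx)
{
    (void)start;
    *(size_t *)ctx += len;
}
```

A real callback would pass each slice to a DOM parser that accepts a pointer and length. Whether a given parser can do that without requiring a terminator depends on the library and its parse flags, so check that before relying on it.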