Large RAM usage when parsing XML using Libxml2


Question

I'm downloading an XML file from an API with URLSessionDataTask.
The XML looks like this:

<?xml version="1.0" encoding="UTF-8" ?>
<ResultList id="12345678-0" platforms="A;B;C;D;E">
    <Book id="1111111111" author="Author A" title="Title A" price="9.95" ... />
    <Book id="1111111112" author="Author B" title="Title B" price="2.00" ... />
    <Book id="1111111113" author="Author C" title="Title C" price="5.00" ... />
    <ResultInfo bookcount="3" />
</ResultList>

Sometimes the XML may contain thousands of books.
I'm parsing the XML with the SAX parser from Libxml2. While parsing, I create a Book object and set its values from the XML like so:

private func startElementSAX(_ ctx: UnsafeMutableRawPointer?, name: UnsafePointer<xmlChar>?, prefix: UnsafePointer<xmlChar>?, URI: UnsafePointer<xmlChar>?, nb_namespaces: CInt, namespaces: UnsafeMutablePointer<UnsafePointer<xmlChar>?>?, nb_attributes: CInt, nb_defaulted: CInt, attributes: UnsafeMutablePointer<UnsafePointer<xmlChar>?>?) {

    let elementName = String(cString: name!)

    switch elementName {
    case "Book":
        let book = buildBook(nb_attributes: nb_attributes, attributes: attributes)
        parser.delegate?.onBook(book: book)
    default:
        break
    }
}

func buildBook(nb_attributes: CInt, attributes: UnsafeMutablePointer<UnsafePointer<xmlChar>?>?) -> Book {
    let fields = 5 /* (localname/prefix/URI/value/end) */
    let book = Book()
    for i in 0..<Int(nb_attributes) {
        if let localname = attributes?[i * fields + 0],
            //let prefix = attributes?[i * fields + 1],
            //let URI = attributes?[i * fields + 2],
            let value_start = attributes?[i * fields + 3]//,
            /*let value_end = attributes?[i * fields + 4]*/ {

                let localnameString = String(cString: localname)
                let string_start = String(cString: value_start)
                //let string_end = String(cString: value_end)

                if let end = string_start.characters.index(of: "\"") {
                    let value = string_start.substring(to: end)
                    book.setValue(value, forKey: localnameString)
                } else {
                    book.setValue(string_start, forKey: localnameString)
                }
        }
    }
    return book
}

In the UITableViewController, the onBook(book: Book) delegate method appends the book object to an array and updates the UITableView. So far so good.

The problem now is that parsing uses too much of the device's RAM, so the device becomes slow. With ~500 books in the XML it takes >500 MB of RAM, and I don't know why. When I look at the memory in Instruments, I see all the allocated memory in the category _HeapBufferStorage<_StringBufferIVars, UInt16>, with multiple entries greater than 100 KB. The Event History lists the method buildBook().

When I use the XMLParser from Foundation with the initializer XMLParser(contentsOf: URL), which first downloads the whole XML and then parses it, RAM usage is normal no matter how many books there are. But I want to show the books in the UITableView as soon as possible. I just want something like Android's XMLPullParser for iOS.

Recommended answer

I'm using libxml2 (due to this issue) and have code like this:

xmlParseChunk(ctxt, data, Int32(read), 0)

Changing the call to this considerably reduces the amount of memory consumed:

autoreleasepool {
    xmlParseChunk(ctxt, data, Int32(read), 0)
}

If you're using the push parser call as above, this will likely fix your problem. If not, wrapping your delegate call in an autoreleasepool call may help.

The reason is that many intermediate objects are created and added to an autorelease pool without being released. See this post for more details.
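To illustrate the point about draining the pool, here is a minimal sketch that feeds a buffer to a parser in slices and wraps each slice in its own autoreleasepool, so temporaries created while handling one slice are released before the next one starts. The function name feed, the chunkSize parameter, and the parseChunk closure are hypothetical; parseChunk stands in for a call such as xmlParseChunk(ctxt, bytes, count, 0). It assumes an Apple platform, where autoreleasepool is available.

```swift
import Foundation

/// Feeds `data` to `parseChunk` in slices of at most `chunkSize` bytes.
/// Each slice is processed inside its own autorelease pool.
/// `parseChunk` is a hypothetical stand-in for the real push-parser call.
/// Returns the number of slices that were fed.
func feed(_ data: Data,
          chunkSize: Int,
          parseChunk: (UnsafeRawBufferPointer) -> Void) -> Int {
    var chunks = 0
    var offset = data.startIndex
    while offset < data.endIndex {
        let end = min(offset + chunkSize, data.endIndex)
        autoreleasepool {
            // Objects autoreleased while parsing this slice are freed
            // when this closure returns.
            data.subdata(in: offset..<end).withUnsafeBytes { parseChunk($0) }
        }
        offset = end
        chunks += 1
    }
    return chunks
}
```

With URLSessionDataDelegate the data already arrives in pieces, so in that setup the pool would typically wrap the body of the didReceive callback instead of a loop like this one.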

An alternative is to reduce the number of objects added to the autorelease pool by changing your code in other ways. For example, I found I was creating extra strings by trimming whitespace in places where I could avoid it.
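One way to avoid such extra strings, sketched here as an assumption about what this could look like rather than the code I actually used, is to inspect the raw bytes first and only build a String when the text is not pure whitespace. The helper name isWhitespaceOnly is hypothetical.

```swift
/// Returns true when every byte is XML whitespace
/// (space, tab, line feed, or carriage return).
/// No String is created, so nothing is added to an autorelease pool.
func isWhitespaceOnly(_ bytes: UnsafeBufferPointer<UInt8>) -> Bool {
    return bytes.allSatisfy { byte in
        byte == 0x20 || byte == 0x09 || byte == 0x0A || byte == 0x0D
    }
}
```

A characters callback could call this on the incoming buffer and return early for whitespace-only text, instead of creating a String and trimming it.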

Additionally, although this is not related to your problem, the start and end pointers of each attribute value give you the length of the string, and you should use that.

For example:

let valStart = UnsafeMutableRawPointer(mutating: attributes!
    .advanced(by: 3 + Int(i * 5)).pointee)
let valEnd = UnsafeMutableRawPointer(mutating: attributes!
    .advanced(by: 4 + Int(i * 5)).pointee)
let valData = Data(bytesNoCopy: valStart!, count: valEnd! - valStart!, 
    deallocator: .none)
let attrValue = String(data: valData, encoding: String.Encoding.utf8)
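The same idea can be packaged as a small helper. This is a sketch, not part of libxml2: the name attributeValue is hypothetical, and it assumes xmlChar is imported into Swift as UInt8. Unlike the Data(bytesNoCopy:) version above, it copies exactly the bytes between the two pointers and replaces invalid UTF-8 with U+FFFD instead of returning nil.

```swift
/// Builds a String from the bytes in the half-open range [start, end).
/// The pointers correspond to the value and end entries that the SAX2
/// startElementNs callback supplies for each attribute.
func attributeValue(start: UnsafePointer<UInt8>,
                    end: UnsafePointer<UInt8>) -> String {
    // Pointer subtraction yields the number of bytes in the value.
    let bytes = UnsafeBufferPointer(start: start, count: end - start)
    return String(decoding: bytes, as: UTF8.self)
}

// Usage with an in-memory buffer that mimics part of a start tag.
let sample = Array("id=\"1111111111\" author=\"Author A\"".utf8)
let idValue = sample.withUnsafeBufferPointer { buffer -> String in
    let base = buffer.baseAddress!
    // The id value occupies bytes 4 through 13 of the sample.
    return attributeValue(start: base + 4, end: base + 14)
}
```

Because the length comes from the two pointers, there is no need to scan the rest of the input buffer for a closing quote.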
