Core Data memory usage while importing large dataset


Problem description


    I have been stuck for about two weeks on a nasty Core Data problem. I have read lots of blog posts, articles and SO questions/answers, but I'm still not able to solve it.

    I ran lots of tests and was able to reduce the larger problem to a smaller one. It's going to be a long explanation, so bear with me!

    Problem - data model

    I have the following data model:

    Object A has a one-to-many relationship with object B, which has another one-to-many relationship with object C. Following the Core Data recommendations, I created inverse relationships, so each instance of B points to its parent A, and each C points to its parent B.

    A <->> B <->> C
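
For illustration, the same shape can be expressed as a programmatic model. This is only a sketch under assumptions: the question's real model is not shown, so the relationship names (`bs`, `a`, `cs`, `b`) and the helper functions `Entity`, `Relationship` and `MakeABCModel` are hypothetical, and attributes are left out.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: creates an entity backed by plain NSManagedObject.
static NSEntityDescription *Entity(NSString *name) {
    NSEntityDescription *entity = [[NSEntityDescription alloc] init];
    entity.name = name;
    entity.managedObjectClassName = @"NSManagedObject";
    return entity;
}

// Hypothetical helper: creates an optional to-one or to-many relationship.
static NSRelationshipDescription *Relationship(NSString *name,
                                               NSEntityDescription *destination,
                                               BOOL toMany) {
    NSRelationshipDescription *relationship = [[NSRelationshipDescription alloc] init];
    relationship.name = name;
    relationship.destinationEntity = destination;
    relationship.optional = YES;
    relationship.minCount = 0;
    relationship.maxCount = toMany ? 0 : 1; // 0 means no upper bound (to-many)
    return relationship;
}

// Builds A <->> B <->> C with every relationship paired with its inverse.
NSManagedObjectModel *MakeABCModel(void) {
    NSEntityDescription *a = Entity(@"A");
    NSEntityDescription *b = Entity(@"B");
    NSEntityDescription *c = Entity(@"C");

    NSRelationshipDescription *aToBs = Relationship(@"bs", b, YES);
    NSRelationshipDescription *bToA  = Relationship(@"a", a, NO);
    NSRelationshipDescription *bToCs = Relationship(@"cs", c, YES);
    NSRelationshipDescription *cToB  = Relationship(@"b", b, NO);

    // Each side names the other as its inverse, as Core Data recommends.
    aToBs.inverseRelationship = bToA;
    bToA.inverseRelationship  = aToBs;
    bToCs.inverseRelationship = cToB;
    cToB.inverseRelationship  = bToCs;

    a.properties = @[aToBs];
    b.properties = @[bToA, bToCs];
    c.properties = @[cToB];

    NSManagedObjectModel *model = [[NSManagedObjectModel alloc] init];
    model.entities = @[a, b, c];
    return model;
}
```

With inverses in place, adding a B to an A's `bs` set also sets that B's `a` relationship, which is what keeps the object graph consistent.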
    

    Problem - MOC setup

    To keep the UI smooth as butter, I created a three-level managedObjectContext structure.

    1. Parent MOC - runs on its own private queue using NSPrivateQueueConcurrencyType and is tied directly to the persistentStoreCoordinator
    2. MainQueue MOC - runs on the main thread using NSMainQueueConcurrencyType and has MOC 1 as its parent
    3. For each parsing operation I create a third MOC, which also has its own private queue and has the MainQueue MOC as its parent

    My main data controller is added as an observer of the NSManagedObjectContextDidSaveNotification of MOC 2, so every time MOC 2 saves, a performBlock: is triggered on MOC 1, which performs a save operation (asynchronously, because of performBlock:).
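
The setup described above might look roughly like the following. It is a minimal sketch rather than the asker's actual code: the function names `MakeMainContext`, `MakeImportContext` and `ObserveMainSaves` are hypothetical, and the coordinator is assumed to be configured elsewhere.

```objc
#import <CoreData/CoreData.h>

// Returns MOC 2 (main queue). Its parentContext is MOC 1, a private-queue
// context attached directly to the persistent store coordinator.
NSManagedObjectContext *MakeMainContext(NSPersistentStoreCoordinator *coordinator) {
    NSManagedObjectContext *writerMOC = [[NSManagedObjectContext alloc]
        initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    writerMOC.persistentStoreCoordinator = coordinator;

    NSManagedObjectContext *mainMOC = [[NSManagedObjectContext alloc]
        initWithConcurrencyType:NSMainQueueConcurrencyType];
    mainMOC.parentContext = writerMOC;
    return mainMOC;
}

// Returns MOC 3: a private-queue context for one parsing operation,
// with the main-queue context as its parent.
NSManagedObjectContext *MakeImportContext(NSManagedObjectContext *mainMOC) {
    NSManagedObjectContext *importMOC = [[NSManagedObjectContext alloc]
        initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    importMOC.parentContext = mainMOC;
    return importMOC;
}

// Whenever MOC 2 saves, schedule an asynchronous save of MOC 1 so the
// changes reach the store. Keep the returned token to remove the observer.
id ObserveMainSaves(NSManagedObjectContext *mainMOC) {
    NSManagedObjectContext *writerMOC = mainMOC.parentContext;
    return [[NSNotificationCenter defaultCenter]
        addObserverForName:NSManagedObjectContextDidSaveNotification
                    object:mainMOC
                     queue:nil
                usingBlock:^(NSNotification *note) {
        [writerMOC performBlock:^{
            NSError *error = nil;
            if (writerMOC.hasChanges && ![writerMOC save:&error]) {
                NSLog(@"Error while saving writer context: %@", error);
            }
        }];
    }];
}
```

Note that in this arrangement a save on MOC 3 only pushes changes up into MOC 2; nothing is written to disk until MOC 1 saves.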

    Problem - Parsing

    To parse a large JSON file into my Core Data structure I wrote a recursive parser. This parser starts by creating a new MOC (3). It then takes the data for object A and parses its properties. Then the parser reads the JSON relations for B and creates the corresponding objects, which are filled with data. These new objects are added to A by calling addBObject: on A. Because the parser is recursive, parsing B means parsing C, and here too new objects are created and attached to B. All of this happens in a performBlock: on MOC 3.

    • Parse (creates 'A'-objects and starts parsing B)
      • Parsing A (creates 'B'-objects, attaches them to A and starts parsing C)
        • Parsing B (creates 'C'-objects, attaches them to B)
          • Parsing C (just stores data in a C-object)
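
The recursion outlined above can be sketched as follows. This is not the asker's parser: the JSON shape (a `children` array at every level), the relationship keys `bs` and `cs`, and the function names `ParseNode` and `ImportRoots` are all assumptions made for illustration, and attribute parsing is left out.

```objc
#import <CoreData/CoreData.h>

// Creates one managed object for `json` and recurses into its children.
// depth 0 creates an A, depth 1 a B, depth 2 a C.
// Must be called on the queue of `moc` (inside performBlock:).
static NSManagedObject *ParseNode(NSDictionary *json, NSUInteger depth,
                                  NSManagedObjectContext *moc) {
    NSArray<NSString *> *entityNames = @[@"A", @"B", @"C"];
    NSArray<NSString *> *childKeys = @[@"bs", @"cs"]; // assumed to-many keys

    NSManagedObject *object =
        [NSEntityDescription insertNewObjectForEntityForName:entityNames[depth]
                                      inManagedObjectContext:moc];
    // Attribute parsing would go here (copy values from `json` to `object`).

    if (depth < childKeys.count) {
        // The mutable proxy set adds children to the relationship; Core Data
        // maintains the inverse if the model defines one.
        NSMutableSet *children = [object mutableSetValueForKey:childKeys[depth]];
        for (NSDictionary *childJSON in json[@"children"]) {
            [children addObject:ParseNode(childJSON, depth + 1, moc)];
        }
    }
    return object;
}

// Parses all top-level A dictionaries on the import context, then saves it.
void ImportRoots(NSArray<NSDictionary *> *roots, NSManagedObjectContext *importMOC) {
    [importMOC performBlock:^{
        for (NSDictionary *rootJSON in roots) {
            ParseNode(rootJSON, 0, importMOC);
        }
        NSError *error = nil;
        if (![importMOC save:&error]) {
            NSLog(@"Error while saving parsed data: %@", error);
        }
    }];
}
```

Because every object is created inside one block on the import context, the whole A/B/C tree stays registered with that context until it is saved and then refreshed or reset.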

    After each parsing operation I save MOC 3 and dispatch a save operation of the main MOC (2) on the main thread. Because of the NSManagedObjectContextDidSaveNotification, MOC 1 will then save automatically and asynchronously.

            if (parsed){
                NSError *error = nil;
                if (![managedObjectContext save:&error])
                    NSLog(@"Error while saving parsed data: %@", error);
            }else{
                // something went wrong, discard changes
                [managedObjectContext reset];
            }
    
            dispatch_async(dispatch_get_main_queue(), ^{                
                // save mainQueueManagedObjectContext
                [[HWOverallDataController sharedOverallDataController] saveMainThreadManagedObjectContext];
            });
    

    To reduce my memory footprint, and because I do not need the parsed data for now, I call:

    [a.managedObjectContext refreshObject:a mergeChanges:NO];
    

    for each A I just parsed.

    Because I need to parse about 10 A's, each with about 10 B's, each of which has about 10 C's, a lot of managed objects are generated (roughly 10 + 100 + 1,000 = 1,110 in total).

    Problem - Instruments

    Everything works fine. The only thing is: when I turn on the Allocations instrument I see unreleased A's, B's and C's. I don't get any useful information from their retain counts or anything else. And because my actual problem involves a more complex data model, the living objects become a serious memory problem. Can someone figure out what I'm doing wrong? Calling refreshObject:mergeChanges: on the other managedObjectContexts with the correct managedObject does not work either. Only a hard reset seems to work, but then I lose my pointers to living objects used by the UI.

    Other solutions I tried

    • I tried creating unidirectional relations instead of bidirectional ones. This creates a lot of other problems, which cause Core Data inconsistencies and weird behavior (such as dangling objects, and Core Data generating 1-n relations instead of n-n relations because the inverse relation is not known).

    • I tried refreshing each changed or inserted object when I receive an NSManagedObjectContextDidSaveNotification on any object.

    Both of these 'solutions' (which don't work, by the way) also seem a bit hacky. This should not be the way to go. Surely there is a way to get this working without raising the memory footprint while keeping the UI smooth?

    - CodeDemo

    http://cl.ly/133p073h2I0j

    - Further Investigation

    After refreshing every object ever used (which is tedious work) in the mainContext (after a mainSave), each object's size is reduced to 48 bytes. This indicates that the objects are all faulted, but that there is still a pointer left in memory. With about 40,000 objects, all of them faulted, that is still about 1.92 MB (40,000 × 48 bytes) in memory which is never released until the persistentManagedObjectContext is reset. And this is something we don't want to do, because we would lose every reference to any managedObject.

    Solution

    Robin,

    I have a similar problem, which I solved differently than you did. In your case, you have a third and, IMO, redundant MOC: the parent MOC. In my case, I let the two MOCs communicate, in an old-school fashion, through the persistent store coordinator via the DidSave notifications. The new block-oriented APIs make this much simpler and more robust. This lets me reset the child MOCs. While you gain a performance advantage from your third MOC, it isn't that great an advantage over the SQLite row cache, which I exploit. Your path consumes more memory. Finally, by tracking the DidSave notifications, I can trim items as they are created.
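
A minimal sketch of that two-context arrangement, assuming both contexts are attached directly to the same persistent store coordinator. The function name `MergeBackgroundSaves` is hypothetical and the answer's actual code is not shown; only standard Core Data calls are used.

```objc
#import <CoreData/CoreData.h>

// Forwards every save of the background (import) context to the main context.
// The two contexts are siblings: neither is the parent of the other.
// Keep the returned token and pass it to -removeObserver: when done.
id MergeBackgroundSaves(NSManagedObjectContext *backgroundMOC,
                        NSManagedObjectContext *mainMOC) {
    return [[NSNotificationCenter defaultCenter]
        addObserverForName:NSManagedObjectContextDidSaveNotification
                    object:backgroundMOC
                     queue:nil
                usingBlock:^(NSNotification *note) {
        // The notification is delivered on the saving context's queue,
        // so hop onto the main context's queue before merging.
        [mainMOC performBlock:^{
            [mainMOC mergeChangesFromContextDidSaveNotification:note];
        }];
    }];
}
```

Since the main context no longer depends on the background context as a parent or child, the background context can be reset after each import without invalidating objects the UI holds.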

    BTW, you are also probably suffering from a massive increase in the size of your MALLOC_TINY and MALLOC_SMALL VM regions. My trailing trimming algorithm lets the allocators reuse space sooner and hence slows the growth of these problematic regions. In my experience these regions, because of their large resident memory footprint, are a major cause of my app, Retweever, being killed. I suspect your app suffers the same fate.

    When the memory warnings come, I call the snippet below:

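    // Discard everything the background (import) context still holds.
    [self.backgroundMOC performBlock: ^{ [self.backgroundMOC reset]; }];
    
    // Presumably a convenience wrapper; Core Data itself only provides -save: (with an NSError ** argument).
    [self.moc save];
    
    // registeredObjects is an NSSet, so trimObjects is assumed to be available on it as well (or use allObjects).
    [self.moc.registeredObjects trimObjects];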
    

    -[NSArray(DDGArray) trimObjects] just goes through an array and refreshes each object, thus trimming it.
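
The DDGArray category itself is not shown in the answer. A hypothetical reconstruction of such a helper, under the assumption that "refreshing" means `refreshObject:mergeChanges:NO`, might look like this (the category name `TrimSketch` is made up, and skipping objects with unsaved changes is an added precaution, not something the answer states):

```objc
#import <CoreData/CoreData.h>

@interface NSArray (TrimSketch)
// Turns every saved managed object in the array back into a fault.
// Call this on the queue of the context that owns the objects.
- (void)trimObjects;
@end

@implementation NSArray (TrimSketch)

- (void)trimObjects {
    for (id element in self) {
        if (![element isKindOfClass:[NSManagedObject class]]) {
            continue; // ignore anything that is not a managed object
        }
        NSManagedObject *object = element;
        if (object.hasChanges) {
            continue; // refreshing would throw away unsaved changes
        }
        // mergeChanges:NO releases the object's property values and
        // relationship references, leaving only a fault behind.
        [object.managedObjectContext refreshObject:object mergeChanges:NO];
    }
}

@end
```

Since `registeredObjects` returns an `NSSet`, a call site for this sketch would be `[moc.registeredObjects.allObjects trimObjects];`.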

    In summary, Core Data appears to implement a copy-on-write algorithm for items that appear in many MOCs. Hence, you have things retained in unexpected ways. I focus on breaking these connections after import to minimize my memory footprint. My system, due to the SQLite row cache, appears to perform acceptably well.

    Andrew
