CoreData and RestKit performance while importing very large datasets


Question

I am using RestKit for fetching JSON data on various endpoints (on iOS platform).

There are several questions on SO which point to the same direction like that one:

Importing large data sets on iPhone using CoreData

But my question is still a different one, because I know, if the JSON file gets too large, I have to cut it into chunks. I'll do that!

How exactly is the importing done with CoreData in RestKit?
It seems there is a parent/child context setup, which is very inefficient when importing large datasets in the shortest possible amount of time (maybe all at once at launch - no batch/lazy importing!!!).

Have a look at Florian Kugler's post on performing import operations with Core Data (stacks).

My question is: Can I set up a different context, apart from the parent/child context setup RestKit already provides, run an RKManagedObjectRequestOperation that imports completely asynchronously on that other context, and then merge the context into the mainContext for fetching...
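
For illustration, a minimal sketch of what such a setup could look like, independent of RestKit's own stack: a private-queue context writing straight to the persistent store coordinator, whose saves get merged into the main context. All names besides the Core Data API are placeholders.

```objc
#import <CoreData/CoreData.h>

// Sketch: import on a private-queue context tied directly to the
// persistent store coordinator, then merge its saves into the main
// (UI-facing) context via the did-save notification.
static void ImportOnSeparateContext(NSPersistentStoreCoordinator *psc,
                                    NSManagedObjectContext *mainContext,
                                    NSArray *jsonArray)
{
    NSManagedObjectContext *importContext =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    importContext.persistentStoreCoordinator = psc;  // bypass the main context

    [[NSNotificationCenter defaultCenter]
        addObserverForName:NSManagedObjectContextDidSaveNotification
                    object:importContext
                     queue:nil
                usingBlock:^(NSNotification *note) {
                    [mainContext performBlock:^{
                        [mainContext mergeChangesFromContextDidSaveNotification:note];
                    }];
                }];

    [importContext performBlock:^{
        for (NSDictionary *representation in jsonArray) {
            // ... create and populate NSManagedObject instances here ...
        }
        NSError *error = nil;
        if (![importContext save:&error]) {
            NSLog(@"Import save failed: %@", error);
        }
    }];
}
```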

I really want to stick with CoreData instead of switching to plain SQLite, and get the best possible performance out of the combination of CoreData and RestKit.

I am looking forward to your professional answers. Maybe Blake could answer this question directly, too.

Answer

Well, first off, if you want maximum performance, and if you really need it, don't use RestKit, don't use AFNetworking and don't use NSJSONSerialization. They all suffer from design choices which do not play well with large data sets when your goal is a moderately low memory footprint and high performance.

You should transfer one very large JSON (likely a JSON Array whose elements are JSON Objects) as the body of a single connection to get superior performance. Alternatively, you can use a custom transport format which sends multiple JSONs within one connection (say, a series of JSON Objects separated by whitespace).
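
For illustration, one simple form of such a transport format is newline-delimited JSON - one complete document per line. A minimal sketch, assuming the whole payload is already in memory (a streaming variant would scan each incoming chunk for newlines):

```objc
#import <Foundation/Foundation.h>

// Sketch: parse a payload containing one JSON document per line.
// Each line is a complete JSON document, so NSJSONSerialization can
// handle the individual pieces even though it cannot parse the
// concatenated stream as a whole.
static NSArray *ParseNewlineDelimitedJSON(NSData *payload)
{
    NSString *text = [[NSString alloc] initWithData:payload
                                           encoding:NSUTF8StringEncoding];
    NSMutableArray *documents = [NSMutableArray array];
    for (NSString *line in [text componentsSeparatedByString:@"\n"]) {
        if (line.length == 0) continue;  // skip blank separator lines
        NSData *lineData = [line dataUsingEncoding:NSUTF8StringEncoding];
        NSError *error = nil;
        id document = [NSJSONSerialization JSONObjectWithData:lineData
                                                      options:0
                                                        error:&error];
        if (document) [documents addObject:document];
    }
    return documents;
}
```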

Having a large number of connections is definitely slow.

When you strive to achieve the fastest performance, you should simultaneously download, parse the JSON, create the representation and save it to the persistent store.

Note:

When doing all of this in parallel, you are especially vulnerable to connection errors, and keeping a consistent and logically correct data set may become a challenge. Thus, if your connection suffers from bad quality and frequent interruptions, you may first download and save the JSON to a temporary file (also supporting HTTP Range headers so a download can be suspended and resumed). Sure, your performance decreases - but under these conditions you can't make it fast anyway.
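
A sketch of the resume part, using a standard HTTP Range header on an NSMutableURLRequest. The temporary-file path and URL are placeholders, and the server must support range requests:

```objc
#import <Foundation/Foundation.h>

// Sketch: build a request that resumes an interrupted download at the
// size of the partially written temporary file.
static NSMutableURLRequest *MakeResumableRequest(NSURL *url, NSString *tempPath)
{
    NSMutableURLRequest *request = [NSMutableURLRequest requestWithURL:url];
    NSDictionary *attributes =
        [[NSFileManager defaultManager] attributesOfItemAtPath:tempPath error:NULL];
    unsigned long long resumeOffset = [attributes fileSize];  // 0 if no file yet
    if (resumeOffset > 0) {
        // Ask the server for the remaining bytes only.
        NSString *range = [NSString stringWithFormat:@"bytes=%llu-", resumeOffset];
        [request setValue:range forHTTPHeaderField:@"Range"];
    }
    return request;
}
```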

So again, when your goal is maximum performance, you should utilize all the CPU's capabilities, that is, run as much in parallel as makes sense - which is especially true when the connection is fast.

The JSON parser should also be able to parse "chunks" - that is, partial JSON - contained in an NSData object, since this is what we get from connection:didReceiveData:.
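
What this could look like at the connection delegate level; ChunkJSONParsing is a hypothetical stand-in for the API of a chunk-capable parser, not something NSJSONSerialization offers:

```objc
#import <Foundation/Foundation.h>

// Hypothetical interface of a chunk-capable parser (a stand-in for a
// real parser such as jsonlite or JPJson).
@protocol ChunkJSONParsing <NSObject>
- (void)parseData:(NSData *)chunk;   // consume a partial-JSON chunk
- (void)finish;                      // signal end of input
@end

@interface JSONDownloadDelegate : NSObject <NSURLConnectionDataDelegate>
@property (nonatomic, strong) id<ChunkJSONParsing> parser;
@end

@implementation JSONDownloadDelegate

// Each chunk goes straight into the parser, which keeps its own
// partial-parse state across calls.
- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data
{
    [self.parser parseData:data];
}

- (void)connectionDidFinishLoading:(NSURLConnection *)connection
{
    [self.parser finish];
}

@end
```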

When you receive the JSON data, you need to "map" it into a suitable representation. Usually, the known JSON parsers create a "Foundation representation". However, the faster approach is to create the eventually desired kind of objects directly from the JSON. This requires a "SAX-style API" - basically a simplified version of a parser which sends "parse events" to a delegate or client - for example "got JSON-Array begin" or "got JSON Boolean False" - plus custom code that receives these events and constructs the desired objects on the fly.
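
To make the idea concrete, here is a sketch of what such a SAX-style delegate protocol could look like. The protocol is made up for illustration; real parsers define their own event sets:

```objc
#import <Foundation/Foundation.h>

// Hypothetical SAX-style event protocol: the parser pushes events to a
// delegate, which builds the final objects on the fly instead of an
// intermediate Foundation representation.
@protocol JSONSAXDelegate <NSObject>
- (void)parserFoundArrayBegin;
- (void)parserFoundArrayEnd;
- (void)parserFoundObjectBegin;
- (void)parserFoundObjectEnd;
- (void)parserFoundKey:(NSString *)key;
- (void)parserFoundString:(NSString *)value;
- (void)parserFoundNumber:(NSNumber *)value;
- (void)parserFoundBoolean:(BOOL)value;
- (void)parserFoundNull;
@end
```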

All this requires a JSON parser with features you won't find in NSJSONSerialization: a SAX-style API, "chunk parsing", or parsing input which is a series of JSON documents.

In order to maximize the utilization of CPU, disk and network, you divide your "tasks" into CPU-bound, I/O-bound and network-bound operations, and create and run as many of them in parallel as is sane for the system. These tasks all run asynchronously: each takes an input, processes it, and produces an output which becomes the input of the next asynchronous task. A task notifies the next one when it is finished, for example via a completion handler (block), and passes its output via parameters.
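
A sketch of that pipeline shape with GCD: two serial queues, one per stage, chained via completion blocks. All names are placeholders.

```objc
#import <Foundation/Foundation.h>

typedef void (^Completion)(id result);

// One serial queue per pipeline stage.
static dispatch_queue_t ParseQueue(void) {   // CPU-bound stage
    static dispatch_queue_t queue;
    static dispatch_once_t once;
    dispatch_once(&once, ^{
        queue = dispatch_queue_create("import.parse", DISPATCH_QUEUE_SERIAL);
    });
    return queue;
}

static dispatch_queue_t StoreQueue(void) {   // I/O-bound stage
    static dispatch_queue_t queue;
    static dispatch_once_t once;
    dispatch_once(&once, ^{
        queue = dispatch_queue_create("import.store", DISPATCH_QUEUE_SERIAL);
    });
    return queue;
}

static void ParseChunk(NSData *chunk, Completion completion) {
    dispatch_async(ParseQueue(), ^{
        id representation = nil;  // parse `chunk` here, e.g. with a SAX-style parser
        completion(representation);
    });
}

static void StoreRepresentation(id representation, Completion completion) {
    dispatch_async(StoreQueue(), ^{
        // create managed objects from `representation` and save them
        completion(nil);
    });
}

// Wiring: the output of one stage becomes the input of the next.
static void ProcessChunk(NSData *chunk) {
    ParseChunk(chunk, ^(id representation) {
        StoreRepresentation(representation, ^(id ignored) { /* finished */ });
    });
}
```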

Processing incoming "chunks" of JSON data - that is, parsing and creating the representation - is a CPU-bound operation. It is usually quite fast, however, and I don't think it is worth the effort to dispatch these CPU-bound tasks onto all available CPUs by means of a concurrent queue.

Processing incoming "chunks" of JSON data can be implemented in basically two ways, each with pros and cons:

The first approach: when you get a "chunk" in connection:didReceiveData:, you can asynchronously schedule it onto a different queue for processing (that is, parsing and creating the representation), running on a different thread than the delegate.

Pros: the delegate returns immediately, thereby NOT blocking the delegate thread, which in turn results in the fastest reading of incoming network data and moderately small network buffers. The connection finishes in the shortest possible duration.

Cons: if processing is slow compared to receiving the data, you may queue up a large number of NSData objects in blocks waiting to be executed on a serial dispatch queue. That keeps the memory allocated for each NSData object - and system RAM may eventually become exhausted; you may get memory warnings or crashes unless you take appropriate actions.
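
One example of such an "appropriate action" is a counting semaphore that bounds the number of chunks in flight: once the limit is reached, the delegate thread blocks briefly, which indirectly slows the network reads instead of exhausting RAM. A sketch, with made-up names:

```objc
#import <Foundation/Foundation.h>

static const long kMaxChunksInFlight = 8;

@interface ThrottledReceiver : NSObject <NSURLConnectionDataDelegate>
@property (nonatomic, strong) dispatch_queue_t processingQueue;
@property (nonatomic, strong) dispatch_semaphore_t throttle;
@end

@implementation ThrottledReceiver

- (instancetype)init
{
    if ((self = [super init])) {
        _processingQueue = dispatch_queue_create("import.processing", DISPATCH_QUEUE_SERIAL);
        _throttle = dispatch_semaphore_create(kMaxChunksInFlight);
    }
    return self;
}

- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data
{
    // Blocks the delegate thread once kMaxChunksInFlight chunks are queued.
    dispatch_semaphore_wait(self.throttle, DISPATCH_TIME_FOREVER);
    dispatch_async(self.processingQueue, ^{
        // ... parse `data` and create the representation ...
        dispatch_semaphore_signal(self.throttle);  // free one slot
    });
}

@end
```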

The second approach: when a chunk of the JSON is received, the parser is invoked synchronously with respect to the delegate's thread.

Pros: this avoids the memory issue when data processing is slow compared to receiving it. However, it may eventually stall the reading of data from the network (once the internal receive buffer is full).

Cons: if the processing is slow and the internal network buffers become full, this increases the time the connection stays active and thus the probability that the connection will be dropped.

Both approaches benefit from a fast parser/representation-generator, and require a parser which can process "chunks" of JSON delivered as NSData objects and asynchronously notifies a client when it is finished with the representation. Optionally, it should also have a "SAX-style" API. There are two third-party JSON parsers I know of which fulfill these requirements:

jsonlite and JPJson

Both are very fast (faster than JSONKit and NSJSONSerialization), support SAX style parsing and can process JSON in chunks as NSData objects. JPJson additionally can process a file containing multiple JSONs.

(Disclosure: I'm the author of JPJson)

When a representation has been created, the next step is to create and initialize the managed objects (unless the parser directly generates managed objects) and save them to the persistent store. This is an I/O- and CPU-bound operation - though likely more CPU-bound when SSD storage is used. I would schedule this process onto a separate queue and examine how it works in conjunction with the other CPU-bound operations. The faster the network, the more CPU-bound the whole process becomes.
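
A sketch of the batched-save idea on a private-queue context, saving and resetting every N objects so memory stays bounded. Entity name, attribute and batch size are placeholders.

```objc
#import <CoreData/CoreData.h>

// Sketch: insert managed objects in batches, saving and resetting the
// context periodically to keep the row cache - and thus memory - bounded.
static void StoreBatch(NSManagedObjectContext *importContext, NSArray *representations)
{
    static const NSUInteger kBatchSize = 500;
    [importContext performBlock:^{
        NSUInteger count = 0;
        for (NSDictionary *representation in representations) {
            NSManagedObject *object =
                [NSEntityDescription insertNewObjectForEntityForName:@"Item"
                                              inManagedObjectContext:importContext];
            [object setValue:representation[@"name"] forKey:@"name"];
            if (++count % kBatchSize == 0) {
                NSError *error = nil;
                [importContext save:&error];
                [importContext reset];  // drop fulfilled objects from memory
            }
        }
        NSError *error = nil;
        [importContext save:&error];    // save the final partial batch
    }];
}
```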

A scalable approach which takes bad and good connections into account, strives to maintain a low memory footprint, and maximizes performance is quite difficult to achieve, though - a challenging programming task. Have fun! ;)
