How to efficiently insert and fetch UUIDs in Core Data

Question

I am looking for an efficient way to store and search UUIDs in Core Data. These UUIDs are generated by many iOS devices in a distributed system. Each device may store about 20-50k UUIDs.

It seems obvious that storing a UUID as a String in Core Data will hurt the efficiency of indexing on it. But after some research, I found that storing a UUID as Binary Data in Core Data (and indexing it) may be less efficient than storing it as a String.

SQLite supports no BINARY-like or VARBINARY-like data type, so I guess that any Binary Data attribute in Core Data is stored as a BLOB in SQLite. Since BLOB could be the slowest data type to index, this would hurt performance.

So can anyone help answer this: is there a more efficient way to store UUIDs in Core Data?

Recommended answer

Store them as an ASCII string, and index the field.

Edit

Egads, I happened to be doing some poking about, and came across this. What a shameful answer. I must have been in a bit of a mood that day. If I could, I'd just delete it and move on. However, that's not possible, so I'll provide a snippet of an update.

First, the only way to know what is "efficient" is to measure, considering program time and space as well as source code complexity and programmer effort.

Fortunately, this is pretty easy.

I wrote a very simple OS X application. The model consists of a single attribute: identifier.

None of this matters if you do not mark your attribute as indexed. Indexing takes a whole lot more time when creating the store, but it makes queries much faster.

Also, note that creating a predicate for a binary attribute is exactly the same as creating one for a string:

fetchRequest.predicate =
    [NSPredicate predicateWithFormat:@"identifier == %@", identifier];

The application is very simple. First, it creates N objects and assigns a UUID to each one's identifier attribute. It saves the MOC (managed object context) every 500 objects. We then store all identifiers in an array and shuffle them randomly. The whole Core Data stack is then torn down completely to remove it all from memory.

Next, we build the stack again, then iterate over the identifiers and do a simple fetch for each. A fetch request is constructed with a simple predicate to fetch that one object. All of this is done inside an autoreleasepool to keep each fetch as pristine as possible (I acknowledge that there will be some interaction with the Core Data caches). That's not so important, as we are just comparing the different techniques.
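The procedure described above can be sketched roughly as follows. This is not the author's benchmark, only a minimal reconstruction under assumptions: the entity name "Item", the index name "byIdentifier", and the helpers MakeModel, MakeContext and RunBenchmark are hypothetical, the model is built in code rather than in the model editor, and the attribute is indexed with NSFetchIndexDescription (macOS 10.13 / iOS 11 or later). It covers only the binary case.

```objc
#import <Foundation/Foundation.h>
#import <CoreData/CoreData.h>

// Hypothetical model: one entity "Item" with one indexed Binary Data attribute.
static NSManagedObjectModel *MakeModel(void) {
    NSAttributeDescription *attr = [[NSAttributeDescription alloc] init];
    attr.name = @"identifier";
    attr.attributeType = NSBinaryDataAttributeType;

    NSEntityDescription *entity = [[NSEntityDescription alloc] init];
    entity.name = @"Item";
    entity.managedObjectClassName = @"NSManagedObject";
    entity.properties = @[attr];

    // Assumption: the index is declared in code instead of the model editor.
    NSFetchIndexElementDescription *element =
        [[NSFetchIndexElementDescription alloc] initWithProperty:attr
                                                   collationType:NSFetchIndexElementTypeBinary];
    entity.indexes = @[[[NSFetchIndexDescription alloc] initWithName:@"byIdentifier"
                                                            elements:@[element]]];

    NSManagedObjectModel *model = [[NSManagedObjectModel alloc] init];
    model.entities = @[entity];
    return model;
}

// Builds a fresh Core Data stack on the SQLite store at storeURL.
static NSManagedObjectContext *MakeContext(NSURL *storeURL) {
    NSPersistentStoreCoordinator *psc =
        [[NSPersistentStoreCoordinator alloc] initWithManagedObjectModel:MakeModel()];
    NSError *error = nil;
    if (![psc addPersistentStoreWithType:NSSQLiteStoreType configuration:nil
                                     URL:storeURL options:nil error:&error]) {
        NSLog(@"Could not open store: %@", error);
        return nil;
    }
    // Main-queue context, used directly from the main thread in this sketch.
    NSManagedObjectContext *moc =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
    moc.persistentStoreCoordinator = psc;
    return moc;
}

// Creates `count` objects, then fetches each one by identifier.
// Returns the total number of objects found by the fetches.
static NSUInteger RunBenchmark(NSUInteger count) {
    NSString *fileName = [[NSUUID UUID].UUIDString stringByAppendingPathExtension:@"sqlite"];
    NSURL *storeURL = [[NSURL fileURLWithPath:NSTemporaryDirectory()]
        URLByAppendingPathComponent:fileName];
    NSMutableArray<NSData *> *identifiers = [NSMutableArray arrayWithCapacity:count];

    NSDate *start = [NSDate date];
    NSManagedObjectContext *moc = MakeContext(storeURL);
    for (NSUInteger i = 0; i < count; i++) {
        uuid_t bytes;
        [[NSUUID UUID] getUUIDBytes:bytes];
        NSData *identifier = [NSData dataWithBytes:bytes length:sizeof(bytes)];
        NSManagedObject *object =
            [NSEntityDescription insertNewObjectForEntityForName:@"Item"
                                          inManagedObjectContext:moc];
        [object setValue:identifier forKey:@"identifier"];
        [identifiers addObject:identifier];
        if ((i + 1) % 500 == 0) { [moc save:NULL]; }  // save every 500 objects
    }
    [moc save:NULL];
    NSLog(@"Create: %.4f s", -[start timeIntervalSinceNow]);
    moc = nil;  // drop our reference to the stack before the query pass

    // Fisher-Yates shuffle so the fetch order differs from the insert order.
    for (NSUInteger i = identifiers.count; i > 1; i--) {
        [identifiers exchangeObjectAtIndex:i - 1
                         withObjectAtIndex:arc4random_uniform((uint32_t)i)];
    }

    start = [NSDate date];
    moc = MakeContext(storeURL);
    NSUInteger found = 0;
    for (NSData *identifier in identifiers) {
        @autoreleasepool {
            NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Item"];
            request.predicate =
                [NSPredicate predicateWithFormat:@"identifier == %@", identifier];
            found += [moc executeFetchRequest:request error:NULL].count;
        }
    }
    NSLog(@"Query: %.4f s", -[start timeIntervalSinceNow]);
    return found;
}

int main(void) {
    @autoreleasepool {
        NSUInteger found = RunBenchmark(2000);  // small N; the post used 100,000 and 1,000,000
        NSLog(@"Fetched %lu objects", (unsigned long)found);
    }
    return 0;
}
```

To compare the string variants, the attribute type would change to NSStringAttributeType and the stored value to the UUID string or base64 string; the predicate line stays the same. The sketch leaves its store file in the temporary directory.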

Binary identifier is the 16 bytes of the UUID.

UUID String is a 36-byte string, the result of calling [uuid UUIDString], and it looks like this (B85E91F3-4A0A-4ABB-A049-83B2A8E6085E).

Base64 String is a 24-byte string, the result of base-64 encoding the 16-byte UUID binary data, and it looks like this (uF6R80oKSrugSYOyqOYIXg==) for the same UUID.
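As a quick illustration of the three representations, the following sketch produces each form for the example UUID above and logs its size. The helper names UUIDData and UUIDBase64 are hypothetical and not part of Foundation.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: the raw 16 bytes of a UUID as NSData.
static NSData *UUIDData(NSUUID *uuid) {
    uuid_t bytes;
    [uuid getUUIDBytes:bytes];
    return [NSData dataWithBytes:bytes length:sizeof(bytes)];
}

// Hypothetical helper: base64 encoding of the raw 16 bytes.
static NSString *UUIDBase64(NSUUID *uuid) {
    return [UUIDData(uuid) base64EncodedStringWithOptions:0];
}

int main(void) {
    @autoreleasepool {
        NSUUID *uuid =
            [[NSUUID alloc] initWithUUIDString:@"B85E91F3-4A0A-4ABB-A049-83B2A8E6085E"];

        NSData *binary = UUIDData(uuid);
        NSString *uuidString = uuid.UUIDString;
        NSString *base64 = UUIDBase64(uuid);

        NSLog(@"Binary:        %lu bytes", (unsigned long)binary.length);      // 16
        NSLog(@"UUID String:   %lu characters (%@)",
              (unsigned long)uuidString.length, uuidString);                   // 36
        NSLog(@"Base64 String: %lu characters (%@)",
              (unsigned long)base64.length, base64);                           // 24, uF6R80oKSrugSYOyqOYIXg==
    }
    return 0;
}
```

Both string forms are plain ASCII, so their byte counts equal their character counts.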

Count is the number of objects for that run.

SQLite size is the size of the actual SQLite file.

WAL size is how big the WAL (write-ahead logging) file gets - just FYI...

Create is the number of seconds to create the database, including saving.

Query is the number of seconds to query each object.

Data Type     | Count (N) | SQLite Size (bytes) | WAL Size (bytes) | Create (s) | Query (s)
--------------+-----------+---------------------+------------------+------------+----------
Binary        |   100,000 |           5,758,976 |        5,055,272 |     2.6013 |    9.2669
Binary        | 1,000,000 |          58,003,456 |        4,783,352 |    59.0179 |   96.1862
UUID String   |   100,000 |          10,481,664 |        4,148,872 |     3.6233 |    9.9160
UUID String   | 1,000,000 |         104,947,712 |        5,792,752 |    68.5746 |   93.7264
Base64 String |   100,000 |           7,741,440 |        5,603,232 |     3.0207 |    9.2446
Base64 String | 1,000,000 |          77,848,576 |        4,931,672 |    63.4510 |   94.5147

The first thing to note here is that the actual database size is much larger than the raw bytes stored (1,600,000 and 16,000,000 in the binary case), which is to be expected for a database. The amount of extra storage is relative to the size of your actual objects. This one stores only the identifier, so the percentage of overhead is higher than it would be for larger objects.

Second, on the speed issues: for reference, doing the same 1,000,000-object query but using the object ID in the fetch took about 82 seconds (note the stark difference between that and calling existingObjectWithID:error:, which took a whopping 0.3065 seconds).

You should profile your own database, including judicious use of Instruments on the running code. I imagine the numbers would be somewhat different if I did multiple runs, but they are so close that it's not necessary for this analysis.

However, based on these numbers, let's look at efficiency measurements for the code execution.

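  • As expected, storing the raw UUID binary data is more efficient in terms of space.
  • Creation times are pretty close (the difference seems to come from the time to create the strings and the additional storage required).
  • Query times look almost identical, with the binary attribute appearing to be a tiny bit slower. I think this was the original concern: doing a query on a binary attribute.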

Binary wins on space by a lot, and creation time and query time can both be considered a close draw. If we consider only those, storing the binary data is the clear winner.

How about source code complexity and programmer time?

Well, if you are using a modern version of iOS and OS X, there is virtually no difference, especially with a simple category on NSUUID.

However, there is one consideration for you, and that's the ease of working with the data in the database. When you store binary data, it's hard to get a good visual on the data.

So if, for some reason, you want the data in the database stored in a form that is easier for humans to read, then storing it as a string is the better choice. In that case, you may want to consider a base64 encoding (or some other encoding, though remember the raw data is already in base-256 encoding).

FWIW, here's an example category to provide easier access to the UUID as both NSData and base64 string:

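#import <Foundation/Foundation.h>

// The category name "Encoding" is a placeholder; the original snippet
// shows only the method bodies and does not name the category.
@interface NSUUID (Encoding)
- (NSData*)data;
- (NSString*)base64String;
- (instancetype)initWithBase64String:(NSString*)string;
- (instancetype)initWithString:(NSString *)string;
@end

@implementation NSUUID (Encoding)

// The raw 16 bytes of the UUID, suitable for a Binary Data attribute.
- (NSData*)data
{
    uuid_t rawuuid;
    [self getUUIDBytes:rawuuid];
    return [NSData dataWithBytes:rawuuid length:sizeof(rawuuid)];
}

// The 24-character base64 encoding of the raw 16 bytes.
- (NSString*)base64String
{
    uuid_t rawuuid;
    [self getUUIDBytes:rawuuid];
    // No copy needed: the stack buffer outlives this temporary NSData.
    NSData *data = [NSData dataWithBytesNoCopy:rawuuid length:sizeof(rawuuid) freeWhenDone:NO];
    return [data base64EncodedStringWithOptions:0];
}

// Returns nil unless the string decodes to exactly 16 bytes.
- (instancetype)initWithBase64String:(NSString*)string
{
    NSData *data = [[NSData alloc] initWithBase64EncodedString:string options:0];
    if (data.length == sizeof(uuid_t)) {
        return [self initWithUUIDBytes:data.bytes];
    }
    return nil;
}

// Accepts either the 36-character UUID string or the base64 form.
- (instancetype)initWithString:(NSString *)string
{
    self = [self initWithUUIDString:string];
    if (self == nil) {
        // self is nil after the failed init, and messaging nil returns nil,
        // so allocate a fresh instance before trying the base64 form.
        self = [[NSUUID alloc] initWithBase64String:string];
    }
    return self;
}

@end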
