How would you minimize or compress Core Data sqlite file size?

Question

I have a 215MB CSV file that I have parsed and stored in Core Data, wrapped in my own custom objects. The problem is that my Core Data sqlite file is around 260MB. The CSV file contains about 4.5 million lines of data on my city's transit system (bus stops, times, routes, etc.).

I have tried modifying the attributes so that arrays of strings representing stop times are stored as NSData instead, but for some reason the file size still remains at around 260MB.

I can't ship an app this size. I doubt anyone would want to download a 260MB app even if it means they have the whole city's transit schedule on it.

Are there any ways to compress or minimize the storage space used (even if it means not using Core Data; I am willing to hear suggestions)?

Edit: I just want to provide an update, because I have been staring at the file size in disbelief. With some clever manipulation involving strings, indexing, and database normalization in general, I have managed to reduce the size to 6.5MB, or 2.6MB compressed. About 105,000 objects stored in Core Data now contain the full details of the city's transit system. I'm almost in tears right now D':

Recommended answer

Unless your original CSV is encoded in a really foolish way, it seems unlikely that you'll get the size below 100MB, no matter how much you compress it. That's still really large for an app. The solution is to move your data to a web service. You may want to download and cache significant parts, but if you're talking about millions of records, fetching from a server seems best. Besides, I have to believe the transit system changes from time to time, and it would be frustrating to have to update an app of many tens of MB every time a single stop was adjusted.

That said, there are some things you might consider:


  • Move booleans into bit fields. You can put 64 booleans into a 64-bit integer such as uint64_t (NSUInteger is only 64 bits wide on 64-bit platforms). And don't use a full 64-bit integer if you only need 8 bits; store the smallest thing you can.
  • Compress how you store times. There are only 1440 minutes in a day. You can store that in 2 bytes. Transit times are generally not to the second; they don't need a CGFloat.
  • Days of the week and dates can similarly be compressed.
  • Obviously you should normalize any strings. Look at the CSV for duplicated string values on many lines.
  • I would generally recommend raw sqlite rather than Core Data for this kind of problem. Core Data is more about object persistence than raw data storage. The fact that you're seeing a 20% bloat over the CSV (which is not itself highly efficient) is a bad sign for this approach.
  • If you want to get even tighter, and don't need very good search capabilities, you can create packed data blobs. I used to do this on phone switches, where memory was extremely tight. You create a bit-field struct and allocate 5 bits for one variable, 7 bits for another, and so on. With that, plus some time spent arranging fields so they line up correctly on word boundaries, you can get the data very compact.
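The bit-field, time, and day-of-week ideas above can be sketched in C (which compiles unchanged as Objective-C). This is a minimal sketch, not a definitive layout: the day-bit assignment, the 11/7/14-bit split of the packed record, and the 14-bit stop id are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Days of the week as bits in one byte (assumed order: bit 0 = Monday). */
enum {
    DAY_MON = 1u << 0, DAY_TUE = 1u << 1, DAY_WED = 1u << 2,
    DAY_THU = 1u << 3, DAY_FRI = 1u << 4, DAY_SAT = 1u << 5,
    DAY_SUN = 1u << 6
};
#define DAY_WEEKDAYS (DAY_MON | DAY_TUE | DAY_WED | DAY_THU | DAY_FRI)

/* Returns nonzero if the day mask includes the given day bit. */
static int runs_on(uint8_t days, uint8_t day) { return (days & day) != 0; }

/* A time of day as minutes since midnight: 0..1439 fits in 2 bytes. */
static uint16_t time_to_minutes(unsigned hour, unsigned minute) {
    assert(hour < 24 && minute < 60);
    return (uint16_t)(hour * 60 + minute);
}

/* Hypothetical packed stop-time record in 32 bits:
   bits 0-10  minutes since midnight (11 bits, max 2047)
   bits 11-17 day-of-week mask       (7 bits)
   bits 18-31 stop id                (14 bits, max 16383) */
static uint32_t pack_stop_time(uint16_t minutes, uint8_t days, uint16_t stop_id) {
    assert(minutes < 1440 && days < 128 && stop_id < (1u << 14));
    return (uint32_t)minutes
         | ((uint32_t)days << 11)
         | ((uint32_t)stop_id << 18);
}

/* Field extractors: shift the field down, then mask off its width. */
static uint16_t unpack_minutes(uint32_t r) { return (uint16_t)(r & 0x7FFu); }
static uint8_t  unpack_days(uint32_t r)    { return (uint8_t)((r >> 11) & 0x7Fu); }
static uint16_t unpack_stop_id(uint32_t r) { return (uint16_t)((r >> 18) & 0x3FFFu); }
```

The sketch uses explicit shifts and masks rather than a C bit-field struct: how a compiler lays out bit-fields is implementation-defined, so shifts keep a packed file readable regardless of which compiler wrote it.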

Since you care most about your initial download size, and may be willing to expand your data later for faster access, you can consider very domain-specific compression. For example, in the discussion above, I mentioned how to get a time down to 2 bytes. You could probably get down to 1 byte in many cases by storing times as delta minutes since the previous time (since most of your times will be increasing in fairly small steps if they're bus and train schedules). Abandoning the database, you could create a very tightly encoded data file that you extract into a database on first launch.
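A minimal C sketch of the delta idea, assuming times are minutes since midnight in non-decreasing order (one stop's departures, for example). Deltas of 0 to 254 minutes take one byte; the first time and any larger jump are written as a hypothetical escape byte (0xFF) followed by the absolute value in two bytes. The escape scheme is an assumption of this sketch, not something the answer prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Escape byte: the next two bytes hold an absolute minute value
   (big-endian). Deltas of 0..254 fit in a single byte. */
#define DELTA_ESCAPE 0xFFu

/* Encodes n non-decreasing minute values into out.
   The first value, and any jump of 255+ minutes, costs 3 bytes
   (escape + 2-byte absolute); every other value costs 1 byte.
   out must have room for 3 * n bytes in the worst case.
   Returns the number of bytes written. */
static size_t encode_times(const uint16_t *times, size_t n, uint8_t *out) {
    size_t pos = 0;
    uint16_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        assert(i == 0 || times[i] >= prev);
        unsigned delta = (unsigned)(times[i] - prev);
        if (i > 0 && delta < DELTA_ESCAPE) {
            out[pos++] = (uint8_t)delta;            /* small step: 1 byte */
        } else {
            out[pos++] = DELTA_ESCAPE;              /* absolute: 3 bytes */
            out[pos++] = (uint8_t)(times[i] >> 8);
            out[pos++] = (uint8_t)(times[i] & 0xFF);
        }
        prev = times[i];
    }
    return pos;
}

/* Decodes len bytes back into minute values; returns the count written. */
static size_t decode_times(const uint8_t *in, size_t len, uint16_t *times) {
    size_t pos = 0, n = 0;
    uint16_t prev = 0;
    while (pos < len) {
        if (in[pos] == DELTA_ESCAPE) {
            assert(pos + 2 < len);                  /* both value bytes present */
            prev = (uint16_t)((in[pos + 1] << 8) | in[pos + 2]);
            pos += 3;
        } else {
            prev = (uint16_t)(prev + in[pos]);      /* add the delta */
            pos += 1;
        }
        times[n++] = prev;
    }
    return n;
}
```

For departures at 6:00, 6:12, 6:24, 6:39, 7:00 and 23:00 (minutes 360, 372, 384, 399, 420 and 1380), this writes 10 bytes instead of the 12 needed at 2 bytes per time; longer runs of closely spaced times save proportionally more.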

You can also use domain-specific knowledge to encode your strings into smaller tokens. If I were encoding the NY subway system, I would notice that some strings show up a lot, like "Avenue", "Road", "Street", "East", etc. I'd probably encode those as unprintable ASCII like ^A, ^R, ^S, ^E, etc. I'd probably encode "138 Street" as two bytes (0x8A13): 0x8A is 138, and 0x13 is ^S for "Street". This of course relies on my knowledge that è (0x8A) never shows up in NY subway stop names. It's not a general solution (in Paris it might be a problem), but it can be used to highly compress data that you have special knowledge of. In a city like Washington DC, I believe the highest numbered street is 38th St, and there's a 4-value direction (the quadrant). So you can encode that in two bytes: first a "numbered street" token, and then a bit field with 2 bits for the quadrant and 6 bits for the street number. This kind of thinking can potentially shrink your data size significantly.
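A minimal C sketch of both tricks. The token bytes for "Avenue", "Road", "Street" and "East" follow the ^A, ^R, ^S, ^E choice above; the 0x14 (^T) "numbered street" marker and the quadrant numbering are hypothetical. For brevity, digits such as "138" are left as text rather than packed into a single byte.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical token table: frequent whole words become one control byte. */
static const struct {
    const char *word;
    char token;
} TOKENS[] = {
    { "Avenue", '\x01' }, /* ^A */
    { "Road",   '\x12' }, /* ^R */
    { "Street", '\x13' }, /* ^S */
    { "East",   '\x05' }, /* ^E */
};

/* Copies name into out, replacing whole space-separated words found in
   TOKENS with their one-byte token. out needs strlen(name) + 1 bytes. */
static void tokenize_name(const char *name, char *out) {
    while (*name) {
        size_t len = strcspn(name, " ");            /* length of next word */
        int replaced = 0;
        for (size_t t = 0; t < sizeof TOKENS / sizeof TOKENS[0]; t++) {
            if (strlen(TOKENS[t].word) == len &&
                strncmp(name, TOKENS[t].word, len) == 0) {
                *out++ = TOKENS[t].token;
                replaced = 1;
                break;
            }
        }
        if (!replaced) {                            /* copy word verbatim */
            memcpy(out, name, len);
            out += len;
        }
        name += len;
        if (*name == ' ') *out++ = *name++;         /* keep the separator */
    }
    *out = '\0';
}

/* Hypothetical two-byte DC street code: a marker byte, then one byte with
   the quadrant in the top 2 bits and the street number (0..63) in the low 6. */
#define TOKEN_NUMBERED_STREET 0x14u
enum quadrant { NW = 0, NE = 1, SE = 2, SW = 3 };

static void encode_numbered_street(unsigned number, enum quadrant q, uint8_t out[2]) {
    assert(number < 64 && q <= SW);
    out[0] = TOKEN_NUMBERED_STREET;
    out[1] = (uint8_t)((q << 6) | number);
}

static void decode_numbered_street(const uint8_t in[2], unsigned *number,
                                   enum quadrant *q) {
    assert(in[0] == TOKEN_NUMBERED_STREET);
    *q = (enum quadrant)(in[1] >> 6);
    *number = in[1] & 0x3Fu;
}
```

Matching whole words rather than substrings keeps names like "Eastern Parkway" from being partially tokenized; a real encoder would build its token table from word frequencies in the actual data.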
