sqlite3 是否压缩数据? [英] Does sqlite3 compress data?

查看:21
本文介绍了sqlite3 是否压缩数据?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个 7.4Gb 的 csv 文件.将其转换为带有 python 脚本 的 sqlite 数据库后,输出 DB 为 4.7Gb,大约 60%原始尺寸.

I've got an 7.4Gb csv file. After converting it to a sqlite database with a python script the output DB is 4.7Gb, around 60% of the original size.

csv 大约有 150,000,000 行.它有标题:

The csv has around 150,000,000 rows. It has header:

tkey,ipaddr,healthtime,numconnections,policystatus,activityflag

每一行看起来都像

261846,172.10.28.15,2012-02-03 16:15:00,22,1,1

脚本使用healthtime将数据拆分成192张表

The script uses healthtime to split the data into tables 192 tables

当我第一次看到这些数字时,我以为我在某个地方犯了错误.仅写入健康时间 192 次而不是 150,000,000 次的附加效率,我应该期望文件大小减少多少?

When I first saw these numbers, I assumed I had made an error someplace. How big of a reduction in file size should I expect from the added efficiency of only writing the healthtime 192 times instead of 150,000,000 times?

当我发布这篇文章时,我意识到了答案.我移除了大约 40% 的字符串,因此尺寸减少了 40%.

As soon as I posted this I realized the answer. I'm removing about 40% of the string, hence the 40% reduction in size.

编辑2让我们计算纯文本之间的大小差异:

Edit 2 Let's calculate the difference in size between the plain text:

"261846,172.10.28.15,2012-02-03 16:15:00,22,1,1" 

和数据库条目:

db(261846,'172.10.28.15',22,1,1)

首先,我们将纯文本表示从 46 个字符减少到 26 个字符.

First of all, we drop from 46 to 26 characters in plain text representation.

其余字符为:

"261846,172.10.28.15,22,1,1"

或 26 个字节.如果每个整数都需要以 32 位(4 个字节)存储,那么我们有:

or 26 bytes. If each integer needs to be stored in 32 bit (4 bytes), then we have:

12 字节 (ipaddr) + 4 字节 * 4(整数字段)= 28 字节.

12 bytes (ipaddr) + 4 bytes * 4 (integer fields) = 28 bytes.

所以看起来转换为整数会降低存储效率,而我所有的收益都来自减少每行存储的字符数.

So it looks like converting to integers makes the storage slightly less efficient, and all of my gains come form reducing the number of characters stored in each row.

推荐答案

SQLite 不运行压缩算法,但它会将数据存储在二进制文件而不是文本文件中.这意味着可以更有效地存储数据,例如使用 32 位(4 字节)数字来表示 10,000,000 而不是将其存储为 8 字节的文本(如果文件是 unicode,则更多)).

SQLite is not running a compression algorithm, but it will store data in a binary file instead of a text file. Which means that the data can be stored more efficiently, for example using a 32-bit (4 byte) number to represent 10,000,000 instead of storing it as 8 bytes of text (or more if the file is unicode).

如果您有兴趣,这里有关于 SQL 数据库文件格式的更多详细信息.

Here are more details on the SQL Database File Format if you are interested.

有意义吗?

这篇关于sqlite3 是否压缩数据?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆