Database size on disk increases as a multiple of the CSV file I mongoimport?


Question



I imported a CSV file which is 230M in total size; the file has 3069055 rows and 13 columns.

The command I used to import was:

mongoimport -d taq -c mycollection --type csv --file myfile.csv --headerline

Before I did this import the taq database was empty. After the import completed (which took 4 minutes), I checked the size of the database files in the mongodb user directory. This is what I see:

-rw------- 1 mongod mongod  64M Jul 23 14:13 taq.0  
-rw------- 1 mongod mongod 128M Jul 23 14:10 taq.1 
-rw------- 1 mongod mongod 256M Jul 23 14:11 taq.2
-rw------- 1 mongod mongod 512M Jul 23 14:13 taq.3 
-rw------- 1 mongod mongod 1.0G Jul 23 14:13 taq.4 
-rw------- 1 mongod mongod 2.0G Jul 23 14:13 taq.5
-rw------- 1 mongod mongod  16M Jul 23 14:13 taq.ns

Six taq files have been created, numbered 0 through 5, and their total size comes to several GB. Why is this, when the CSV file I imported is only 230M? Is this a bug? Or am I missing something?

Cheers.

Solution

MongoDB stores data in a completely different format, called BSON, which takes up more disk space. Not only must the value of every field be stored; the field names are also stored again in each document (row). If you have long field names, this can easily inflate the size in MongoDB to 8 to 10 times that of your CSV file. If that is too much for you, consider shortening your field names.
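As a rough sketch of that field-name overhead (the names below are hypothetical, not from your CSV): per the BSON spec, each element stores a type byte plus the NUL-terminated field name before the value, and each document adds a 4-byte length prefix and a trailing NUL.

```python
# Back-of-the-envelope estimate of how much space repeated field
# names add across every document (field names here are hypothetical).
rows = 3069055
field_names = ["f%02d" % i for i in range(13)]  # 13 three-character names

# Per BSON spec: each element = 1 type byte + name + NUL terminator;
# each document also carries a 4-byte length prefix and a trailing NUL.
per_doc_overhead = sum(1 + len(name) + 1 for name in field_names) + 4 + 1

total_mb = rows * per_doc_overhead / (1024 * 1024)
print(round(total_mb, 1))  # ~204.9 MB of overhead before storing any values
```

Even with short three-character names, the structural overhead alone is comparable to your whole CSV file; with longer names it grows linearly with name length times row count.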

MongoDB also preallocates data files for you. For example, the moment it starts adding data to taq.2 it creates taq.3, and likewise when it starts writing into taq.4 it creates taq.5. So in your case, even if your 230MB file only produces about 1.9GB of data, MongoDB has already allocated the 2.0G-sized taq.5. This behaviour can be turned off by passing --noprealloc on the command line when starting mongod.
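The sizes in your listing follow the MMAPv1 allocation pattern, where each successive data file doubles in size up to a 2GB cap. A quick sketch of the totals:

```python
# Each successive MMAPv1 data file doubles in size, capped at 2GB:
# taq.0 = 64M, taq.1 = 128M, ..., taq.5 = 2G.
sizes_mb = [64 * 2 ** i for i in range(6)]
total_gb = sum(sizes_mb) / 1024

print(sizes_mb)            # [64, 128, 256, 512, 1024, 2048]
print(round(total_gb, 2))  # 3.94 -- plus the 16M taq.ns namespace file
```

So the ~4GB on disk is mostly allocation pattern, not data: the last file is preallocated and largely empty. With --noprealloc that final 2.0G file would not be created ahead of time, at the cost of inserts pausing whenever a new file has to be allocated on demand.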
