How much memory do I need for 100 million records?


Question

How much memory do I need to load 100 million records into memory? Suppose each record needs 7 bytes. Here is my calculation:

each record = <int> <short> <byte>
4  +  2  + 1 = 7 bytes

Memory needed in GB = 7 * 100 * 1,000,000 / 1,000,000,000 = 0.7 GB

Is there any problem with this calculation?

Recommended answer

With 100,000,000 records, you need to allow for overhead. Exactly what kind of overhead you'll have, and how much, depends on the language.

In C/C++, for example, fields in a structure or class are aligned onto specific boundaries. Details may vary depending on the compiler, but in general an int must begin at an address that is a multiple of 4, a short at a multiple of 2, and a char can begin anywhere.

So assuming that your 4+2+1 means an int, a short, and a char, then if you arrange them in that order, the fields take 7 bytes, but at the very minimum the next instance of the structure must begin at a 4-byte boundary, so you'll have 1 pad byte at the end of each record, for 8 bytes in total. Memory returned by malloc or new is typically aligned more strictly still, to 8 or 16 bytes, though in this case that doesn't matter.

Every time you allocate memory, there's some overhead for the allocation block. The allocator has to keep track of how much memory was allocated and sometimes where the next block is. If you allocate all 100,000,000 records with one big "new" or "malloc", this overhead should be trivial. But if you allocate each one individually, then each record carries the overhead. Exactly how much depends on the compiler and its runtime library; on one system I used, I think it was 8 bytes per allocation. If that's the case, then you'd need 16 bytes for each record: 8 bytes for the block header, 7 for data, and 1 for padding. So it could easily take double what you expect.

Other languages will have different overhead. The easiest thing to do is probably to find out empirically: look up the system call that reports how much memory your process is using, check that value, allocate a million instances, check it again, and see the difference.
