Memory usage of ArangoDB

Question

I am trying to understand what the limits of ArangoDB are and what the ideal setup is. From what I have understood, ArangoDB stores all the collection data in virtual memory, and ideally you want this to fit in RAM. If a collection grows and cannot fit in RAM, it will be swapped to disk.

So my first question: if my db grows, will I need to adjust the swap partition/file to accommodate the db?

Since ArangoDB also syncs the data to disk, does this mean that the data will always be located both in RAM and on disk? So if I have a db that's 1.5GB and my RAM is 1GB, will I need at least 0.5GB of swap space and 1.5GB of regular disk space?

I am a bit confused about how ArangoDB uses virtual memory. Right now I have 7 collections that are practically empty. I have 1GB of RAM and 1GB of swap. The admin interface reports that ArangoDB is using 4.5GB of virtual memory. How is this possible if the swap is only 1GB? It's currently using 80MB of RAM. Shouldn't this be 224MB if the journal size is 32MB for each collection?

What is the recommendation on the journal size vs collection size? Can this be dynamically adjusted as the collection grows?

What kind of performance can be expected if swap is used a lot and the disk is an SSD? If swap is used a lot, would the performance be similar to that of a more traditional db such as MySQL?

Recommended answer

ArangoDB stores all data in memory-mapped files. Each collection can have 0 to n datafiles, with a default filesize of 32 MB each (note that this filesize can be adjusted globally or on a per-collection level). An empty collection (that never had any data) will not have a datafile. The first write to a collection will create the datafile, and whenever a datafile is full, a new one will be created automatically.

Collections allocate datafiles in chunks of 32 MB by default. If you have many small collections, this might waste some memory. If you have few but big collections, the potential waste (free space at the end of a datafile) probably doesn't matter too much.
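
To make the trade-off concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: the function name `datafile_footprint` is made up for illustration, and the model assumes that data fills datafiles completely and ignores any per-document or per-file overhead that the real storage format has.

```python
DEFAULT_DATAFILE_SIZE = 32 * 1024 * 1024  # 32 MB default mentioned above


def datafile_footprint(data_bytes, datafile_size=DEFAULT_DATAFILE_SIZE):
    """Return (datafile count, allocated bytes, unused bytes) for one collection.

    Simplified model (an assumption, not ArangoDB's actual accounting).
    """
    if data_bytes == 0:
        # A collection that never had any data has no datafile at all.
        return 0, 0, 0
    files = -(-data_bytes // datafile_size)  # integer ceiling division
    allocated = files * datafile_size
    return files, allocated, allocated - data_bytes


MB = 1024 * 1024
# 100 small collections holding 1 MB each
small = [datafile_footprint(1 * MB) for _ in range(100)]
print(sum(unused for _, _, unused in small) // MB, "MB unused")
# one big collection holding 100 MB
print(datafile_footprint(100 * MB)[2] // MB, "MB unused")
```

Under these assumptions the hundred small collections leave far more allocated but unused space than the single large one, which is the effect described above.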

Whenever any ArangoDB operation reads data from or writes data to a memory-mapped datafile, the operating system will first translate the offset into the file into a page number. This is because each datafile is implicitly split into pages of a specific size. How big a page is is platform-dependent, but let's assume pages are 4 KB in size. So a datafile with a default filesize will have 8192 pages.
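
The arithmetic behind this can be sketched in a few lines of Python. The 4 KB page size is the assumption from the paragraph above (the real value on a given machine is available as `mmap.PAGESIZE`), and the helper names are hypothetical.

```python
PAGE_SIZE = 4 * 1024              # assumed 4 KB pages; platform-dependent in reality
DATAFILE_SIZE = 32 * 1024 * 1024  # default datafile size


def pages_per_datafile(datafile_size=DATAFILE_SIZE, page_size=PAGE_SIZE):
    """Number of pages a datafile is implicitly split into."""
    return datafile_size // page_size


def page_of(offset, page_size=PAGE_SIZE):
    """Translate a file offset into (page number, offset within that page)."""
    return divmod(offset, page_size)


print(pages_per_datafile())  # 8192
print(page_of(10_000))       # (2, 1808)
```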

After the OS has translated the offset into the file into a page number, it will make sure the data of the requested page is present in physical RAM. If the page is not yet in physical RAM, the operating system will issue a page fault to trigger loading of the requested page from disk or swap into physical RAM. This will eventually make the complete page available in RAM, and any reads or writes to the page's data may occur after that.
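
The same mechanism can be observed from any program that memory-maps a file. The following Python sketch uses the standard `mmap` module on a scratch file; it is not ArangoDB code, and the function name `touch_mapped_file` is made up for illustration. Mapping the file reserves 32 MB of virtual address space, but only the page that is actually touched needs to be brought into physical RAM.

```python
import mmap
import os
import tempfile

FILE_SIZE = 32 * 1024 * 1024  # same size as a default datafile


def touch_mapped_file(offset, value):
    """Map a scratch file, write one byte at `offset`, and read it back."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, FILE_SIZE)       # set the file size without writing data
        with mmap.mmap(fd, FILE_SIZE) as mapped:
            mapped[offset] = value        # first access to this page may page-fault
            mapped.flush()                # ask the OS to write dirty pages to the file
            return mapped[offset]
    finally:
        os.close(fd)
        os.remove(path)


print(touch_mapped_file(5 * 4096 + 7, 42))
```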

All of this is done by the operating system's virtual memory manager. The operating system is free to map as many pages from a datafile into RAM as it thinks is good. For example, when a memory-mapped file is accessed sequentially, the operating system will likely be clever and read-ahead many pages, so they are already in physical RAM when actually accessed.

The OS is also free to swap out some or all pages of a datafile. It will likely swap out pages if there is not enough physical RAM available to keep all pages from all datafiles in RAM at the same time. It may also swap out pages that haven't been used for a while, to make RAM available for other operations. It will likely use some LRU algorithm for this.
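
As an illustration of why this matters, here is a toy LRU model in Python. It is a deliberately simplified sketch: real kernels use approximations of LRU and many additional heuristics, and the class and function names are invented for this example. It counts how many page faults occur when a small set of pages is accessed repeatedly.

```python
from collections import OrderedDict


class LruPageCache:
    """Toy model of physical RAM that can hold a fixed number of pages."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page number -> None, least recently used first
        self.faults = 0

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)        # mark as most recently used
            return
        self.faults += 1                        # page has to be loaded from disk
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)      # evict the least recently used page
        self.pages[page] = None


def count_faults(accesses, capacity):
    cache = LruPageCache(capacity)
    for page in accesses:
        cache.access(page)
    return cache.faults


working_set = list(range(4)) * 10  # 4 hot pages, each accessed 10 times in turn
print(count_faults(working_set, capacity=4))  # 4
print(count_faults(working_set, capacity=3))  # 40
```

In this model, being short by a single page turns 4 faults into 40: once the frequently used pages no longer fit, every access has to go to disk.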

How exactly the virtual memory manager of an OS behaves differs widely across platforms and implementations. Most systems also allow configuring the VM subsystem. Linux, for example, exposes a number of tunable parameters for its VM subsystem, such as vm.swappiness.

It is therefore hard to tell how much physical memory ArangoDB will actually use for a given number of collections and their datafiles. If the collections aren't accessed at all, having the datafiles memory-mapped might use almost no RAM, as the OS has probably swapped the collections out fully or at least partially. If the collections are heavily in use, the OS will likely have their datafiles fully mapped into RAM. But in both cases the memory counts as memory-mapped. This is why you can have a much higher virtual memory usage than you have physical RAM.
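
On Linux, the difference between virtual and resident memory can be read from `/proc/<pid>/status`, where `VmSize` is the virtual size and `VmRSS` is the part currently held in physical RAM. The Python sketch below parses those two fields. The helper names are hypothetical, and the sample text is made up to mirror the figures from the question (4.5 GB virtual, 80 MB resident); it is not real output from an ArangoDB process.

```python
def parse_vm_fields(status_text):
    """Extract VmSize and VmRSS (both in kB) from /proc/<pid>/status content."""
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            fields[key] = int(rest.split()[0])  # value is followed by the unit "kB"
    return fields


def current_process_memory():
    """Read the fields for the running process (Linux only)."""
    with open("/proc/self/status") as handle:
        return parse_vm_fields(handle.read())


# Illustrative sample, not real output.
sample = "Name:\tarangod\nVmSize:\t 4718592 kB\nVmRSS:\t   81920 kB\n"
print(parse_vm_fields(sample))  # {'VmSize': 4718592, 'VmRSS': 81920}
```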

As mentioned before, the OS has to do a lot of work when accessing pages that are not in RAM, and you want to avoid this if possible. If the total size of your frequently used collections exceeds the size of the physical RAM, the OS has no alternative but to swap pages out and in a lot when you access these collections. Using an SSD for the swap will likely be better than using a spinning HDD, but is still far slower than RAM access. Long story short: the data of your active collections (datafiles plus indexes) should fit into physical RAM if possible, or you will see a lot of disk activity.

Apart from that, ArangoDB does not only allocate virtual memory for the collection datafiles, but it also starts a few V8 threads (V8 is the JavaScript engine in ArangoDB) that also use virtual memory. This virtual memory is not file-backed.

In an empty ArangoDB, V8 accounts for most of the virtual memory usage. For example, on my 64-bit computer, the V8 threads consume about 5 GB of virtual memory (but ArangoDB in total only uses 140 MB RAM), whereas on my 32-bit computer with less RAM, the V8 threads use about 600 - 700 MB of virtual memory. In your case, with the 4.5 GB of virtual memory usage, I suspect V8 is the reason, too.

The virtual memory usage for the V8 threads obviously correlates with the number of V8 threads started. For example, increasing the value of the startup parameter --server.threads will start more threads and use more virtual memory for V8, and lowering the value will start fewer threads and use less virtual memory.
