Memory usage of ArangoDB

Question

I am trying to understand what the limits of ArangoDB are and what the ideal setup is. From what I have understood, ArangoDB stores all the collection data in virtual memory, and ideally you want this to fit in RAM. If a collection grows and no longer fits in RAM, it will be swapped to disk.

So my first question: if my db grows, will I need to adjust the swap partition/file to accommodate the db?

Since ArangoDB also syncs the data to disk, does this mean that the data will always be located both in RAM and on disk? So if I have a db that's 1.5GB and my RAM is 1GB, will I need at least 0.5GB of swap and 1.5GB of regular disk space?

I am a bit confused how arango uses the virtual memory. Right now I have 7 collections that are practically empty. I have 1GB of RAM and 1GB of swap disk. The admin reports that arango is using 4.5GB of virtual memory. How is this possible if the swap disk is 1GB? It's currently using 80MB of RAM. Shouldn't this be 224MB if the journal size is 32MB for each collection?

What is the recommendation on the journal size vs collection size? Can this be dynamically adjusted as the collection grows?

What kind of performance is expected if the swap disk is used a lot when the disk is an SSD? If the swap disk is used a lot would the performance be similar to using a more traditional db such as mysql?

Recommended answer

ArangoDB stores all data in memory-mapped files. Each collection can have 0 to n datafiles, with a default filesize of 32 MB each (note that this filesize can be adjusted globally or on a per-collection level). An empty collection (that never had any data) will not have a datafile. The first write to a collection will create the datafile, and whenever a datafile is full, a new one will be created automatically.

Collections allocate datafiles in chunks of 32 MB by default. If you have many small collections, this might waste some memory. If you have few but big collections, the potential waste (free space at the end of a datafile) probably doesn't matter too much.
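
To get a rough feel for that waste, here is a small back-of-the-envelope sketch. It is not ArangoDB code: the function name `datafile_footprint` is made up for illustration, and it assumes data fills datafiles back to back, which ignores any per-document or per-file overhead.

```python
DEFAULT_DATAFILE_SIZE = 32 * 1024 * 1024  # the 32 MB default mentioned above


def datafile_footprint(data_bytes, datafile_size=DEFAULT_DATAFILE_SIZE):
    """Return (datafiles, allocated_bytes, unused_bytes) for one collection.

    Hypothetical helper: assumes data is packed into datafiles with no
    overhead, so treat the result as an estimate only.
    """
    if data_bytes <= 0:
        # A collection that never had any data has no datafile at all.
        return 0, 0, 0
    datafiles = -(-data_bytes // datafile_size)  # integer ceiling division
    allocated = datafiles * datafile_size
    return datafiles, allocated, allocated - data_bytes


# Seven collections holding 1 KB each: one datafile per collection.
total_allocated = sum(datafile_footprint(1024)[1] for _ in range(7))
print(total_allocated // (1024 * 1024), "MB allocated")
```

With seven tiny collections this gives the 224 MB figure from the question, but that is address space mapped from files, not necessarily RAM in use.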

Whenever any ArangoDB operation reads data from or writes data to a memory-mapped datafile, the operating system will first translate the offset into the file into a page number. This is because each datafile is implicitly split into pages of a specific size. How big a page is is platform-dependent, but let's assume pages are 4 KB in size. So a datafile with a default filesize will have 8192 pages.
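
The offset-to-page translation is plain integer arithmetic. A minimal sketch, assuming the 4 KB page size used above (the helper name `page_of` is only for illustration; the real page size of your platform is available in Python as `mmap.PAGESIZE`):

```python
import mmap

PAGE_SIZE = 4 * 1024               # assumed page size, as in the text
DATAFILE_SIZE = 32 * 1024 * 1024   # default datafile size


def page_of(offset, page_size=PAGE_SIZE):
    """Return (page_number, offset_within_page) for a byte offset in a file."""
    return offset // page_size, offset % page_size


pages_per_datafile = DATAFILE_SIZE // PAGE_SIZE
print(pages_per_datafile)              # number of pages in one datafile
print(page_of(10000))                  # which page byte 10000 lives in
print("platform page size:", mmap.PAGESIZE)
```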

After the OS has translated the offset into the file into a page number, it will make sure the data of the requested page is present in physical RAM. If the page is not yet in physical RAM, the operating system will issue a page fault to trigger loading of the requested page from disk or swap into physical RAM. This will eventually make the complete page available in RAM, and any reads or writes to the page's data may occur after that.

All of this is done by the operating system's virtual memory manager. The operating system is free to map as many pages from a datafile into RAM as it thinks is good. For example, when a memory-mapped file is accessed sequentially, the operating system will likely be clever and read-ahead many pages, so they are already in physical RAM when actually accessed.
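
You can watch the same mechanism from any program that memory-maps a file. The following is a generic sketch using Python's standard `mmap` module, not ArangoDB's actual implementation; the 1 MB file size and the function names are arbitrary choices for the demo. The program only assigns bytes to a slice, and the OS takes care of faulting the page in and writing it back to the file.

```python
import mmap
import os
import tempfile

FILE_SIZE = 1024 * 1024  # 1 MB stand-in for a 32 MB datafile


def write_via_mmap(path, offset, payload):
    """Write payload at offset through a memory mapping of the file."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:      # length 0 maps the whole file
            m[offset:offset + len(payload)] = payload  # touches one page
            m.flush()                            # ask the OS to write dirty pages back


def demo():
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "r+b") as f:
            f.truncate(FILE_SIZE)                # pre-size the file, like a datafile
        offset = 5 * mmap.PAGESIZE + 10          # somewhere inside page 5
        write_via_mmap(path, offset, b"hello")
        with open(path, "rb") as f:              # read back with ordinary file I/O
            f.seek(offset)
            return f.read(5)
    finally:
        os.remove(path)


print(demo())
```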

The OS is also free to swap out some or all pages of a datafile. It will likely swap out pages if there is not enough physical RAM available to keep all pages from all datafiles in RAM at the same time. It may also swap out pages that haven't been used for a while, to make RAM available for other operations. It will likely use some LRU algorithm for this.

How the virtual memory manager of an OS behaves exactly is wildly different across platforms and implementations. Most systems also allow configuring the VM subsystem. For example, here are some parameters for Linux's VM subsystem.

It is therefore hard to tell how much physical memory ArangoDB will actually use for a given number of collections and their datafiles. If the collections aren't accessed at all, having the datafiles memory-mapped might use almost no RAM, as the OS has probably swapped the collections out fully or at least partially. If the collections are heavily in use, the OS will likely have their datafiles fully mapped into RAM. But in both cases the memory counts as memory-mapped. This is why you can have a much higher virtual memory usage than you have physical RAM.
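
On Linux you can compare the two numbers for any process by reading `/proc/<pid>/status`, where `VmSize` is the virtual size and `VmRSS` is what currently sits in physical RAM. A small sketch (Linux-only; the helper names are made up, and the sample text below is fabricated to mirror the figures from the question, not real output):

```python
def parse_vm_fields(status_text):
    """Extract VmSize and VmRSS (both in kB) from /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            fields[key] = int(rest.split()[0])  # value is followed by "kB"
    return fields


def vm_fields_for_pid(pid):
    """Linux-only: read the status file of a running process."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_fields(f.read())


# Illustrative sample resembling the question: 4.5 GB virtual, 80 MB resident.
sample = "Name:\tarangod\nVmSize:\t 4718592 kB\nVmRSS:\t   81920 kB\n"
print(parse_vm_fields(sample))
```

A large gap between the two values is normal for a process that maps files or reserves address space it has not touched yet.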

As mentioned before, the OS has to do a lot of work when accessing pages that are not in RAM, and you want to avoid this if possible. If the total size of your frequently used collections exceeds the size of the physical RAM, the OS has no alternative but to swap pages out and in a lot when you access these collections. Using an SSD for the swap will likely be better than using a spinning HDD, but is still far slower than RAM access. Long story short: the data of your active collections (datafiles plus indexes) should fit into physical RAM if possible, or you will see a lot of disk activity.

Apart from that, ArangoDB does not only allocate virtual memory for the collection datafiles, but it also starts a few V8 threads (V8 is the JavaScript engine in ArangoDB) that also use virtual memory. This virtual memory is not file-backed.

In an empty ArangoDB, V8 accounts for most of the virtual memory usage. For example, on my 64-bit computer, the V8 threads consume about 5 GB of virtual memory (but ArangoDB in total only uses 140 MB RAM), whereas on my 32-bit computer with less RAM, the V8 threads use about 600 - 700 MB of virtual memory. In your case, with the 4.5 GB VM usage, I suspect V8 is the reason, too.

The virtual memory usage for the V8 threads obviously correlates with the number of V8 threads started. For example, increasing the value of the startup parameter --server.threads will start more threads and use more virtual memory for V8, and lowering the value will start fewer threads and use less virtual memory.
