arangodump: How do I know the latest "revision"?

Question

I'm manually parsing and importing data from arangodump, which contains a record for every revision of every document. The problem is that I cannot tell which record is the latest revision.

(This is also a problem for deleted documents, where the arangodump output contains records that have a revision but an empty document.)

From the docs:

Clients can use revisions ids to perform simple equality/non-equality comparisons (e.g. to check whether a document has changed or not), but they should not use revision ids to perform greater/less than comparisons with them to check if a document revision is older than one another, even if this might work for some cases.

The docs don't give me much hope. Is this even possible?

If not, what is the proper way to manually import an arangodump into a different application?

Recommended answer

ArangoDump is intended to give you a snapshot of the existing database as fast as possible. It therefore does not give you the contents at the collection level, but rather what is on disk. As @CoDEmanX noted, the collection-level view is what ArangoExport will give you, at the cost of higher resource usage on the database server.

To explain why you get older versions of documents, we have to take a deeper look at the database itself.

An insert into the database creates a new document with a _key. When you later change it, e.g. with UPDATE, what actually happens is that an invisible document (a.k.a. marker) is written whose job is to remove the old version. After that, a new version of the document is created.
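
The marker sequence described above suggests one way to recover the current state from a dump: replay the records in file order and let the last marker for each _key win. The following is only a sketch under assumptions that are not stated in this answer: that each dump line is a JSON object with a numeric type field, that 2300 marks a document and 2302 marks a removal, and that the file lists markers in the order they were written. Check these against your own dump files before relying on it.

```python
import json

# Assumed marker type codes; verify them against your own dump files.
MARKER_DOCUMENT = 2300  # insert, or new version of a document
MARKER_REMOVE = 2302    # removal of a document


def replay_dump(lines):
    """Replay dump lines in file order and return the surviving documents by _key."""
    live = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        marker = json.loads(line)
        data = marker.get("data") or {}
        # Assumption: the key sits either on the marker or inside the document body.
        key = marker.get("key") or data.get("_key")
        if key is None:
            continue
        if marker.get("type") == MARKER_REMOVE:
            live.pop(key, None)      # a later removal cancels earlier versions
        elif marker.get("type") == MARKER_DOCUMENT:
            live[key] = data         # a later version replaces earlier ones
    return live


# Hypothetical sample lines, made up for illustration.
sample = [
    '{"type": 2300, "data": {"_key": "a", "_rev": "1", "v": 1}}',
    '{"type": 2300, "data": {"_key": "b", "_rev": "2", "v": 1}}',
    '{"type": 2302, "data": {"_key": "a", "_rev": "3"}}',
    '{"type": 2300, "data": {"_key": "a", "_rev": "4", "v": 2}}',
    '{"type": 2302, "data": {"_key": "b", "_rev": "5"}}',
]
result = replay_dump(sample)
print(result)
```

Note that this never compares _rev values; it relies only on the position of each marker in the file, which is consistent with the documentation quoted in the question.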

This is all done linearly, in a write-ahead log (WAL). The log is written in linear fashion, but only some of its content is guaranteed to have been synced to disk. Once a transaction demands that a document be sealed, execution is paused until the kernel replies that it can ensure this stage has been synchronized to storage.
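
The difference between an ordinary append and a sealed write can be illustrated with a generic append-only log. This is not ArangoDB's implementation, only a minimal sketch of the general technique; the function names and the record layout are made up for the example.

```python
import json
import os
import tempfile


def append_unsynced(log_file, record):
    """Append a record; it may still sit in a buffer or cache afterwards."""
    log_file.write(json.dumps(record) + "\n")


def append_synced(log_file, record):
    """Append a record and block until the kernel reports the file as synced."""
    log_file.write(json.dumps(record) + "\n")
    log_file.flush()             # hand Python's buffer over to the OS
    os.fsync(log_file.fileno())  # wait for the kernel to confirm the sync


# Write two records to a temporary log file, sealing only the second one.
fd, wal_path = tempfile.mkstemp(suffix=".wal")
os.close(fd)
with open(wal_path, "a", encoding="utf-8") as wal:
    append_unsynced(wal, {"op": "insert", "key": "a"})
    append_synced(wal, {"op": "remove", "key": "a"})

# Read the log back in write order.
with open(wal_path, encoding="utf-8") as wal:
    entries = [json.loads(line) for line in wal]
os.remove(wal_path)
print(entries)
```

Syncing only on demand is what allows the high throughput mentioned here: most appends return immediately, and only the writes that need the guarantee pay for the wait.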

So much for the way to disk. It is implemented that way to give you maximum throughput while guaranteeing that certain things have been written (and are not stuck somewhere in disk caches etc.).

A later job will try to clean everything up and tie up loose ends. This is called the 'Collection'. It collects documents from the WAL and stores them in permanent database files. It also tries to combine delete markers with existing documents, so that both finally disappear.

So once the collection has run, deleted documents together with their delete markers will actually disappear. Multiple database files may be combined into one database file if their size falls below a certain threshold. It may even happen that some delete markers find their documents only after such a combination.
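
The last point, that a delete marker may meet its document only after files are combined, can be shown with a toy merge. The tuple layout (kind, key, body) and the function name are hypothetical and chosen only for this illustration; the real on-disk format is different.

```python
def compact(datafiles):
    """Merge datafiles (oldest first) into one list that holds only live documents."""
    live = {}
    for datafile in datafiles:
        for kind, key, body in datafile:
            # Drop any older version first, so that the output order follows the latest write.
            live.pop(key, None)
            if kind == "document":
                live[key] = body
            # A "remove" marker leaves nothing behind: document and marker both vanish.
    return [("document", key, body) for key, body in live.items()]


# The document "a" lives in the older file, its delete marker in the newer one.
older = [("document", "a", {"v": 1}), ("document", "b", {"v": 1})]
newer = [("remove", "a", None), ("document", "c", {"v": 1})]
merged = compact([older, newer])
print(merged)
```

Looked at separately, neither input file reveals that "a" is gone for good; only the merged result drops both the document and its marker. A dump taken before such a merge would therefore still contain both records.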
