Method for indexing an object database


Problem description

I'm using an object database (ZODB) in order to store complex relationships between many objects but am running into performance issues. As a result I started to construct indexes in order to speed up object retrieval and insertion. Here is my story and I hope that you can help.

Initially, when I added an object to the database, I would insert it in a branch dedicated to that object type. To prevent multiple objects from representing the same entity, I added a method that iterates over the existing objects in the branch to find duplicates. This worked at first, but as the database grew, the time it took to load each object into memory and check its attributes grew to an unacceptable level.

To solve that issue I started to create indexes based on the attributes of the object, so that when an object is added it is saved in the type branch as well as in an attribute value index branch. For example, if I save a person object with attributes firstName = 'John' and lastName = 'Smith', the object is appended to the person object type branch and also appended to the lists within the attribute index branch under the keys 'John' and 'Smith'.

This saved a lot of time with duplicate checking since the new object could be analysed and only the set of objects which intersect within the attribute indexes would need to be checked.
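
A minimal sketch of this kind of attribute index, using plain dicts and sets as stand-ins for persistent structures (in ZODB these would more likely be BTrees). The class `AttributeIndex` and its methods are hypothetical names for illustration, and keying by (attribute name, value) is an assumption rather than the exact layout described above.

```python
from collections import defaultdict


class AttributeIndex:
    """Maps (attribute name, value) to the ids of objects holding that value."""

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, obj_id, attrs):
        for name, value in attrs.items():
            self._index[(name, value)].add(obj_id)

    def candidates(self, attrs):
        """Return ids of objects that match every given attribute value."""
        sets = [self._index.get((name, value), set())
                for name, value in attrs.items()]
        if not sets:
            return set()
        return set.intersection(*sets)


index = AttributeIndex()
index.add(1, {"firstName": "John", "lastName": "Smith"})
index.add(2, {"firstName": "John", "lastName": "Doe"})
index.add(3, {"firstName": "Jane", "lastName": "Smith"})

# Only objects present in both the 'John' and the 'Smith' sets need checking.
duplicates = index.candidates({"firstName": "John", "lastName": "Smith"})
print(duplicates)  # {1}
```

Only the intersected candidates have to be loaded and compared, instead of every object in the type branch.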

However, I quickly ran into another issue when updating objects. The indexes need to be updated, because their entries may no longer be accurate. This requires either remembering the old values, so that they can be accessed directly and the object removed, or iterating over all values of an attribute type to find and remove the object. Either way, performance is quickly beginning to degrade again and I can't figure out a way to solve it.
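
One way to "remember old values" without scanning is to keep a reverse mapping next to the forward index. The sketch below is illustrative only: `SimpleFieldIndex` is a hypothetical class, and plain dicts stand in for the persistent BTrees a ZODB application would use.

```python
from collections import defaultdict


class SimpleFieldIndex:
    """Single-attribute index with a forward and a reverse mapping."""

    def __init__(self):
        self._forward = defaultdict(set)  # value -> ids of objects with that value
        self._reverse = {}                # object id -> value currently indexed

    def index(self, obj_id, value):
        # On reindex, drop the stale entry first; the reverse mapping finds it.
        self.unindex(obj_id)
        self._forward[value].add(obj_id)
        self._reverse[obj_id] = value

    def unindex(self, obj_id):
        if obj_id not in self._reverse:
            return
        old_value = self._reverse.pop(obj_id)
        ids = self._forward[old_value]
        ids.discard(obj_id)
        if not ids:
            del self._forward[old_value]

    def lookup(self, value):
        return set(self._forward.get(value, ()))


last_names = SimpleFieldIndex()
last_names.index(1, "Smith")
last_names.index(1, "Jones")       # update: no scan over all values needed
print(last_names.lookup("Smith"))  # set()
print(last_names.lookup("Jones"))  # {1}
```

The cost of an update then depends on the number of indexed attributes of the object, not on the number of distinct values in the index.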

Have you had this kind of issue before? What did you do to solve it, or is this just something that I have to deal with when using an OODBMS?

Thanks in advance for your help.

Recommended answer

Yes, repoze.catalog is nice, and well documented.


  1. Look at using a container/item hierarchy to store and traverse content item objects; plan to be able to traverse content either (a) by path (graph edges look like a filesystem) or (b) by identifying singleton containers at some distinct location.

  2. Identify your content using either RFC 4122 UUIDs (uuid.UUID type) or 64-bit integers.

  3. Use a central catalog for indexing (e.g. repoze.catalog); the catalog should be at a known location relative to the root application object of your ZODB. Your catalog will likely index attributes of objects and return record ids (usually integers) on query. Your job is to map those integer ids (perhaps indirecting via UUIDs) to some physical traversal path in the database where you are storing content. It helps if you use zope.location and zope.container as common interfaces for traversing your object graph from the root/application downward.

  4. Use zope.lifecycleevent handlers to index content and keep things fresh.
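
The container hierarchy, path traversal, and UUID identification described above might look roughly like this. It is a sketch under assumptions: `Container`, `Item`, and `traverse` are hypothetical names, and a plain dict stands in for a persistent container such as one built on zope.container.

```python
import uuid


class Item:
    """Content item identified by an RFC 4122 UUID."""

    def __init__(self, title):
        self.uid = uuid.uuid4()
        self.title = title


class Container(dict):
    """Stand-in for a persistent, BTree-backed container."""


def traverse(root, path):
    """Walk from root along a '/'-separated path, like a filesystem."""
    node = root
    for name in path.strip("/").split("/"):
        if name:
            node = node[name]
    return node


root = Container()
root["people"] = Container()
root["people"]["john-smith"] = Item("John Smith")

item = traverse(root, "/people/john-smith")
print(item.title)  # John Smith
```

A catalog can then store either the path or the UUID for each record, and the application resolves it back to the object by traversal.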



The problem -- generalized

ZODB is too flexible: it is just a persistent object graph with transactions, but this leaves room for you to sink or swim in your own data-structures and interfaces.

Usually, just picking pre-existing idioms from the community around the ZODB will work: zope.lifecycleevent handlers, "containerish" traversal using zope.container and zope.location, and something like repoze.catalog.

Only when you exhaust the generalized idioms and know why they won't work, try to build your own indexes using the various flavors of BTrees in ZODB. I actually do this more than I care to admit, but usually have good cause.

In all cases, keep your indexes (search, discovery) and site (traversal and storage) structure distinct.


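  • Master ZODB BTrees: you likely want:

      • To store content objects as subclasses of Persistent in containers that are subclasses of OOBTree providing container interfaces (see …).

      • To store BTrees for your catalog or global indexes, or to use packages like repoze.catalog and zope.index to abstract away the details. (Hint: catalog solutions typically store indexes as OIBTrees that yield integer record ids for search results; you then typically have some sort of document mapper utility that translates those record ids into something resolvable in your application, such as a UUID (provided you can traverse the graph to the UUID) or a path (the way the Zope2 catalog does).)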

IMHO, don't bother working with intids and key-references and such (these are less idiomatic and more difficult if you don't need them). Just use a Catalog and DocumentMap from repoze.catalog to get results in integer to uuid or path form, and then figure out how to get your object. Note, you likely want some utility/singleton that has the job of retrieving your object given an id or uuid returned from a search.
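
To make the integer-to-UUID mapping and the retrieval utility concrete, here is a rough pure-Python sketch. `SimpleDocumentMap` and `Resolver` are hypothetical stand-ins written for this example; they are not the repoze.catalog DocumentMap API, whose method names differ.

```python
import uuid


class SimpleDocumentMap:
    """Two-way mapping between integer record ids and UUIDs."""

    def __init__(self):
        self._docid_to_uid = {}
        self._uid_to_docid = {}
        self._next_docid = 1

    def add(self, uid):
        if uid in self._uid_to_docid:
            return self._uid_to_docid[uid]
        docid = self._next_docid
        self._next_docid += 1
        self._docid_to_uid[docid] = uid
        self._uid_to_docid[uid] = docid
        return docid

    def uid_for_docid(self, docid):
        return self._docid_to_uid.get(docid)


class Resolver:
    """Singleton-style utility that turns a UUID into a content object."""

    def __init__(self):
        # uid -> object; a real application might traverse a stored path instead
        self._objects = {}

    def register(self, uid, obj):
        self._objects[uid] = obj

    def get(self, uid):
        return self._objects.get(uid)


docmap = SimpleDocumentMap()
resolver = Resolver()

uid = uuid.uuid4()
resolver.register(uid, {"firstName": "John", "lastName": "Smith"})
docid = docmap.add(uid)

# Pretend a catalog query returned this list of integer record ids.
results = [docid]
objects = [resolver.get(docmap.uid_for_docid(d)) for d in results]
print(objects[0]["lastName"])  # Smith
```

Keeping the resolver separate from the catalog means the index never holds references to the content objects themselves, only ids.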

Use zope.lifecycleevent or a similar package that provides synchronous event callback (handler) registrations. These handlers are what you should call whenever an atomic edit is made on your object (likely once per transaction, but not in the transaction machinery).
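
The handler pattern can be illustrated without any Zope packages. The sketch below mimics the idea of a subscriber list and a notify function; all names here (`ObjectModified`, `subscribers`, `notify`, `reindex_person`) are defined locally for illustration and are assumptions, not imports from zope.event or zope.lifecycleevent.

```python
class Person:
    def __init__(self, oid, last_name):
        self.oid = oid
        self.last_name = last_name


class ObjectModified:
    """Event announcing that an object was edited."""

    def __init__(self, obj):
        self.object = obj


subscribers = []          # registered handlers, called synchronously
by_last_name = {}         # toy index: last name -> set of object ids
indexed_last_name = {}    # object id -> last name currently in the index


def notify(event):
    for handler in list(subscribers):
        handler(event)


def reindex_person(event):
    """Handler that keeps the toy index in step with the object."""
    person = event.object
    old = indexed_last_name.get(person.oid)
    if old is not None:
        by_last_name[old].discard(person.oid)
    by_last_name.setdefault(person.last_name, set()).add(person.oid)
    indexed_last_name[person.oid] = person.last_name


subscribers.append(reindex_person)

john = Person(1, "Smith")
notify(ObjectModified(john))      # initial indexing

john.last_name = "Jones"
notify(ObjectModified(john))      # one notification after the edit

print(by_last_name)  # {'Smith': set(), 'Jones': {1}}
```

The editing code only has to fire one event after a change; it does not need to know which indexes exist.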

Learn the Zope Component Architecture; it is not an absolute requirement, but surely helpful, even if just to understand the zope.interface interfaces of upstream packages like zope.container.

Understand how Zope2 (ZCatalog) does this: a catalog fronts for multiple indexes of various sorts, each of which searches for its part of a query, has specialized data structures, and returns a sequence of integer record ids. The catalog merges these across indexes by set intersection and returns the result as a lazy mapping of "brain" objects containing metadata stubs (each brain has a getObject() method to get the actual content object). Getting actual objects from a catalog search relies upon the Zope2 idiom of using paths from the root application object to identify the location of the cataloged item.
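
A toy model of that flow, to show how the pieces fit together. This is not ZCatalog code: `ToyCatalog`, `index_record`, and `search` are hypothetical, a dict of paths stands in for the site, and only the `getObject()` name is taken from the description above.

```python
class Brain:
    """Lightweight result stub: holds metadata, loads the object on demand."""

    def __init__(self, rid, path, resolve):
        self.rid = rid
        self.path = path
        self._resolve = resolve

    def getObject(self):
        return self._resolve(self.path)


class ToyCatalog:
    def __init__(self, resolve):
        self._indexes = {}   # index name -> {value: set of record ids}
        self._paths = {}     # record id -> path from the root object
        self._resolve = resolve

    def index_record(self, rid, path, **values):
        self._paths[rid] = path
        for name, value in values.items():
            self._indexes.setdefault(name, {}).setdefault(value, set()).add(rid)

    def search(self, **query):
        """Intersect per-index results and lazily yield brains."""
        result = None
        for name, value in query.items():
            rids = self._indexes.get(name, {}).get(value, set())
            result = set(rids) if result is None else result & rids
        for rid in sorted(result or ()):
            yield Brain(rid, self._paths[rid], self._resolve)


site = {
    "/people/john": {"first": "John", "last": "Smith"},
    "/people/jane": {"first": "Jane", "last": "Smith"},
}

catalog = ToyCatalog(resolve=site.__getitem__)
for rid, (path, obj) in enumerate(site.items(), start=1):
    catalog.index_record(rid, path, **obj)

brains = list(catalog.search(first="John", last="Smith"))
print([b.path for b in brains])       # ['/people/john']
print(brains[0].getObject()["last"])  # Smith
```

The search itself touches only sets of integers and paths; content objects are loaded only when `getObject()` is called on a brain.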
