Indexing schema-less dbs having user-defined schemas?

Views: 117 | Posted: 2017/3/22 2:53:42 | Tags: database-design, mongodb, indexing, nosql

Problem description

One of the most essential features of any database is query speed. We store data away and want quick access to data that matches our criteria. However, of late, schema-less databases have become popular. It's one thing if we have a schema-less database but there actually is an inferred (in-the-head/in-the-app) schema; it just hasn't been declared formally by the database.

On the other hand, let's say we truly need an open database where several users have their own schemas for their own individual problem areas. A user would define his own "domain". That domain (a database on an RDBMS server) would have its types (tables in an RDBMS) and those types would have their own properties (columns in an RDBMS). How do I create compound indexes to pull specific objects/documents/records (what have you) from a given domain? My query should select one or more domains (an IN clause), just one topic type (e.g. a CalendarEvent), against certain columns (start_date >= today, start_date <= today + 1 week, open_for_registration = true, calendar_name = 'Public'). In a database with a fixed schema (implied even if not declared), this is simple: you create a compound index against the columns.
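To make the fixed-schema case concrete, here is a minimal sketch using SQLite from Python's standard library. SQLite is only a stand-in so the example runs without a server; the table name `calendar_event`, the index name `idx_event_lookup`, the helper `upcoming_public_events`, and the sample rows are all assumptions made for illustration. The topic type is represented by the table itself, and the compound index lists the equality columns first and the range column last.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE calendar_event (
        id INTEGER PRIMARY KEY,
        domain_id INTEGER NOT NULL,
        calendar_name TEXT NOT NULL,
        open_for_registration INTEGER NOT NULL,  -- 0 or 1
        start_date TEXT NOT NULL                 -- ISO 8601 text sorts chronologically
    )
    """
)
# Compound index: equality columns first, the range column (start_date) last.
conn.execute(
    """
    CREATE INDEX idx_event_lookup
    ON calendar_event (domain_id, calendar_name, open_for_registration, start_date)
    """
)

today = date(2017, 3, 22)  # fixed date so the example is reproducible
sample_rows = [
    (1, 1, "Public", 1, today + timedelta(days=2)),
    (2, 1, "Public", 0, today + timedelta(days=3)),   # registration closed
    (3, 2, "Public", 1, today + timedelta(days=10)),  # more than a week away
    (4, 2, "Private", 1, today + timedelta(days=1)),  # wrong calendar
    (5, 3, "Public", 1, today + timedelta(days=1)),   # domain not selected
    (6, 2, "Public", 1, today + timedelta(days=7)),
]
conn.executemany(
    "INSERT INTO calendar_event VALUES (?, ?, ?, ?, ?)",
    [(i, d, c, o, s.isoformat()) for i, d, c, o, s in sample_rows],
)


def upcoming_public_events(conn, domain_ids, today):
    """Return ids of open, public events in the given domains within one week."""
    placeholders = ", ".join("?" for _ in domain_ids)
    sql = f"""
        SELECT id FROM calendar_event
        WHERE domain_id IN ({placeholders})
          AND calendar_name = 'Public'
          AND open_for_registration = 1
          AND start_date >= ? AND start_date <= ?
        ORDER BY id
    """
    week_later = today + timedelta(days=7)
    params = [*domain_ids, today.isoformat(), week_later.isoformat()]
    return [row[0] for row in conn.execute(sql, params)]


matching_ids = upcoming_public_events(conn, [1, 2], today)
print(matching_ids)
```

The difficulty raised in the rest of the question is that this only works when the columns are known up front, which is not the case once users define their own types.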

The complexity is that we have essentially made a single instance of, let's say, MongoDB act like an RDBMS server with many databases, where each database and its related schema is our "domain".

After busting my brain on this problem for a week and looking at various databases (MongoDB, Neo4j, MySQL, PostgreSQL) I have only found a few possible solutions:

1. Index all properties. A property could be represented in a Properties table or as an embedded document in MongoDB. In an RDBMS the property values would have to be serialized to strings. CONS: a) we can only search against one property at a time (no compound indexes), b) everything gets an index, so we're incurring needless overhead.

2. Index select properties. In PostgreSQL this could be done with a partial (filtered) index. Basically, the property record would have a bit called "indexed" that I would have to maintain. This bit would drive whether or not the filtered index includes that particular property. CONS: a) we can still only search against one property at a time. This eliminates "compound indexes" from use. The only way I can imagine to mimic a compound index would be to search against each individual indexed property and return the intersection of the PKs.

3. Create/maintain database constructs to reflect working indexes. In MongoDB, I could create an "indexables" collection. A document in this collection might look like this: {domain_id: ObjectId(..), type_id: ObjectId(..), fields: {field1: "some int value", field2: "some date value", field3: "some bit value"}}. Then I index the "indexables" collection on {domain_id: 1, type_id: 1, "fields.field1": 1, "fields.field2": 1, "fields.field3": 1}. Then every time I create/update a document in my "things" collection I would have to plug its values into the field1, field2, field3 slots of indexables. (This works nicely with MongoDB because I can plug values of any datatype into those placeholders. In MySQL, using the same pattern, I would have to serialize values to strings.) I would also have to maintain the domain_id and type_id. Basically, it's an index layer (that I manage myself) built on top of indexes handled by the database. CONS: There's additional overhead. Whereas the database would normally manage indexes on my behalf, I now have to take care to do this myself. As MongoDB has no concept of transactions, I couldn't guarantee that the document and its various indexes were committed in a single step. PROS: I have my compound indexes back. Indexes are maintained at the domain level.

I have considered allowing users to have their own instances of database X. Or, in MongoDB, their own collections. But I wondered whether this would create more issues, especially as we run up against practical limitations (the number of databases or collections allowed). I tossed this idea out after not too much thought.
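The third option (a hand-maintained "indexables" structure) can be sketched roughly as follows. This is an illustration under stated assumptions, not a MongoDB implementation: it uses SQLite from Python's standard library as a stand-in, with slot columns declared without a type so that values of different kinds can be stored in them. The `SLOT_MAP` dictionary, the helpers `save_thing` and `find_calendar_events`, and the numeric ids are hypothetical names introduced here. Note one difference from the situation the question describes: SQLite has transactions, so the `with conn:` block commits the document and its slots together.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE things (
        id INTEGER PRIMARY KEY,
        domain_id INTEGER NOT NULL,
        type_id INTEGER NOT NULL,
        body TEXT NOT NULL            -- the full user-defined document as JSON
    );
    -- Slot columns have no declared type, so each type can store what it needs.
    CREATE TABLE indexables (
        thing_id INTEGER PRIMARY KEY,
        domain_id INTEGER NOT NULL,
        type_id INTEGER NOT NULL,
        field1, field2, field3
    );
    CREATE INDEX idx_indexables
        ON indexables (domain_id, type_id, field1, field2, field3);
    """
)

# Hypothetical per-type mapping: which property fills which slot.
# Equality-tested properties come first, the range-tested one last.
SLOT_MAP = {
    (1, 10): ("calendar_name", "open_for_registration", "start_date"),
}


def save_thing(conn, thing_id, domain_id, type_id, doc):
    """Create or update a document and refresh its index slots."""
    slot_names = SLOT_MAP.get((domain_id, type_id), ())
    values = [doc.get(name) for name in slot_names]
    values += [None] * (3 - len(values))  # unused slots stay NULL
    with conn:  # both writes are committed together, or neither is
        conn.execute(
            "INSERT OR REPLACE INTO things VALUES (?, ?, ?, ?)",
            (thing_id, domain_id, type_id, json.dumps(doc)),
        )
        conn.execute(
            "INSERT OR REPLACE INTO indexables VALUES (?, ?, ?, ?, ?, ?)",
            (thing_id, domain_id, type_id, *values),
        )


def find_calendar_events(conn, domain_id, type_id, name, is_open, start, end):
    """Look up matching thing ids through the slot columns only."""
    sql = """
        SELECT thing_id FROM indexables
        WHERE domain_id = ? AND type_id = ?
          AND field1 = ? AND field2 = ?
          AND field3 >= ? AND field3 <= ?
        ORDER BY thing_id
    """
    params = (domain_id, type_id, name, is_open, start, end)
    return [row[0] for row in conn.execute(sql, params)]


event = {"calendar_name": "Public", "open_for_registration": True,
         "start_date": "2017-03-24", "title": "Open house"}
save_thing(conn, 1, 1, 10, event)
save_thing(conn, 2, 1, 10, {**event, "open_for_registration": False})
save_thing(conn, 3, 1, 10, {**event, "start_date": "2017-05-01"})

found = find_calendar_events(conn, 1, 10, "Public", True, "2017-03-22", "2017-03-29")

# Updating a document has to refresh its slots too, or the index goes stale.
save_thing(conn, 1, 1, 10, {**event, "open_for_registration": False})
found_after_update = find_calendar_events(
    conn, 1, 10, "Public", True, "2017-03-22", "2017-03-29"
)
```

The bookkeeping in `save_thing` is the "additional overhead" listed under CONS: every write path has to go through it, and the slot mapping has to be kept per domain and type.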

Other ideas? Other kinds of databases that might better handle this problem?

Again, the idea is this: different users manage their own domains. Within a domain can be items of any "type". For each typed item we have properties. I want to allow users to run queries against their domains to get items of a type having properties that match their conditions. (thus compound indexes)
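For comparison, the fallback mentioned under the second option (search each indexed property separately and intersect the primary keys) might look like the sketch below. It again uses SQLite from Python's standard library as a stand-in, with a partial index restricted to rows whose "indexed" bit is set. The table name `properties`, the index name, the helpers `ids_equal` and `ids_between`, and the sample data are assumptions for illustration, and the example is scoped to a single domain and type for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE properties (
        item_id INTEGER NOT NULL,
        name TEXT NOT NULL,
        value TEXT NOT NULL,       -- every value is serialized to a string
        indexed INTEGER NOT NULL,  -- the "indexed" bit kept by the application
        PRIMARY KEY (item_id, name)
    );
    -- Partial index: only rows flagged as indexed are included.
    CREATE INDEX idx_indexed_properties
        ON properties (name, value) WHERE indexed = 1;
    """
)

items = {
    1: {"calendar_name": "Public", "open_for_registration": "true", "start_date": "2017-03-24"},
    2: {"calendar_name": "Public", "open_for_registration": "false", "start_date": "2017-03-25"},
    3: {"calendar_name": "Private", "open_for_registration": "true", "start_date": "2017-03-23"},
    4: {"calendar_name": "Public", "open_for_registration": "true", "start_date": "2017-04-05"},
    5: {"calendar_name": "Public", "open_for_registration": "true", "start_date": "2017-03-29"},
}
conn.executemany(
    "INSERT INTO properties VALUES (?, ?, ?, 1)",
    [(item_id, name, value) for item_id, props in items.items()
     for name, value in props.items()],
)


def ids_equal(conn, name, value):
    """Primary keys of items whose indexed property equals the given value."""
    sql = ("SELECT item_id FROM properties "
           "WHERE indexed = 1 AND name = ? AND value = ?")
    return {row[0] for row in conn.execute(sql, (name, value))}


def ids_between(conn, name, low, high):
    """Primary keys of items whose indexed property lies in [low, high].

    Values are compared as strings, which is fine for ISO dates but would
    need padding or another encoding for numbers.
    """
    sql = ("SELECT item_id FROM properties "
           "WHERE indexed = 1 AND name = ? AND value >= ? AND value <= ?")
    return {row[0] for row in conn.execute(sql, (name, low, high))}


# One lookup per property, then intersect the resulting key sets.
matches = (
    ids_equal(conn, "calendar_name", "Public")
    & ids_equal(conn, "open_for_registration", "true")
    & ids_between(conn, "start_date", "2017-03-22", "2017-03-29")
)
```

Each lookup can use the index, but the intersection happens in the application, which is why this only mimics a compound index rather than replacing one.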

One last thought. A domain in itself is not intended to be humongous. It might have 10-20 "types". Within each type there might be as many as 5000 records (in most cases) and, say, 20000 in extreme cases.
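As a rough upper bound on what those figures imply per domain, the arithmetic below assumes that every type in a domain reaches its ceiling at the same time, which the question does not claim; the constant names are made up for this sketch.

```python
# Figures taken from the paragraph above.
MAX_TYPES_PER_DOMAIN = 20
TYPICAL_MAX_RECORDS_PER_TYPE = 5_000
EXTREME_RECORDS_PER_TYPE = 20_000

# Worst case if all 20 types hit the typical ceiling.
typical_upper_bound = MAX_TYPES_PER_DOMAIN * TYPICAL_MAX_RECORDS_PER_TYPE

# Worst case if all 20 types hit the extreme ceiling.
extreme_upper_bound = MAX_TYPES_PER_DOMAIN * EXTREME_RECORDS_PER_TYPE

print(typical_upper_bound, extreme_upper_bound)
```

So a single domain tops out at roughly 100,000 records in the typical case and 400,000 in the extreme case under that assumption.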

Unfortunately, this is one of those cases where despite Joel Spolsky's advice I attempted astronaut architecture.
