Indexing schema-less dbs having user-defined schemas?


Question

One of the most essential features of any database is query speed. We store data away and want quick access to data that matches our criteria. However, of late, schema-less databases have become popular. It's one thing if we have a schema-less database but there actually is an inferred (in-the-head/in-the-app) schema; it just hasn't been declared formally by the database.

On the other hand, let's say we truly need an open database where several users have their own schemas for their own individual problem areas. A user would define his own "domain". That domain (a database on an RDBMS server) would have its types (tables in an RDBMS), and those types would have their own properties (columns in an RDBMS). How do I create compound indexes to pull specific objects/documents/records (what have you) from a given domain? My query should select one or more domains (an IN clause) and just one topic type (e.g. a CalendarEvent), filtered against certain columns (start_date >= today, start_date <= today + 1 week, open_for_registration = true, calendar_name = 'Public'). In a database with a fixed schema (implied even if not declared), this is simple: you create a compound index against the columns.
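
For reference, the fixed-schema case could look roughly like the sketch below. It uses Python's built-in sqlite3 module only so that it runs anywhere; the table name, column names, index name, and sample rows are all made up for illustration and are not from any real system. Putting the equality columns first and the range column (start_date) last is one common ordering, not the only valid one.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE calendar_event (
        id INTEGER PRIMARY KEY,
        domain_id INTEGER NOT NULL,
        calendar_name TEXT NOT NULL,
        open_for_registration INTEGER NOT NULL,  -- 0 or 1
        start_date TEXT NOT NULL                 -- ISO 8601, sorts chronologically
    )
    """
)
# One compound index: equality columns first, the range column last.
conn.execute(
    """
    CREATE INDEX idx_event_lookup ON calendar_event
        (domain_id, calendar_name, open_for_registration, start_date)
    """
)

today = date.today()
rows = [
    (1, "Public", 1, today + timedelta(days=2)),   # matches
    (1, "Public", 0, today + timedelta(days=3)),   # closed for registration
    (2, "Public", 1, today + timedelta(days=30)),  # outside the one-week window
    (2, "Private", 1, today + timedelta(days=1)),  # wrong calendar
    (3, "Public", 1, today + timedelta(days=1)),   # domain not requested
]
conn.executemany(
    "INSERT INTO calendar_event "
    "(domain_id, calendar_name, open_for_registration, start_date) "
    "VALUES (?, ?, ?, ?)",
    [(d, name, is_open, day.isoformat()) for d, name, is_open, day in rows],
)

# One or more domains (IN clause), one type (the table), several column filters.
matches = conn.execute(
    """
    SELECT domain_id, start_date FROM calendar_event
    WHERE domain_id IN (?, ?)
      AND calendar_name = ?
      AND open_for_registration = 1
      AND start_date >= ? AND start_date <= ?
    """,
    (1, 2, "Public", today.isoformat(), (today + timedelta(days=7)).isoformat()),
).fetchall()
print(matches)  # only the first sample row qualifies
```

The whole difficulty of the question is that, with user-defined types, there is no such table to declare the index on ahead of time.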

The complexity is that we have essentially made a single instance of, let's say, MongoDB act like an RDBMS server with many databases, where each database and its related schema is our "domain".

After busting my brain on this problem for a week and looking at various databases (MongoDB, Neo4j, MySQL, PostgreSQL) I have only found a few possible solutions:


  • Index all properties. A property could be represented in a Properties table or as an embedded document in MongoDB. In an RDBMS the property values would have to be serialized to strings. CONS: a) Can only search against one property at a time (no compound indexes), b) everything gets an index so we're incurring needless overhead.
  • Index select properties. In PostgreSQL this could be done with a partial index (what SQL Server calls a filtered index). Basically, the property record would have a bit called "indexed" that I would have to maintain. This bit would drive whether or not the partial index covers that particular property. CONS: a) we can still only search against one property at a time. This eliminates "compound indexes" from use. The only way I can imagine to mimic a compound index would be to search against each individual indexed property and return the intersection of the PKs.
  • Create/maintain database constructs to reflect working indexes. In MongoDB, I could create an "indexables" collection. A document in this collection might look like this: {domain_id: ObjectId(..), type_id: ObjectId(..), fields: {field1: "some int value", field2: "some date value", field3: "some bit value"}}. Then I index the "indexables" collection on {domain_id: 1, type_id: 1, "fields.field1": 1, "fields.field2": 1, "fields.field3": 1}. Then every time I create/update a document in my "things" collection I would have to plug its values into the field1, field2, field3 slots of indexables. (This works nicely with MongoDB because I can plug values of any datatype into those placeholders. In MySQL, using the same pattern, I would have to serialize values to strings.) I would also have to maintain the domain_id and type_id. Basically, it's an index layer (that I manage myself) built on top of indexes handled by the database. CONS: There's additional overhead. Whereas the database would normally manage indexes on my behalf, I now have to take care to do this myself. As MongoDB has no concept of transactions, I couldn't guarantee that the document and its various indexes were committed in a single step. PROS: I have my compound indexes back. Indexes are maintained at the domain level.
  • I have considered allowing users to have their own instances of database X. Or in MongoDB their own collections. But I wondered if this wouldn't create more issues especially as we run up against practical limitations (number of databases or collections allowed). I tossed this idea out after not too much thought.
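
To make the third option more concrete, here is a minimal sketch of the bookkeeping it implies, written in plain Python so it runs without a database. Everything named here is hypothetical: SLOT_MAP, to_indexable, to_filter, the thing_id back-reference, and the use of a string instead of an ObjectId for type_id are illustration only. The sketch also assumes the equality properties are assigned to the earlier slots and the range property (start_date) to the last one.

```python
from datetime import datetime

# Hypothetical per-type mapping from user-defined property names to generic slots.
SLOT_MAP = {
    "CalendarEvent": {
        "open_for_registration": "field1",
        "calendar_name": "field2",
        "start_date": "field3",  # range-filtered property goes in the last slot
    },
}

# Key spec for the single compound index on "indexables", written as the
# (key, direction) list form that pymongo's create_index accepts.
INDEX_KEYS = [
    ("domain_id", 1),
    ("type_id", 1),
    ("fields.field1", 1),
    ("fields.field2", 1),
    ("fields.field3", 1),
]


def to_indexable(thing):
    """Build the "indexables" document that shadows one "things" document."""
    slots = SLOT_MAP[thing["type_id"]]
    fields = {slot: thing["properties"].get(prop) for prop, slot in slots.items()}
    return {
        "thing_id": thing["_id"],  # assumed back-reference to the real document
        "domain_id": thing["domain_id"],
        "type_id": thing["type_id"],
        "fields": fields,
    }


def to_filter(domain_ids, type_id, conditions):
    """Translate conditions on user-facing property names into slot names."""
    slots = SLOT_MAP[type_id]
    query = {"domain_id": {"$in": list(domain_ids)}, "type_id": type_id}
    for prop, condition in conditions.items():
        query["fields." + slots[prop]] = condition
    return query


thing = {
    "_id": 42,
    "domain_id": 7,
    "type_id": "CalendarEvent",
    "properties": {
        "start_date": datetime(2024, 1, 15),
        "open_for_registration": True,
        "calendar_name": "Public",
        "notes": "not indexed, so it never reaches indexables",
    },
}
indexable = to_indexable(thing)
query = to_filter(
    [7, 8],
    "CalendarEvent",
    {
        "open_for_registration": True,
        "calendar_name": "Public",
        "start_date": {"$gte": datetime(2024, 1, 14), "$lte": datetime(2024, 1, 21)},
    },
)
print(indexable)
print(query)
```

The application would write `indexable` next to every create/update of a "thing" and run `query` against "indexables", then fetch the matching things by thing_id. That two-step write is exactly where the consistency concern in the third bullet comes from.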

Other ideas? Other kinds of databases that might better handle this problem?

Again, the idea is this: different users manage their own domains. Within a domain can be items of any "type". For each typed item we have properties. I want to allow users to run queries against their domains to get items of a type having properties that match their conditions. (thus compound indexes)

One last thought. A domain in itself is not intended to be humongous. It might have 10-20 "types". Within each type there might be as many as 5000 records (in most cases) and say 20000 in extreme cases.

Unfortunately, this is one of those cases where despite Joel Spolsky's advice I attempted astronaut architecture.

Recommended answer


Other kinds of databases that might better handle this problem?



Have you considered Excel? Maybe just indexed flat files :)

Look, the basic problem you're going to have here is that there is no silver bullet. Your idea is fine, but at some point you have to accept some set of trade-offs.

You can't index everything. At some point you'll have to identify "commonly-used" queries and build some indexes for those things. Unless you're planning to keep everything in memory, you'll end up creating indexes at some point.


Within each type there might be as many as 5000 records (in most cases) and say 20000 in extreme cases.

Hey there's a true limitation. How much hardware can you throw at 5k records? How about 200k records? Is it going to be enough to keep it all in RAM? Keep part of it in RAM? Keep just the indexes in RAM?
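
A quick back-of-envelope estimate shows why RAM is a realistic option at this scale. The record counts and the 20-types figure come from the question; the 1 KiB average record size is purely an assumption for illustration, so plug in a measured value before drawing conclusions.

```python
AVG_RECORD_BYTES = 1024  # assumption, not a figure from the question
TYPES_PER_DOMAIN = 20    # upper end of the "10-20 types" in the question


def domain_size_mib(records_per_type,
                    types=TYPES_PER_DOMAIN,
                    record_bytes=AVG_RECORD_BYTES):
    """Rough raw-data footprint of one domain in MiB, ignoring index overhead."""
    return types * records_per_type * record_bytes / (1024 * 1024)


typical = domain_size_mib(5_000)
extreme = domain_size_mib(20_000)
print(f"typical domain: {typical:.1f} MiB")  # typical domain: 97.7 MiB
print(f"extreme domain: {extreme:.1f} MiB")  # extreme domain: 390.6 MiB
```

Under that assumption, even an extreme domain stays well under half a gigabyte of raw data, before counting indexes or per-document storage overhead.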

If you want to let users just stuff in their own "dynamic" schemas, I personally feel that MongoDB is a natural fit. Especially for these small data sets you're indicating.

But it's not a silver bullet by any means. Each of these solutions will have its own set of problems. If there were an actual DB that could handle all of the requirements you put forth, let's face it, we'd all be using that DB :)
