How to think in data stores instead of databases?

Question

As an example, Google App Engine uses data stores, not a database, to store data. Does anybody have any tips for using data stores instead of databases? It seems I've trained my mind to think 100% in object relationships that map directly to table structures, and now it's hard to see anything differently. I can understand some of the benefits of data stores (e.g. performance and the ability to distribute data), but some good database functionality is sacrificed (e.g. joins).

Does anybody who has worked with data stores like BigTable have any good advice on working with them?

Recommended answer

There are two main things to get used to about the App Engine datastore when compared to 'traditional' relational databases:


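  • The datastore makes no distinction between inserts and updates. When you call put() on an entity, that entity is stored in the datastore under its unique key, and anything already stored under that key is overwritten. Basically, each entity kind in the datastore acts like an enormous map or sorted list.

  • Querying, as you alluded to, is much more limited.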

The key thing to realise - and the reason behind both these differences - is that Bigtable basically acts like an enormous ordered dictionary. Thus, a put operation just sets the value for a given key - regardless of any previous value for that key, and fetch operations are limited to fetching single keys or contiguous ranges of keys. More sophisticated queries are made possible with indexes, which are basically just tables of their own, allowing you to implement more complex queries as scans on contiguous ranges.
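The ordered-dictionary model can be made concrete with a small in-memory sketch. This is only a toy under stated assumptions, not the App Engine or Bigtable API: the `OrderedStore` class, the `Person` kind, and the `city` property are all hypothetical names invented for illustration. It shows put as an unconditional overwrite, fetches limited to a single key or a contiguous key range, and an index implemented as a second sorted table.

```python
import bisect


class OrderedStore:
    """Toy in-memory stand-in for a sorted key-value store (not a real datastore API)."""

    def __init__(self):
        self._keys = []    # keys kept in sorted order
        self._values = {}  # key -> value

    def put(self, key, value):
        # No insert/update distinction: any previous value is overwritten.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        # Single-key fetch.
        return self._values.get(key)

    def scan(self, start, end):
        # Contiguous range fetch: all keys with start <= key < end, in key order.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


# Entities of a hypothetical "Person" kind, keyed by (kind, id).
entities = OrderedStore()
entities.put(("Person", 1), {"name": "Ann", "city": "Oslo"})
entities.put(("Person", 2), {"name": "Bob", "city": "Rome"})
entities.put(("Person", 3), {"name": "Cy", "city": "Oslo"})
entities.put(("Person", 2), {"name": "Bob", "city": "Oslo"})  # overwrites, no error

# An index is just another sorted table whose keys are (property value, entity key).
city_index = OrderedStore()
for key, person in entities.scan(("Person", 0), ("Person", 4)):
    city_index.put((person["city"], key), None)

# "WHERE city = 'Oslo'" becomes a scan over one contiguous range of index keys.
oslo_keys = [k[1] for k, _ in city_index.scan(("Oslo",), ("Oslo\x00",))]
```

In this sketch the equality filter never inspects rows outside the `Oslo` range, which is the property that lets the same design keep working as the table grows.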

Once you've absorbed that, you have the basic knowledge needed to understand the capabilities and limitations of the datastore. Restrictions that may have seemed arbitrary probably make more sense.

The key thing here is that although these are restrictions over what you can do in a relational database, these same restrictions are what make it practical to scale up to the sort of magnitude that Bigtable is designed to handle. You simply can't execute the sort of query that looks good on paper but is atrociously slow in an SQL database.

In terms of changing how you represent data, the most important thing is precalculation. Instead of doing joins at query time, precalculate data and store it in the datastore wherever possible. If you want to pick a random record, generate a random number and store it with each record. There was a whole cookbook of these sorts of tips and tricks, but it no longer exists.
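Both tips can be sketched in a few lines of Python. Everything here is an assumption made for illustration rather than datastore API: the record fields, the `rand_index` list standing in for an index on the stored random number, and the `make_post` and `pick_random` helpers are hypothetical.

```python
import bisect
import random

# Precalculation instead of a join: copy the author's name onto each post at
# write time, so reading a post never needs a second lookup.
authors = {"u1": {"name": "Ann"}}


def make_post(author_id, text):
    return {
        "author_id": author_id,
        "author_name": authors[author_id]["name"],  # denormalized copy
        "text": text,
    }


# Random record: store a random number with every record when it is written.
write_rng = random.Random(42)  # seeded only to make the sketch repeatable
records = [{"title": t, "rand": write_rng.random()} for t in ["a", "b", "c", "d", "e"]]

# Stand-in for an index on "rand": (rand, position) pairs kept in sorted order.
rand_index = sorted((r["rand"], i) for i, r in enumerate(records))


def pick_random(rng):
    # Draw a fresh number and take the first record whose stored value is >= it.
    probe = rng.random()
    pos = bisect.bisect_left(rand_index, (probe,))
    if pos == len(rand_index):
        pos = 0  # wrap around when the probe exceeds every stored value
    return records[rand_index[pos][1]]
```

The random lookup is a single range scan starting at the probe value, so it fits the query model described above. One caveat of this simple form: a record is chosen with probability proportional to the gap below its stored number, so the selection is only approximately uniform for a small set of records.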
