SQL to Key Value
Question
I'd like to move from the SQL approach to the Key Value approach, because I deal with "big data" and would like to benefit from systems like DynamoDB, Riak or Cassandra.
It's quite easy when the data is unrelated: one can simply use a document-based approach (a primary key + data, but no relations).
I'd appreciate any theoretical or academic input on how to model my data.
Recommended answer
I've been using NoSQL for the last 4 years, and this is just what I think and what I've learnt ... my personal golden rules.
Premise: in the SQL world, any possible relation between data and any problem or situation you have to deal with usually comes with a precise answer, thanks to both the age and the "uniqueness" of the product. People coming from this "perfect world" tend to look at NoSQL in the same way, but here any problem can have many solutions (or no solution), depending on both the needs of the application and the product you're using.
Think about the queries before writing the model. The term "query-oriented" really fits this context: go deep with the analysis, because the more you know about how you'll query your data, the better the result will be.
Denormalize. Don't think "a table owns certain data" but rather "a table answers a few queries", so your data (or different subsets of your data) might be repeated in different tables. This is the norm, and a way to avoid joins and relations.
This is implicitly an extension of the previous two points: don't think "fewer tables make the best design". The more queries you have, the more tables you will probably need.
Study your product. Each system offers different features: some will give you "data sorting" for free, others may offer collections, callbacks, triggers and so on, so the model could be quite different from one product to another.
Deal with your needs and possibilities. Sometimes you will have to choose, for instance, between creating a new table with the data sorted differently and sorting the data client side. There is no single correct answer. If you have little disk space, or the data to be sorted comes in small sets, you might sort client side; if you have little "computing power", you'd better create the extra table.
Remember that NoSQL doesn't mean "No SQL" but "Not Only SQL". You can also imagine your schema as a hybrid (I think that https://mariadb.org/ offers this kind of solution), or remember that you can put a layer of Hive/Shark/Pig on top to perform more complex "backend queries".
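To make the first rules more concrete, here is a minimal sketch in Python, with plain dicts standing in for key-value tables. The two queries, the table names and the helper functions are all assumptions made up for illustration; they do not correspond to the API of DynamoDB, Riak or Cassandra. It only shows the shape of the idea: one table per query, the same data written to each of them, and the choice between sorting at write time and sorting client side.

```python
from bisect import insort
from collections import defaultdict

# Hypothetical queries the application must answer (assumed for this sketch):
#   Q1: all posts by a given author, oldest first
#   Q2: all posts carrying a given tag
# Each dict stands in for one key-value table; each table answers one query.
posts_by_author = defaultdict(list)  # author -> [(timestamp, title)], kept sorted
posts_by_tag = defaultdict(list)     # tag    -> [(timestamp, title)], unsorted


def save_post(author, title, tags, timestamp):
    """Write the same post into every table that has to answer a query."""
    row = (timestamp, title)
    # Option A: pay at write time, so reads come back already ordered.
    insort(posts_by_author[author], row)
    # Option B: store rows as they arrive and sort later on the client.
    for tag in tags:
        posts_by_tag[tag].append(row)


def titles_by_author(author):
    """Q1: a single key lookup, no join, rows already in timestamp order."""
    return [title for _, title in posts_by_author.get(author, [])]


def titles_by_tag(tag):
    """Q2: a single key lookup, then a client-side sort by timestamp."""
    return [title for _, title in sorted(posts_by_tag.get(tag, []))]
```

The post is duplicated on purpose: every table holds the copy it needs, so each read is a single key lookup. The price is more storage and a write path that has to update every table.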
If you choose Cassandra, study the product a little first, then have a look here:
HTH, Carlo