Practical example for each type of database (real cases)



There are several types of databases for different purposes, but normally MySQL is used for everything, because it is the best-known database. Just to give an example: in my company, a big-data application had a MySQL database in its initial stage, which is unbelievable and will bring serious consequences to the company. Why MySQL? Just because no one knows how (and when) another DBMS should be used.

So, my question is not about vendors, but about types of databases. Can you give me a practical example of a specific situation (or app) for each type of database where it is highly recommended?

Example:

• A social network should use the type X because of Y.

• MongoDB or CouchDB doesn't support transactions, so a document DB is not good for a banking or auction-site app.

And so on...


Relational: MySQL, PostgreSQL, SQLite, Firebird, MariaDB, Oracle DB, SQL Server, IBM DB2, IBM Informix, Teradata

Object: ZODB, db4o, Eloquera, Versant, Objectivity/DB, VelocityDB

Graph databases: AllegroGraph, Neo4j, OrientDB, InfiniteGraph, GraphBase, SparkleDB, FlockDB, BrightstarDB

Key-value stores: Amazon DynamoDB, Redis, Riak, Voldemort, FoundationDB, LevelDB, BangDB, KAI, HamsterDB, Tarantool, Maxtable, HyperDex, Genomu, MemcacheDB

Column family: Bigtable, HBase, Hypertable, Cassandra, Apache Accumulo

RDF stores: Apache Jena, Sesame

Multi-model databases: ArangoDB, Datomic, OrientDB, FatDB, AlchemyDB

Document: MongoDB, CouchDB, RethinkDB, RavenDB, Terrastore, JasDB, RaptorDB, djondb, EJDB, densodb, Couchbase

XML databases: BaseX, Sedna, eXist

Hierarchical: InterSystems Caché, GT.M (thanks to @Laurent Parenteau)

Solution

I found two impressive articles about this subject.

All credits to highscalability.com. The information is transcribed from these URLs:

http://highscalability.com/blog/2011/6/20/35-use-cases-for-choosing-your-next-nosql-database.html

http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html


If Your Application Needs...

• complex transactions because you can't afford to lose data, or if you would like a simple transaction programming model, then look at a Relational or Grid database.

Example: an inventory system that might want full ACID. I was very unhappy when I bought a product and they said later they were out of stock. I did not want a compensated transaction. I wanted my item!

• to scale, then NoSQL or SQL can work. Look for systems that support scale-out, partitioning, live addition and removal of machines, load balancing, automatic sharding and rebalancing, and fault tolerance.

• to always be able to write to a database because you need high availability then look at Bigtable Clones which feature eventual consistency.

• to handle lots of small continuous reads and writes, that may be volatile, then look at Document or Key-value or databases offering fast in-memory access. Also consider SSD.

• to implement social network operations, then you first may want a Graph database or, second, a database like Riak that supports relationships. An in-memory relational database with simple SQL joins might suffice for small data sets. Redis' set and list operations could work too.
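The core social-network operation is multi-hop traversal. As a rough pure-Python illustration (not any particular product's API), here is the friend-of-friend query that a graph database answers natively:

```python
# Friend-of-friend suggestion over an adjacency-set graph -- a toy
# stand-in for the traversal a graph database optimizes natively.
friends = {
    "alice": {"bob", "carol"},
    "bob":   {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave":  {"bob", "carol"},
}

def suggest(user):
    """People two hops away who are not already direct friends."""
    direct = friends[user]
    candidates = set()
    for f in direct:
        candidates |= friends[f]
    return candidates - direct - {user}

print(suggest("alice"))  # {'dave'}
```

In SQL the same query is a self-join on a friendship table; it works, but each extra hop adds another join, which is why deep traversals favor graph stores.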

• to operate over a wide variety of access patterns and data types then look at a Document database, they generally are flexible and perform well.

• powerful offline reporting with large datasets, then look at Hadoop first and, second, products that support MapReduce. Supporting MapReduce isn't the same as being good at it.

• to span multiple data-centers then look at Bigtable Clones and other products that offer a distributed option that can handle the long latencies and are partition tolerant.

• to build CRUD apps then look at a Document database, they make it easy to access complex data without joins.

• built-in search, then look at Riak.

• to operate on data structures like lists, sets, queues, publish-subscribe then look at Redis. Useful for distributed locking, capped logs, and a lot more.
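As a hedged illustration of one of these patterns: a capped log keeps only the most recent N entries. Redis serves this with LPUSH plus LTRIM; a pure-Python stand-in for the idea looks like:

```python
from collections import deque

# Capped log: keep only the N most recent entries, the pattern Redis
# covers with LPUSH + LTRIM. deque(maxlen=...) discards the oldest
# entry automatically once the cap is reached.
log = deque(maxlen=3)
for event in ["login", "view", "click", "logout"]:
    log.append(event)

print(list(log))  # ['view', 'click', 'logout']
```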

• programmer friendliness in the form of programmer-friendly data types like JSON, HTTP, REST, and Javascript, then first look at Document databases and then Key-value databases.

• transactions combined with materialized views for real-time data feeds, then look at VoltDB. Great for data rollups and time windowing.

• enterprise-level support and SLAs, then look for a product that makes a point of catering to that market. Membase is an example.

• to log continuous streams of data that may have no consistency guarantees necessary at all then look at Bigtable Clones because they generally work on distributed file systems that can handle a lot of writes.

• to be as simple as possible to operate then look for a hosted or PaaS solution because they will do all the work for you.

• to be sold to enterprise customers then consider a Relational Database because they are used to relational technology.

• to dynamically build relationships between objects that have dynamic properties then consider a Graph Database because often they will not require a schema and models can be built incrementally through programming.

• to support large media, then look at storage services like S3. NoSQL systems tend not to handle large BLOBs, though MongoDB has a file service.

• to bulk upload lots of data quickly and efficiently, then look for a product that supports that scenario. Most will not, because they don't support bulk operations.

• an easier upgrade path then use a fluid schema system like a Document Database or a Key-value Database because it supports optional fields, adding fields, and field deletions without the need to build an entire schema migration framework.
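A sketch of what "optional fields without a migration framework" means in practice, using plain Python dicts as stand-ins for documents (the field names are illustrative):

```python
# Two "documents" in one collection with different fields -- no ALTER
# TABLE or migration framework needed; readers tolerate absent fields.
users = [
    {"name": "Ana", "email": "ana@example.com"},
    {"name": "Bo", "email": "bo@example.com", "twitter": "@bo"},  # field added later
]

# Old documents simply lack the new field; .get() supplies a default.
handles = [u.get("twitter", "(none)") for u in users]
print(handles)  # ['(none)', '@bo']
```

The trade-off is that the schema now lives implicitly in application code, so every reader must be written to tolerate missing or extra fields.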

• to implement integrity constraints, then pick a database that supports SQL DDL, implement them in stored procedures, or implement them in application code.

• a very deep join depth, then use a Graph Database, because they support blisteringly fast navigation between entities.

• to move behavior close to the data so the data doesn't have to be moved over the network then look at stored procedures of one kind or another. These can be found in Relational, Grid, Document, and even Key-value databases.

• to cache or store BLOB data, then look at a Key-value store. Caching can be for bits of web pages, or to save complex objects that were expensive to join in a relational database, to reduce latency, and so on.
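The essence of such a cache is key lookup plus an eviction policy. A minimal LRU sketch, not any specific store's implementation:

```python
from collections import OrderedDict

class LRUCache:
    """Tiny key-value cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("page:home", "<html>home</html>")
cache.put("page:about", "<html>about</html>")
cache.get("page:home")                       # touch "home" so it stays
cache.put("page:contact", "<html>contact</html>")  # evicts "about"
print("page:about" in cache.data)  # False
```

Dedicated caches like memcached or Redis do the same thing over the network, shared between processes, with expiry and memory limits handled for you.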

• a proven track record, like not corrupting data and just generally working, then pick an established product, and when you hit scaling (or other) issues use one of the common workarounds (scale-up, tuning, memcached, sharding, denormalization, etc.).

• fluid data types because your data isn't tabular in nature, or requires a flexible number of columns, or has a complex structure, or varies by user (or whatever), then look at Document, Key-value, and Bigtable Clone databases. Each has a lot of flexibility in its data types.

• other business units to run quick relational queries so you don't have to reimplement everything then use a database that supports SQL.

• to operate in the cloud and automatically take full advantage of cloud features then we may not be there yet.

• support for secondary indexes so you can look up data by different keys then look at relational databases and Cassandra's new secondary index support.
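A secondary index is just a maintained reverse mapping from a non-key attribute back to primary keys. A toy sketch of the structure the database keeps up to date for you:

```python
# Rows keyed by primary key; the secondary index maps another attribute
# back to primary keys -- what a relational CREATE INDEX (or Cassandra's
# secondary indexes) maintains automatically on every write.
rows = {
    1: {"name": "Ana", "city": "Lisbon"},
    2: {"name": "Bo",  "city": "Oslo"},
    3: {"name": "Cy",  "city": "Lisbon"},
}

by_city = {}
for pk, row in rows.items():
    by_city.setdefault(row["city"], set()).add(pk)

print(by_city["Lisbon"])  # {1, 3}
```

Without such an index, looking up "everyone in Lisbon" means scanning every row, which is exactly the query shape plain key-value stores struggle with.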

• to create an ever-growing set of data (really BigData) that rarely gets accessed, then look at a Bigtable Clone, which will spread the data over a distributed file system.

• to integrate with other services then check if the database provides some sort of write-behind syncing feature so you can capture database changes and feed them into other systems to ensure consistency.

• fault tolerance, then check how durable writes are in the face of power failures, partitions, and other failure scenarios.

• to push the technological envelope in a direction nobody seems to be going then build it yourself because that's what it takes to be great sometimes.

• to work on a mobile platform then look at CouchDB/Mobile couchbase.


General Use Cases (NoSQL)

Bigness. NoSQL is seen as a key part of a new data stack supporting: big data, big numbers of users, big numbers of computers, big supply chains, big science, and so on. When something becomes so massive that it must become massively distributed, NoSQL is there, though not all NoSQL systems are targeting big. Bigness can be across many different dimensions, not just using a lot of disk space.

Massive write performance. This is probably the canonical usage based on Google's influence. High volume. Facebook needs to store 135 billion messages a month. Twitter, for example, has the problem of storing 7 TB of data per day, with the prospect of this requirement doubling multiple times per year. This is the "data too big to fit on one node" problem. At 80 MB/s it takes a day to store 7 TB, so writes need to be distributed over a cluster, which implies key-value access, MapReduce, replication, fault tolerance, consistency issues, and all the rest. For faster writes, in-memory systems can be used.
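The arithmetic behind the one-day figure checks out (using decimal TB and MB):

```python
# 7 TB/day at 80 MB/s on a single node is roughly a full day of
# nothing but writes -- hence the need to spread load over a cluster.
total_bytes = 7 * 10**12      # 7 TB, decimal
rate = 80 * 10**6             # 80 MB/s, decimal
hours = total_bytes / rate / 3600
print(round(hours, 1))        # 24.3
```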

Fast key-value access. This is probably the second most cited virtue of NoSQL in the general mindset. When latency is important, it's hard to beat hashing on a key and reading the value directly from memory or in as little as one disk seek. Not every NoSQL product is about fast access; some are more about reliability, for example. But what people have wanted for a long time was a better memcached, and many NoSQL systems offer that.
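The "hash on a key" step can be sketched in a few lines. Real systems typically use consistent hashing so that adding a node does not remap most keys, but the naive modulo version shows the idea (the node names are made up):

```python
import hashlib

# Hash a key to pick the node that owns it -- the core of key-value
# sharding. A lookup touches exactly one node, no coordination needed.
nodes = ["node-a", "node-b", "node-c"]

def owner(key):
    digest = hashlib.md5(key.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# Deterministic: every client computes the same owner for a given key.
print(owner("user:42") == owner("user:42"))  # True
```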

Flexible schema and flexible datatypes. NoSQL products support a whole range of new data types, and this is a major area of innovation in NoSQL. We have: column-oriented, graph, advanced data structures, document-oriented, and key-value. Complex objects can be easily stored without a lot of mapping. Developers love avoiding complex schemas and ORM frameworks. Lack of structure allows for much more flexibility. We also have program- and programmer-friendly compatible datatypes like JSON.

Schema migration. Schemalessness makes it easier to deal with schema migrations without so much worrying. Schemas are in a sense dynamic, because they are imposed by the application at run-time, so different parts of an application can have a different view of the schema.

Write availability. Do your writes need to succeed no matter what? Then we can get into partitioning, CAP, eventual consistency, and all that jazz.

Easier maintainability, administration and operations. This is very product specific, but many NoSQL vendors are trying to gain adoption by making it easy for developers to adopt them. They are spending a lot of effort on ease of use, minimal administration, and automated operations. This can lead to lower operations costs as special code doesn't have to be written to scale a system that was never intended to be used that way.

No single point of failure. Not every product is delivering on this, but we are seeing a definite convergence on relatively easy to configure and manage high availability with automatic load balancing and cluster sizing. A perfect cloud partner.

Generally available parallel computing. We are seeing MapReduce baked into products, which makes parallel computing something that will be a normal part of development in the future.

Programmer ease of use. Accessing your data should be easy. While the relational model is intuitive for end users, like accountants, it's not very intuitive for developers. Programmers grok keys, values, JSON, Javascript stored procedures, HTTP, and so on. NoSQL is for programmers. This is a developer led coup. The response to a database problem can't always be to hire a really knowledgeable DBA, get your schema right, denormalize a little, etc., programmers would prefer a system that they can make work for themselves. It shouldn't be so hard to make a product perform. Money is part of the issue. If it costs a lot to scale a product then won't you go with the cheaper product, that you control, that's easier to use, and that's easier to scale?

Use the right data model for the right problem. Different data models are used to solve different problems. Much effort has been put into, for example, wedging graph operations into a relational model, but it doesn't work. Isn't it better to solve a graph problem in a graph database? We are now seeing a general strategy of trying to find the best fit between a problem and a solution.

Avoid hitting the wall. Many projects hit some type of wall in their project. They've exhausted all options to make their system scale or perform properly and are wondering what next? It's comforting to select a product and an approach that can jump over the wall by linearly scaling using incrementally added resources. At one time this wasn't possible. It took custom built everything, but that's changed. We are now seeing usable out-of-the-box products that a project can readily adopt.

Distributed systems support. Not everyone is worried about scale or performance over and above that which can be achieved by non-NoSQL systems. What they need is a distributed system that can span datacenters while handling failure scenarios without a hiccup. NoSQL systems, because they have focused on scale, tend to exploit partitions and tend not to use heavy strict consistency protocols, and so are well positioned to operate in distributed scenarios.

Tunable CAP tradeoffs. NoSQL systems are generally the only products with a "slider" for choosing where they want to land on the CAP spectrum. Relational databases pick strong consistency which means they can't tolerate a partition failure. In the end this is a business decision and should be decided on a case by case basis. Does your app even care about consistency? Are a few drops OK? Does your app need strong or weak consistency? Is availability more important or is consistency? Will being down be more costly than being wrong? It's nice to have products that give you a choice.
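One common form of that "slider" is quorum arithmetic over N replicas, as popularized by Dynamo-style systems: any read of R replicas is guaranteed to overlap any write of W replicas whenever R + W > N. A sketch of the rule (a simplification; real systems add many subtleties):

```python
# Quorum tuning: with N replicas, reads of R and writes of W overlap
# on at least one replica (strong consistency per key) iff R + W > N.
def is_strong(n, r, w):
    return r + w > n

print(is_strong(3, 2, 2))  # True  -- quorum reads and quorum writes
print(is_strong(3, 1, 1))  # False -- fast, but only eventually consistent
```

Dialing R and W down buys latency and availability at the cost of consistency; dialing them up does the reverse, which is exactly the per-case business decision described above.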

More Specific Use Cases

• Managing large streams of non-transactional data: Apache logs, application logs, MySQL logs, clickstreams, etc.

• Syncing online and offline data. This is a niche CouchDB has targeted.

• Fast response times under all loads.

• Avoiding heavy joins, for when the query load for complex joins becomes too large for an RDBMS.

• Soft real-time systems where low latency is critical. Games are one example.

• Applications where a wide variety of different write, read, query, and consistency patterns need to be supported. There are systems optimized for 50% reads 50% writes, 95% writes, or 95% reads. Read-only applications needing extreme speed and resiliency, simple queries, and that can tolerate slightly stale data. Applications requiring moderate performance, read/write access, simple queries, completely authoritative data. Read-only applications with complex query requirements.

• Load balance to accommodate data and usage concentrations and to help keep microprocessors busy.

• Real-time inserts, updates, and queries.

• Hierarchical data like threaded discussions and parts explosion.

• Dynamic table creation.

• Two tier applications where low latency data is made available through a fast NoSQL interface, but the data itself can be calculated and updated by high latency Hadoop apps or other low priority apps.

• Sequential data reading. The right underlying data storage model needs to be selected. A B-tree may not be the best model for sequential reads.

• Slicing off part of a service that may need better performance/scalability onto its own system. For example, user logins may need to be high performance, and this feature could use a dedicated service to meet those goals.

• Caching. A high-performance caching tier for web sites and other applications. An example is a cache for the Data Aggregation System used by the Large Hadron Collider.

• Voting.

• Real-time page view counters.

• User registration, profile, and session data.

• Document, catalog-management, and content-management systems. These are facilitated by the ability to store complex documents as a whole rather than organized as relational tables. Similar logic applies to inventory, shopping carts, and other structured data types.

• Archiving. Storing a large continual stream of data that is still accessible on-line. Document-oriented databases with a flexible schema that can handle schema changes over time.

• Analytics. Use MapReduce, Hive, or Pig to perform analytical queries and scale-out systems that support high write loads.

• Working with heterogenous types of data, for example, different media types at a generic level.

• Embedded systems. They don't want the overhead of SQL and servers, so they use something simpler for storage.

• A "market" game, where you own buildings in a town. You want the building list of someone to pop up quickly, so you partition on the owner column of the building table, so that the select is single-partitioned. But when someone buys the building of someone else you update the owner column along with price.

• JPL is using SimpleDB to store rover plan attributes. References are kept to a full plan blob in S3.

• Federal law enforcement agencies tracking Americans in real-time using credit cards, loyalty cards and travel reservations.

• Fraud detection by comparing transactions to known patterns in real-time.

• Helping diagnose the typology of tumors by integrating the history of every patient.

• In-memory databases for high-update situations, like a web site that displays everyone's "last active" time (for chat, maybe). If users are performing some activity once every 30 seconds, then you will pretty much be at your limit with about 5,000 simultaneous users.

• Handling lower-frequency multi-partition queries using materialized views while continuing to process high-frequency streaming data.

• Priority queues.
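A sketch with Python's stdlib heapq, as a single-process stand-in for a database-backed queue (the task names are made up):

```python
import heapq

# Min-heap as a priority queue: the lowest priority number is served
# first, regardless of insertion order.
tasks = []
heapq.heappush(tasks, (2, "send email"))
heapq.heappush(tasks, (1, "charge card"))
heapq.heappush(tasks, (3, "log event"))

first = heapq.heappop(tasks)
print(first)  # (1, 'charge card')
```

A database-backed version adds what the in-process heap lacks: persistence across restarts and safe concurrent consumers.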

• Running calculations on cached data, using a program-friendly interface, without having to go through an ORM.

• Uniquing a large dataset using simple key-value columns.

• To keep querying fast, values can be rolled-up into different time slices.
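A sketch of the rollup idea: bucket raw event timestamps into hourly slices so queries scan a handful of pre-aggregated counters instead of every raw event (the timestamps are made up):

```python
# Roll raw event timestamps (epoch seconds) up into hourly buckets.
events = [3600 * 0 + 10, 3600 * 0 + 50, 3600 * 1 + 5, 3600 * 2 + 30]

hourly = {}
for ts in events:
    hourly[ts // 3600] = hourly.get(ts // 3600, 0) + 1

print(hourly)  # {0: 2, 1: 1, 2: 1}
```

The same aggregation can be run again at daily or weekly granularity, trading storage for query speed at each level.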

• Computing the intersection of two massive sets, where a join would be too slow.
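In-memory hash sets make this a direct operation rather than a join; a sketch with made-up ID sets:

```python
# Intersecting two ID sets directly -- a hash-set operation that is
# often far cheaper than the equivalent relational join on large data.
viewed = {101, 102, 103, 104}
purchased = {103, 104, 105}

both = viewed & purchased
print(sorted(both))  # [103, 104]
```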

• A timeline à la Twitter.
