Practical example for each type of database (real cases)

Problem description

There are several types of database for different purposes, but MySQL is normally used for everything because it is the best-known database. To give an example: at my company, a big data application has a MySQL database at an initial stage, which is unbelievable and will bring serious consequences to the company. Why MySQL? Just because no one knows how (and when) to use another DBMS.

So, my question is not about vendors, but about types of databases. Can you give me a practical example of a specific situation (or app) for each type of database where using it is highly recommended?

Example:

• A social network should use the type X because of Y.

• MongoDB or CouchDB can't support transactions, so a Document DB is not a good fit for a bank or auction site app.

And so on...


Relational: MySQL, PostgreSQL, SQLite, Firebird, MariaDB, Oracle DB, SQL Server, IBM DB2, IBM Informix, Teradata

Object: ZODB, DB4O, Eloquera, Versant, Objectivity DB, VelocityDB

Graph databases: AllegroGraph, Neo4j, OrientDB, InfiniteGraph, graphbase, sparkledb, flockdb, BrightstarDB

Key value-stores: Amazon DynamoDB, Redis, Riak, Voldemort, FoundationDB, leveldb, BangDB, KAI, hamsterdb, Tarantool, Maxtable, HyperDex, Genomu, Memcachedb

Column family: Bigtable, HBase, Hypertable, Cassandra, Apache Accumulo

RDF Stores: Apache Jena, Sesame

Multimodel Databases: arangodb, Datomic, Orient DB, FatDB, AlchemyDB

Document: Mongo DB, Couch DB, Rethink DB, Raven DB, terrastore, Jas DB, Raptor DB, djon DB, EJDB, denso DB, Couchbase

XML Databases: BaseX, Sedna, eXist

Hierarchical: InterSystems Caché, GT.M (thanks to @Laurent Parenteau)

Solution

I found two impressive articles about this subject. All credits to highscalability.com. The information in this answer is transcribed from these articles:

35+ Use Cases For Choosing Your Next NoSQL Database

What The Heck Are You Actually Using NoSQL For?


If Your Application Needs...

• complex transactions because you can't afford to lose data, or if you would like a simple transaction programming model, then look at a Relational or Grid database.

Example: an inventory system that might want full ACID. I was very unhappy when I bought a product and they said later they were out of stock. I did not want a compensated transaction. I wanted my item!

• to scale then NoSQL or SQL can work. Look for systems that support scale-out, partitioning, live addition and removal of machines, load balancing, automatic sharding and rebalancing, and fault tolerance.

• to always be able to write to a database because you need high availability then look at Bigtable Clones which feature eventual consistency.

• to handle lots of small continuous reads and writes, which may be volatile, then look at Document or Key-value databases, or databases offering fast in-memory access. Also, consider SSD.

• to implement social network operations then you first may want a Graph database or second, a database like Riak that supports relationships. An in-memory relational database with simple SQL joins might suffice for small data sets. Redis' set and list operations could work too.

• to operate over a wide variety of access patterns and data types then look at a Document database, they generally are flexible and perform well.

• powerful offline reporting with large datasets then look at Hadoop first and, second, at products that support MapReduce. Supporting MapReduce isn't the same as being good at it.

• to span multiple data-centers then look at Bigtable Clones and other products that offer a distributed option that can handle the long latencies and are partition tolerant.

• to build CRUD apps then look at a Document database, they make it easy to access complex data without joins.

• built-in search then look at Riak.

• to operate on data structures like lists, sets, queues, publish-subscribe then look at Redis. Useful for distributed locking, capped logs, and a lot more.

• programmer friendliness in the form of programmer-friendly data types like JSON, HTTP, REST, Javascript then first look at Document databases and then Key-value Databases.

• transactions combined with materialized views for real-time data feeds then look at VoltDB. Great for data-rollups and time windowing.

• enterprise-level support and SLAs then look for a product that makes a point of catering to that market. Membase is an example.

• to log continuous streams of data that may have no consistency guarantees necessary at all then look at Bigtable Clones because they generally work on distributed file systems that can handle a lot of writes.

• to be as simple as possible to operate then look for a hosted or PaaS solution because they will do all the work for you.

• to be sold to enterprise customers then consider a Relational Database because they are used to relational technology.

• to dynamically build relationships between objects that have dynamic properties then consider a Graph Database because often they will not require a schema and models can be built incrementally through programming.

• to support large media then look at storage services like S3. NoSQL systems tend not to handle large BLOBs, though MongoDB has a file service.

• to bulk upload lots of data quickly and efficiently then look for a product that supports that scenario. Most will not because they don't support bulk operations.

• an easier upgrade path then use a fluid schema system like a Document Database or a Key-value Database because it supports optional fields, adding fields, and field deletions without the need to build an entire schema migration framework.

• to implement integrity constraints then pick a database that supports SQL DDL, implement them in stored procedures, or implement them in application code.

• a very deep join depth then use a Graph Database because they support blisteringly fast navigation between entities.

• to move behavior close to the data so the data doesn't have to be moved over the network then look at stored procedures of one kind or another. These can be found in Relational, Grid, Document, and even Key-value databases.

• to cache or store BLOB data then look at a Key-value store. Caching can be for bits of web pages, or to save complex objects that were expensive to join in a relational database, to reduce latency, and so on.

• a proven track record like not corrupting data and just generally working then pick an established product and when you hit scaling (or other issues) use one of the common workarounds (scale-up, tuning, memcached, sharding, denormalization, etc).

• fluid data types because your data isn't tabular in nature, or requires a flexible number of columns, or has a complex structure, or varies by user (or whatever), then look at Document, Key-value, and Bigtable Clone databases. Each has a lot of flexibility in their data types.

• other business units to run quick relational queries so you don't have to reimplement everything then use a database that supports SQL.

• to operate in the cloud and automatically take full advantage of cloud features then we may not be there yet.

• support for secondary indexes so you can look up data by different keys then look at relational databases and Cassandra's new secondary index support.

• to create an ever-growing set of data (really BigData) that rarely gets accessed then look at Bigtable Clones, which will spread the data over a distributed file system.

• to integrate with other services then check if the database provides some sort of write-behind syncing feature so you can capture database changes and feed them into other systems to ensure consistency.

• fault tolerance then check how durable writes are in the face of power failures, partitions, and other failure scenarios.

• to push the technological envelope in a direction nobody seems to be going then build it yourself because that's what it takes to be great sometimes.

• to work on a mobile platform then look at CouchDB/Mobile couchbase.
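The inventory example in the list above (wanting full ACID so that an order is never accepted for an item that is out of stock) can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the `stock` and `orders` tables and the `place_order` helper are hypothetical names, not something from the source articles.

```python
import sqlite3

def place_order(conn, product_id, quantity):
    """Decrement stock and record the order as one transaction; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            cur = conn.execute(
                "UPDATE stock SET on_hand = on_hand - ? "
                "WHERE product_id = ? AND on_hand >= ?",
                (quantity, product_id, quantity),
            )
            if cur.rowcount == 0:
                # No row had enough stock, so abort the whole transaction.
                raise ValueError("insufficient stock")
            conn.execute(
                "INSERT INTO orders (product_id, quantity) VALUES (?, ?)",
                (product_id, quantity),
            )
        return True
    except ValueError:
        return False

# Hypothetical schema with a single item in stock.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (product_id INTEGER PRIMARY KEY, on_hand INTEGER NOT NULL)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product_id INTEGER, quantity INTEGER)")
conn.execute("INSERT INTO stock VALUES (1, 1)")
conn.commit()

first = place_order(conn, 1, 1)   # takes the last item
second = place_order(conn, 1, 1)  # refused: nothing left to sell
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
on_hand = conn.execute("SELECT on_hand FROM stock WHERE product_id = 1").fetchone()[0]
```

The stock check and the order insert either both happen or neither does, which is the property the buyer in the example was missing.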


General Use Cases (NoSQL)

Bigness. NoSQL is seen as a key part of a new data stack supporting: big data, big numbers of users, big numbers of computers, big supply chains, big science, and so on. When something becomes so massive that it must become massively distributed, NoSQL is there, though not all NoSQL systems are targeting big. Bigness can be across many different dimensions, not just using a lot of disk space.

Massive write performance. This is probably the canonical usage based on Google's influence. High volume. Facebook needs to store 135 billion messages a month (in 2010). Twitter, for example, has the problem of storing 7 TB of data per day (in 2010) with the prospect of this requirement doubling multiple times per year. This is the "data is too big to fit on one node" problem. At 80 MB/s it takes a day to store 7 TB, so writes need to be distributed over a cluster, which implies key-value access, MapReduce, replication, fault tolerance, consistency issues, and all the rest. For faster writes, in-memory systems can be used.
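The arithmetic behind the "it takes a day" figure can be checked in a few lines. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), which the source does not specify.

```python
# Assumption: decimal units, which the original figures do not spell out.
DAILY_BYTES = 7 * 10**12       # 7 TB arriving per day
NODE_WRITE_RATE = 80 * 10**6   # 80 MB/s sustained on a single node
SECONDS_PER_DAY = 24 * 60 * 60

# Time one node needs to write a single day's worth of data.
hours_on_one_node = DAILY_BYTES / NODE_WRITE_RATE / 3600

# Average rate needed just to keep pace with the incoming data.
required_rate_mb_s = DAILY_BYTES / SECONDS_PER_DAY / 10**6

print(f"{hours_on_one_node:.1f} hours on one node")
print(f"{required_rate_mb_s:.1f} MB/s needed to keep up")
```

Under these assumptions a single node needs slightly more than 24 hours to store one day of data, so it falls behind even before replication or reads are considered.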

Fast key-value access. This is probably the second most cited virtue of NoSQL in the general mindset. When latency is important it's hard to beat hashing on a key and reading the value directly from memory or in as little as one disk seek. Not every NoSQL product is about fast access; some are more about reliability, for example. But what people have wanted for a long time was a better memcached, and many NoSQL systems offer that.

Flexible schema and flexible datatypes. NoSQL products support a whole range of new data types, and this is a major area of innovation in NoSQL. We have: column-oriented, graph, advanced data structures, document-oriented, and key-value. Complex objects can be easily stored without a lot of mapping. Developers love avoiding complex schemas and ORM frameworks. Lack of structure allows for much more flexibility. We also have program- and programmer-friendly compatible datatypes like JSON.

Schema migration. Schemalessness makes it easier to deal with schema migrations without so much worrying. Schemas are in a sense dynamic because they are imposed by the application at run-time, so different parts of an application can have a different view of the schema.
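As a rough sketch of a schema "imposed by the application at run-time", the reader below accepts documents written before and after two fields were introduced, with no migration step. The field names, the `UserView` class, and the `"free"` default are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class UserView:
    """The shape this part of the application expects (hypothetical fields)."""
    name: str
    email: Optional[str]
    plan: str

def read_user(doc: dict) -> UserView:
    # The schema is applied here, at read time, rather than in the database.
    return UserView(
        name=doc["name"],              # required in every version
        email=doc.get("email"),        # optional field added later
        plan=doc.get("plan", "free"),  # newer field with an application-side default
    )

old_doc = {"name": "Ada"}  # written before the new fields existed
new_doc = {"name": "Grace", "email": "grace@example.com", "plan": "pro"}

users = [read_user(d) for d in (old_doc, new_doc)]
```

Another part of the application could define a different view over the same documents, which is the sense in which different components can have a different view of the schema.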

Write availability. Do your writes need to succeed no matter what? Then we can get into partitioning, CAP, eventual consistency and all that jazz.

Easier maintainability, administration and operations. This is very product specific, but many NoSQL vendors are trying to gain adoption by making it easy for developers to adopt them. They are spending a lot of effort on ease of use, minimal administration, and automated operations. This can lead to lower operations costs as special code doesn't have to be written to scale a system that was never intended to be used that way.

No single point of failure. Not every product is delivering on this, but we are seeing a definite convergence on relatively easy to configure and manage high availability with automatic load balancing and cluster sizing. A perfect cloud partner.

Generally available parallel computing. We are seeing MapReduce baked into products, which makes parallel computing something that will be a normal part of development in the future.

Programmer ease of use. Accessing your data should be easy. While the relational model is intuitive for end users, like accountants, it's not very intuitive for developers. Programmers grok keys, values, JSON, Javascript stored procedures, HTTP, and so on. NoSQL is for programmers. This is a developer-led coup. The response to a database problem can't always be to hire a really knowledgeable DBA, get your schema right, denormalize a little, etc.; programmers would prefer a system that they can make work for themselves. It shouldn't be so hard to make a product perform. Money is part of the issue. If it costs a lot to scale a product then won't you go with the cheaper product, that you control, that's easier to use, and that's easier to scale?

Use the right data model for the right problem. Different data models are used to solve different problems. Much effort has been put into, for example, wedging graph operations into a relational model, but it doesn't work. Isn't it better to solve a graph problem in a graph database? We are now seeing a general strategy of trying to find the best fit between a problem and solution.
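To make the graph case concrete: in a relational schema each additional hop is typically another self-join, whereas a graph traversal simply follows adjacency lists. The breadth-first sketch below uses an in-memory dictionary as a stand-in for a graph store; the `follows` data and the `within_hops` helper are hypothetical.

```python
from collections import deque

def within_hops(graph, start, max_hops):
    """Return {node: hop_count} for nodes reachable from start in at most max_hops edges."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]  # the starting node is not part of the result
    return distances

# Hypothetical "follows" relationships in a small social network.
follows = {
    "ann": ["bob", "cy"],
    "bob": ["dee"],
    "cy": ["dee"],
    "dee": ["eve"],
    "eve": [],
}

reach = within_hops(follows, "ann", 2)
```

Raising `max_hops` changes one argument here; the equivalent relational query would usually need an extra join or a recursive query for each added level.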

Avoid hitting the wall. Many projects hit some type of wall. They've exhausted all options to make their system scale or perform properly and are wondering what next? It's comforting to select a product and an approach that can jump over the wall by linearly scaling using incrementally added resources. At one time this wasn't possible. It took custom-built everything, but that's changed. We are now seeing usable out-of-the-box products that a project can readily adopt.

Distributed systems support. Not everyone is worried about scale or performance over and above that which can be achieved by non-NoSQL systems. What they need is a distributed system that can span datacenters while handling failure scenarios without a hiccup. NoSQL systems, because they have focused on scale, tend to exploit partitions, tend not to use heavy strict consistency protocols, and so are well positioned to operate in distributed scenarios.

Tunable CAP tradeoffs. NoSQL systems are generally the only products with a "slider" for choosing where they want to land on the CAP spectrum. Relational databases pick strong consistency which means they can't tolerate a partition failure. In the end, this is a business decision and should be decided on a case by case basis. Does your app even care about consistency? Are a few drops OK? Does your app need strong or weak consistency? Is availability more important or is consistency? Will being down be more costly than being wrong? It's nice to have products that give you a choice.
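One common form of such a "slider", used here purely as an illustration and not taken from the source articles, is quorum replication: with N replicas, W acknowledgements per write, and R replicas consulted per read, choosing R + W > N makes every read set overlap the latest acknowledged write set. The `quorum_profile` helper below is a hypothetical name.

```python
def quorum_profile(n, r, w):
    """Summarize the trade-off for n replicas, read quorum r, and write quorum w."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must each be between 1 and n")
    return {
        # Read and write sets must share a replica when r + w > n.
        "reads_see_latest_write": r + w > n,
        # Number of replicas that can be down while writes still succeed.
        "write_tolerates_failures": n - w,
        # Number of replicas that can be down while reads still succeed.
        "read_tolerates_failures": n - r,
    }

consistent = quorum_profile(n=3, r=2, w=2)  # leans toward consistency
available = quorum_profile(n=3, r=1, w=1)   # leans toward availability
```

Moving R and W down buys availability and latency at the cost of possibly stale reads, which is the business decision the paragraph above describes.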

More Specific Use Cases

• Managing large streams of non-transactional data: Apache logs, application logs, MySQL logs, clickstreams, etc.

• Syncing online and offline data. This is a niche CouchDB has targeted.

• Fast response times under all loads.

• Avoiding heavy joins when the query load for complex joins becomes too large for an RDBMS.

• Soft real-time systems where low latency is critical. Games are one example.

• Applications where a wide variety of different write, read, query, and consistency patterns need to be supported. There are systems optimized for 50% reads 50% writes, 95% writes, or 95% reads. Read-only applications that need extreme speed and resiliency, use simple queries, and can tolerate slightly stale data. Applications requiring moderate performance, read/write access, simple queries, and completely authoritative data. A read-only application with complex query requirements.

• Load balance to accommodate data and usage concentrations and to help keep microprocessors busy.

• Real-time inserts, updates, and queries.

• Hierarchical data like threaded discussions and parts explosion.

• Dynamic table creation.

• Two-tier applications where low latency data is made available through a fast NoSQL interface, but the data itself can be calculated and updated by high latency Hadoop apps or other low priority apps.

• Sequential data reading. The right underlying data storage model needs to be selected. A B-tree may not be the best model for sequential reads.

• Slicing off part of a service that may need better performance/scalability onto its own system. For example, user logins may need to be high performance and this feature could use a dedicated service to meet those goals.

• Caching. A high performance caching tier for websites and other applications. An example is a cache for the Data Aggregation System used by the Large Hadron Collider. Voting.

• Real-time page view counters.

• User registration, profile, and session data.

• Document, catalog management and content management systems. These are facilitated by the ability to store complex documents as a whole rather than organized as relational tables. Similar logic applies to inventory, shopping carts, and other structured data types.

• Archiving. Storing a large continual stream of data that is still accessible on-line. Document-oriented databases with a flexible schema that can handle schema changes over time.

• Analytics. Use MapReduce, Hive, or Pig to perform analytical queries and scale-out systems that support high write loads.

• Working with heterogeneous types of data, for example, different media types at a generic level.

• Embedded systems. They don’t want the overhead of SQL and servers, so they use something simpler for storage.

• A "market" game, where you own buildings in a town. You want the building list of someone to pop up quickly, so you partition on the owner column of the building table, so that the select is single-partitioned. But when someone buys the building of someone else you update the owner column along with price.

• JPL is using SimpleDB to store rover plan attributes. References are kept to a full plan blob in S3. (source)

• Federal law enforcement agencies tracking Americans in real-time using credit cards, loyalty cards and travel reservations.

• Fraud detection by comparing transactions to known patterns in real-time.

• Helping diagnose the typology of tumors by integrating the history of every patient.

• In-memory database for high update situations, like a website that displays everyone's "last active" time (for chat maybe). If users are performing some activity once every 30 sec, then you will pretty much be at your limit with about 5000 simultaneous users.

• Handling lower-frequency multi-partition queries using materialized views while continuing to process high-frequency streaming data.

• Priority queues.

• Running calculations on cached data, using a program friendly interface, without having to go through an ORM.

• Deduplicating (uniq) a large dataset using simple key-value columns.

• To keep querying fast, values can be rolled-up into different time slices.

• Computing the intersection of two massive sets, where a join would be too slow.

• A timeline ala Twitter.
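Two items from the list above, real-time page view counters and rolling values up into time slices, can be combined in a small sketch. A Python `Counter` stands in for a key-value store with atomic increments; the key layout and the `record_view` helper are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

counters = Counter()  # stand-in for a key-value store with an increment operation

def bucket_keys(page, ts):
    """Build one key per time slice so a query reads a single pre-aggregated value."""
    return [
        f"{page}:minute:{ts:%Y-%m-%dT%H:%M}",
        f"{page}:hour:{ts:%Y-%m-%dT%H}",
        f"{page}:day:{ts:%Y-%m-%d}",
    ]

def record_view(page, ts):
    # Each view increments every slice it belongs to at write time.
    for key in bucket_keys(page, ts):
        counters[key] += 1

# Hypothetical page views.
views = [
    datetime(2010, 12, 6, 9, 15, 5, tzinfo=timezone.utc),
    datetime(2010, 12, 6, 9, 15, 40, tzinfo=timezone.utc),
    datetime(2010, 12, 6, 9, 47, 0, tzinfo=timezone.utc),
    datetime(2010, 12, 6, 10, 1, 0, tzinfo=timezone.utc),
]
for ts in views:
    record_view("/home", ts)

views_in_hour = counters["/home:hour:2010-12-06T09"]
```

The trade-off is a few extra writes per event in exchange for reads that never scan raw events, which is what keeps querying fast as the data grows.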

Redis use cases, VoltDB use cases, and more can be found in the source articles.
