Big Data or relational database (like MySQL cluster)?


Question

I am going to deal with a huge amount of data in my project. I have read about Big Data concepts but have never used them. Even after reading the Big Data documentation, I am still not sure whether my requirements call for Big Data or whether a traditional relational database would handle them well.

Here is some information about my DB.

My main DB is a repository for different data sources. Each of these data sources deals with the same kind of data (data in the same domain), but some data sources contain extra fields that are not available in others, and some contain fewer. In other words, some of the data fields in these data sources are the same, but some are different. So my core DB should contain all of those fields. The core DB should have approximately 2000 fields in total and may contain 10 to 20 million records.

The operations on my core DB will be data insertion and reading (searching). Since it deals with a huge amount of data, I was thinking of using Big Data concepts. But I am still not sure whether this is a good fit for Big Data, because some of my data shares the same characteristics (same fields) and some contains extra information. I also need every kind of search on my DB to be fast. Thanks.

Recommended answer

Relational databases like MySQL can handle billions of rows/records, so the decision will depend on your use case(s). For Big Data NoSQL systems, it is very important to understand how the strengths and limitations of each system map to your use case(s), as they can behave very differently.
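As a rough illustration of how a relational layout could hold records whose fields differ by source, the sketch below keeps the shared fields as ordinary columns and stores source-specific fields in an indexed side table. This is only one possible design, not something the question or answer prescribes. In-memory SQLite stands in for MySQL, and every table, column, and function name (`record`, `record_extra`, `insert_record`, `find_by_extra`) is a hypothetical chosen for the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE record (
        id     INTEGER PRIMARY KEY,
        source TEXT NOT NULL,      -- which data source the row came from
        name   TEXT NOT NULL       -- a field shared by every source
    );
    CREATE TABLE record_extra (
        record_id INTEGER NOT NULL REFERENCES record(id),
        field     TEXT NOT NULL,   -- name of a source-specific field
        value     TEXT,
        PRIMARY KEY (record_id, field)
    );
    -- Index so that searches on (field, value) avoid a full table scan.
    CREATE INDEX idx_extra_field_value ON record_extra (field, value);
""")

def insert_record(source, name, extras):
    """Insert the shared fields, then one row per source-specific field."""
    cur = conn.execute(
        "INSERT INTO record (source, name) VALUES (?, ?)", (source, name)
    )
    record_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO record_extra (record_id, field, value) VALUES (?, ?, ?)",
        [(record_id, f, v) for f, v in extras.items()],
    )
    return record_id

def find_by_extra(field, value):
    """Return names of records whose extra field equals the given value."""
    rows = conn.execute(
        "SELECT r.name FROM record r "
        "JOIN record_extra e ON e.record_id = r.id "
        "WHERE e.field = ? AND e.value = ? ORDER BY r.id",
        (field, value),
    )
    return [name for (name,) in rows]

insert_record("source_a", "alpha", {"color": "red", "size": "L"})
insert_record("source_b", "beta", {"color": "red"})
insert_record("source_c", "gamma", {})

print(find_by_extra("color", "red"))
```

Whether a side table like this, a single wide table, or a document store fits better depends on which fields are searched and how often, which is the use-case question raised above.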

Here are some MySQL examples:

  • 1.1 billion rows on Percona DB (fork of MySQL)
  • 0.95 billion rows on MySQL

In the second example, they moved from MySQL to Redis because they needed to store the equivalent of 359 billion rows, far more than the 950 million they were storing in MySQL.
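For a sense of scale, the figures above can be set against the 10 to 20 million records mentioned in the question. The short calculation below is a back-of-the-envelope comparison of row counts only. It ignores row width (the question mentions roughly 2000 fields), so it should not be read as a capacity estimate. The variable names are illustrative.

```python
# Row counts quoted in the answer and in the question.
mysql_rows = 950_000_000                  # rows stored in MySQL in the second example
redis_equivalent_rows = 359_000_000_000   # equivalent rows they needed to store
question_records_max = 20_000_000         # upper estimate from the question

# How much larger the target was than what the MySQL setup held.
growth_factor = redis_equivalent_rows / mysql_rows

# How the question's record count compares with the MySQL example.
share_of_example = question_records_max / mysql_rows

print(f"target is about {growth_factor:.0f}x the MySQL example")
print(f"question's data is about {share_of_example:.1%} of the MySQL example")
```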

Given that you say you have fast searching requirements, it is important to understand what kinds of searches you need, as different databases support different searches. Additionally, some supported searches may have limited functionality. If your search requirements go beyond the core data store's functionality, a full-text solution is often added, for example, using Cassandra for the data store and Elasticsearch for the search component.
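To make the distinction between kinds of searches concrete, here is a toy sketch in plain Python contrasting an exact-match lookup with a word-based search over an inverted index, which is the basic idea behind full-text engines. It does not use the Cassandra or Elasticsearch APIs. The sample documents and the function names (`exact_match`, `build_inverted_index`, `search_all_words`) are assumptions made for the illustration.

```python
from collections import defaultdict

# Hypothetical sample documents keyed by record id.
documents = {
    1: "fast relational database for structured records",
    2: "distributed data store with limited query support",
    3: "full text search over structured and unstructured records",
}

def exact_match(docs, text):
    """Exact-match lookup: only finds documents equal to the whole string."""
    return sorted(doc_id for doc_id, body in docs.items() if body == text)

def build_inverted_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, body in docs.items():
        for word in body.lower().split():
            index[word].add(doc_id)
    return index

def search_all_words(index, query):
    """Return ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)

index = build_inverted_index(documents)
print(exact_match(documents, "structured records"))   # no document equals this string
print(search_all_words(index, "structured records"))  # documents containing both words
```

The same query returns nothing under exact matching but finds two documents through the word index, which is why the type of search needed matters when choosing a data store.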

To provide some background for this decision, it is useful and important to consider your requirements with respect to the CAP theorem, which states that distributed computer systems can provide some but not all of the following guarantees (from Wikipedia):

  • Consistency (all nodes see the same data at the same time)
  • Availability (a guarantee that every request receives a response about whether it succeeded or failed)
  • Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)

http://en.wikipedia.org/wiki/CAP_theorem
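The trade-off between these guarantees can be seen in a deliberately simplified simulation. The sketch below models two replicas and a network partition. In the mode labelled "CP" the store refuses writes it cannot replicate, and in the mode labelled "AP" it accepts them and lets the replicas diverge. Real systems are far more nuanced (for example, many consistency-oriented systems keep serving the majority side of a partition), and the class and mode names here are hypothetical.

```python
class Replica:
    """A single node holding one copy of a key-value store."""
    def __init__(self):
        self.data = {}

class TwoNodeStore:
    """Toy two-replica store; mode is 'CP' or 'AP' (hypothetical labels)."""
    def __init__(self, mode):
        self.mode = mode
        self.nodes = [Replica(), Replica()]
        self.partitioned = False  # True when the replicas cannot communicate

    def write(self, node_index, key, value):
        """Write via one node; returns True if the write was accepted."""
        if self.partitioned:
            if self.mode == "CP":
                # Prefer consistency: refuse writes that cannot be replicated.
                return False
            # Prefer availability: accept locally, so replicas may diverge.
            self.nodes[node_index].data[key] = value
            return True
        # No partition: replicate the write to every node.
        for node in self.nodes:
            node.data[key] = value
        return True

    def read(self, node_index, key):
        """Read the value for key as seen by one node."""
        return self.nodes[node_index].data.get(key)

for mode in ("CP", "AP"):
    store = TwoNodeStore(mode)
    store.write(0, "x", 1)          # replicated to both nodes
    store.partitioned = True        # the network splits
    accepted = store.write(0, "x", 2)
    print(mode, accepted, store.read(0, "x"), store.read(1, "x"))
```

In the "CP" run the second write is rejected and both nodes still agree, while in the "AP" run the write succeeds but the two nodes return different values until the partition heals.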

Graphically, you can see how different database solutions, including MySQL and NoSQL solutions, map out here:

If you provide more information on your use case(s), you can get more detailed responses.
