Need Advice: Is this a good use case for a 'NoSQL' Database? If so, which one?


Problem description

I have recently been researching NoSQL options. My scenario is as follows:

We collect and store data from custom hardware at remote locations around the world. We record data from every site every 15 minutes, and we would eventually like to move to every 1 minute. Each record has between 20 and 200 measurements. Once set up, the hardware records and reports the same measurements every time.

The biggest issue we are facing is that we get a different set of measurements from every project. We measure about 50-100 different measurement types; however, any project can have any number of each type of measurement. There is no preset set of columns that can accommodate the data. Because of this, we create and build each project's data table with the exact columns it needs as we set up and configure the project on the system.

We provide tools to help analyze the data. This typically includes more calculations and data aggregation, some of which we also store.

We are currently using a MySQL database with a table for each client. There are no relations between tables.

NoSQL seems promising because we could store a project_id and a timestamp, and the rest would not be preset. This means one table, more relationships in the data, yet still handling the variety of measurements.

Is a 'NoSQL' solution right for this job? If so, which ones?

I have been investigating MongoDB and it seems promising...

Example for Clarification:

Project 1 records 5 data points; its MySQL table columns look like: timestamp, temp, wind speed, precipitation, irradiance, wind direction

Project 2 records 3 data points; its MySQL table columns look like: timestamp, temp, irradiance, temp2

Solution

The simple answer is that there is no simple answer to this sort of problem. The only way to find out what works for your scenario is to invest R&D time in it.

The question is hard to answer because the OP doesn't spell out the performance requirements. It appears to be about 75M records/year across a number of customers, with a write rate of one record per customer per minute (which is low), but I don't have figures for the required read/query performance.
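For a rough sense of scale, the write-side arithmetic can be sketched as below. The site count of 143 is a hypothetical value, picked only because it lands near 75M records/year at a 1-minute interval; the question doesn't state the real number, and the function names are made up for illustration.

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600, ignoring leap years


def records_per_year(num_sites: int, interval_minutes: int) -> int:
    """Records written per year if every site reports once per interval."""
    return num_sites * MINUTES_PER_YEAR // interval_minutes


def writes_per_second(num_sites: int, interval_minutes: int) -> float:
    """Average sustained write rate across all sites."""
    return num_sites / (interval_minutes * 60)


# Hypothetical site count (an assumption, not a figure from the question).
sites = 143

print(records_per_year(sites, 15))            # 5010720  (15-minute interval)
print(records_per_year(sites, 1))             # 75160800 (1-minute interval)
print(round(writes_per_second(sites, 1), 2))  # 2.38
```

Even at the 1-minute interval this works out to only a couple of inserts per second on average, which is why the write rate counts as low.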

Effectively, you already have a sharded database using horizontal partitioning, because you're storing each customer in a separate table. This is good and will increase performance. However, you haven't yet established that you have a performance problem, so this needs to be measured and the problem size assessed before you can fix it.

A NoSQL database is indeed a good way of fixing performance problems with a traditional RDBMS, but it will not provide automatic scalability and is not a general solution. You need to identify your performance problem first and then design the (NoSQL) data model to solve it.

Depending on what you're trying to achieve I'd look at MongoDB, Apache Cassandra, Apache HBase or Hibari.

Remember that NoSQL is a vague term, typically encompassing:

  • Applications that are performance intensive in either reads or writes, often favoring one at the expense of the other.
  • Distribution and scalability.
  • Different methods of persistence (RAM/disk).
  • A more structured/defined access pattern, which makes ad-hoc queries harder.

So, in the first instance, I'd see if a traditional RDBMS can achieve the required performance using all available techniques. Get a copy of High Performance MySQL and read the MySQL Performance Blog.

Rev1:

In light of your comments, I think it is fair to say that you could achieve what you want with one of the above NoSQL engines.
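As a rough sketch of what the single-collection layout from the question might look like, here are the two example projects as schemaless records. Plain Python dicts stand in for documents, so no particular engine's API is implied; the readings, the timestamp, and the helper `measurement_names` are made up for illustration.

```python
from datetime import datetime, timezone

# Hypothetical timestamp and readings; only the field names come from the question.
ts = datetime(2011, 1, 1, 0, 15, tzinfo=timezone.utc)

project_1_record = {
    "project_id": 1,
    "timestamp": ts,
    "temp": 21.4,
    "wind_speed": 3.2,
    "precipitation": 0.0,
    "irradiance": 512.0,
    "wind_direction": 270.0,
}

project_2_record = {
    "project_id": 2,
    "timestamp": ts,
    "temp": 18.9,
    "irradiance": 498.0,
    "temp2": 19.3,
}

FIXED_FIELDS = ("project_id", "timestamp")


def measurement_names(record: dict) -> list[str]:
    """Return the per-project measurement fields, i.e. everything not preset."""
    return sorted(key for key in record if key not in FIXED_FIELDS)


print(measurement_names(project_1_record))
print(measurement_names(project_2_record))
```

Both records share only `project_id` and `timestamp`; everything else varies per project, which is the property the question is after.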

My primary recommendation would be to get your data model designed and implemented. What you're using at the moment isn't really right.

So look at the entity-attribute-value (EAV) model, as I think it is exactly right for what you need.
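A minimal sketch of an entity-attribute-value layout, assuming Python's built-in `sqlite3` module as a stand-in for whichever SQL database you use. The table names, column names, helper functions, and sample readings are all hypothetical; only the measurement names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE measurement_type (        -- the "attribute"
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE reading (                 -- one row per (entity, attribute, value)
    project_id INTEGER NOT NULL,
    ts         TEXT    NOT NULL,       -- ISO 8601 timestamp
    type_id    INTEGER NOT NULL REFERENCES measurement_type(id),
    value      REAL    NOT NULL,
    PRIMARY KEY (project_id, ts, type_id)
);
""")


def store_record(project_id: int, ts: str, values: dict) -> None:
    """Store one logged record as one row per measurement."""
    for name, value in values.items():
        conn.execute(
            "INSERT OR IGNORE INTO measurement_type (name) VALUES (?)", (name,))
        (type_id,) = conn.execute(
            "SELECT id FROM measurement_type WHERE name = ?", (name,)).fetchone()
        conn.execute(
            "INSERT INTO reading VALUES (?, ?, ?, ?)",
            (project_id, ts, type_id, value))


def load_record(project_id: int, ts: str) -> dict:
    """Reassemble the measurements logged by one project at one timestamp."""
    rows = conn.execute(
        "SELECT t.name, r.value FROM reading r "
        "JOIN measurement_type t ON t.id = r.type_id "
        "WHERE r.project_id = ? AND r.ts = ?", (project_id, ts))
    return dict(rows)


# Hypothetical readings for the two example projects.
TS = "2011-01-01T00:15:00Z"
store_record(1, TS, {"temp": 21.5, "wind speed": 3.25, "precipitation": 0.0,
                     "irradiance": 512.0, "wind direction": 270.0})
store_record(2, TS, {"temp": 20.0, "irradiance": 450.0, "temp2": 19.5})

print(load_record(2, TS))
```

Both projects live in the same two tables, and adding a new measurement type is an insert into `measurement_type` rather than a schema change. The trade-off is that reading a full record back requires a join and a pivot.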

You need to get your data model right before you can consider which technology to use. To be honest, modifying schemas dynamically isn't a data model.

I'd use a traditional SQL database to validate and test the new data model, as the management tools are better and it's generally easier to work with the schemas as you refine the data model.
