Which NoSQL solution for simple, time-based datapoint values?


Problem description


      I read a lot about different NoSQL databases but I'm still undecided which solution would be the most appropriate in my situation - probably because I have no experience with NoSQL databases - except REDIS, which I'm using for caching.

In my scenario I have to store time-based historic values (mixed float, integer and boolean values, depending on the datapoint) for about 200,000 datapoints. Typically each datapoint gets a new value every 15 minutes or more often, totaling something in the range of a billion values per year. Some datapoints may even store a value every few seconds, up to a theoretical maximum of 31,536,000 values per year for a single datapoint (we don't need sub-second precision), but that will be rare.

      The number of datapoints could double easily in future, meaning it has to be a scalable system.

      Reading

      To visualize the data, it must be possible to quickly analyze the data of a single datapoint in any stored time range - be that a few hours or a whole year. In that range I need min/max/avg values in a specific resolution, typically the time range divided by about 1000.

      In mySQL that would be done like this:

  SELECT TRUNCATE(Timestamp / 3600, 0) AS slot, MIN(value), AVG(value), MAX(value)
    FROM data
    WHERE Timestamp >= [from]
      AND Timestamp < [to]
      AND DatapointID = [some ID]
    GROUP BY 1

      ...which would return data for a chart with the precision of 1 hour.
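For clarity, the slot expression is just integer division of the Unix timestamp by the bucket width (3600 s for 1-hour precision); a quick sketch of the arithmetic:

```javascript
// Map a Unix timestamp (in seconds) to a bucket index for a given
// resolution; a 3600-second bucket reproduces the 1-hour slots above.
function slot(timestampSeconds, bucketSeconds) {
  return Math.trunc(timestampSeconds / bucketSeconds);
}

// Timestamps within the same hour share a slot:
slot(7200, 3600);  // → 2
slot(7800, 3600);  // → 2
slot(10800, 3600); // → 3
```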

      How can this be done with NoSQL databases so that the information is available in just a few milliseconds?

      This is my primary concern.

      Writing

Data is added near real-time, so there shouldn't be a lot of write operations. Updates of existing data must be possible, but will be extremely rare. Data will usually come in time-sorted order, but I can't really guarantee that.

Other requirements

• It must be possible to quickly copy all (or selected portions) of the data to external development machines; this should not take hours and must not affect the production system.
• It must be possible to back up all the data without stopping the database (daily backup).
• We have to import about half a billion existing values, which needs to complete in a reasonable time.
• The database must be accessible from PHP.

      Which NoSQL database is best suited for these needs and how can the above data query be done with it?

Solution

Well, the first question that came to my mind: why does it have to be a NoSQL database? You explained how you would do it in MySQL, but did not explain why you don't want to use it.

First off, you suggested the following query for MySQL:

  SELECT TRUNCATE(Timestamp / 3600, 0) AS slot, MIN(value), AVG(value), MAX(value)
    FROM data
    WHERE Timestamp >= [from]
      AND Timestamp < [to]
      AND DatapointID = [some ID]
    GROUP BY 1

I would recommend the following query:

  SELECT TRUNCATE(Timestamp / 3600, 0) AS slot, MIN(value), AVG(value), MAX(value)
    FROM data
    WHERE DatapointID = [some ID]
      AND Timestamp >= [from]
      AND Timestamp < [to]
    GROUP BY 1

This is because DatapointID should be an index. MySQL can then narrow the records down by id first and only afterwards filter on Timestamp, instead of scanning all the records, doing the math, and then matching on the index. Ideally there is a composite index on (DatapointID, Timestamp).

With that said about MySQL, back to your question about NoSQL.

Last year I compared CouchDB, MongoDB and MySQL for my studies. Although my use case was totally different, I can share some of the results.

      MySQL

• read: 8,500 records/s
• write: 400 records/s

      MongoDB

• read: 17,500 records/s
• write: 13,000 records/s

      CouchDB

• read & write: 300 records/s

So CouchDB was slow in my test. I was just inserting simple records; I don't have the code any more.

A nice feature of MongoDB is that its ids also contain the timestamp of when they were created, although this is tricky to exploit when you already have existing data.

As the MySQL query also shows, you need indexes to keep your speed high. MongoDB supports them.

1. MongoDB has an equivalent of mysqldump (mongodump); I don't know how fast it is, so you have to test that with your own data.
2. Not sure about this one.
3. Define "reasonable", but in my test MongoDB was 32.5 times faster at writing than MySQL (13,000 vs. 400 records/s).
4. It has a PHP driver.

Hope the info helped. It also helps to just try some of the databases; often a day spent playing with a database gives you more information than a week of reading about it.

      edit after first comment

      The SQL query would roughly look like the following in MongoDB.

I used the official MongoDB documentation for the following.

      So with MongoDB the query has two parts, the where clause and the aggregation.

In SQL the where clause is Timestamp >= [from] AND Timestamp < [to] AND DatapointID = [some ID].

The where in MongoDB is actually easy to learn (I found it easy; see the where docs & find docs for more info).

The where part would look like this:

{
    DatapointID: ID,
    Timestamp: {$gte: from, $lt: till},
}
      

      ID, from & till are values to be filled in!
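One reason both bounds must sit in a single sub-document: in a JavaScript object literal a repeated key silently overwrites the earlier one, so splitting the bounds into two separate Timestamp entries would drop one of them. A quick demonstration:

```javascript
// Two separate Timestamp entries: the literal keeps only the last one,
// so the lower bound is silently lost.
var broken = { Timestamp: { $gte: 10 }, Timestamp: { $lt: 20 } };

// Both bounds in one sub-document: nothing is lost.
var fixed = { Timestamp: { $gte: 10, $lt: 20 } };

Object.keys(broken).length;  // → 1
broken.Timestamp.$gte;       // → undefined (lower bound gone)
fixed.Timestamp.$gte;        // → 10
```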

Then the hard part: the aggregation. Luckily there are docs to make things a little easier. See the aggregation docs & match docs to see where I got my info from.

The group function will look something like the following:

{
    _id: 1,
    minValue: {$min: "$value"},
    avgValue: {$avg: "$value"},
    maxValue: {$max: "$value"},
}

• note: _id: 1 puts all matched documents into a single group; to reproduce the SQL's hourly slots you would group on the truncated timestamp instead

value is a field name in the record; inside an aggregation stage it must be referenced as the string "$value".
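As a plain-JavaScript sketch of what this group stage computes for the values that fall into one slot (the function name is mine, not a MongoDB API):

```javascript
// Same min/avg/max summary the $group stage produces per bucket.
function summarize(values) {
  var sum = values.reduce(function (a, b) { return a + b; }, 0);
  return {
    minValue: Math.min.apply(null, values),
    avgValue: sum / values.length,
    maxValue: Math.max.apply(null, values),
  };
}

summarize([3, 1, 2]);  // → { minValue: 1, avgValue: 2, maxValue: 3 }
```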

Now we need to put the query together so you can execute it. In the end it will look like the following:

db.data.aggregate([
    { $match: {
        DatapointID: ID,
        Timestamp: {$gte: from, $lt: till},
    } },
    { $group: {
        _id: { $subtract: ["$Timestamp", { $mod: ["$Timestamp", 3600] }] },
        minValue: {$min: "$value"},
        avgValue: {$avg: "$value"},
        maxValue: {$max: "$value"},
    } }
]);

      code is not tested
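Since the pipeline above is untested, a plain-JavaScript reference implementation of the same match-and-group logic (field names taken from the question; the helper function is hypothetical) can serve as a cross-check against the database output:

```javascript
// Filters docs by datapoint id and [from, till), buckets them into
// hour-wide slots, and summarizes each slot with min/avg/max -
// mirroring the $match + $group pipeline.
function aggregateSlots(docs, id, from, till) {
  var buckets = {};
  docs.forEach(function (d) {
    if (d.DatapointID !== id || d.Timestamp < from || d.Timestamp >= till) return;
    var slot = d.Timestamp - (d.Timestamp % 3600);  // start of the hour
    (buckets[slot] = buckets[slot] || []).push(d.value);
  });
  return Object.keys(buckets).map(function (slot) {
    var v = buckets[slot];
    return {
      _id: Number(slot),
      minValue: Math.min.apply(null, v),
      avgValue: v.reduce(function (a, b) { return a + b; }, 0) / v.length,
      maxValue: Math.max.apply(null, v),
    };
  });
}
```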
