Efficiently storing 7,300,000,000 rows

This article discusses an approach to efficiently storing 7,300,000,000 rows; hopefully it is a useful reference for anyone facing a similar problem.

Problem description

How would you tackle the following storage and retrieval problem?



Roughly 2,000,000 rows will be added each day (365 days/year), with the following information per row:

• id (unique row identifier)
• entity_id (takes on values between 1 and 2,000,000 inclusive)
• date_id (incremented by one each day - takes on values between 1 and 3,650 (ten years: 1 * 365 * 10))
• value_1 (takes on values between 1 and 1,000,000 inclusive)
• value_2 (takes on values between 1 and 1,000,000 inclusive)



entity_id combined with date_id is unique. Hence, at most one row per entity and date can be added to the table. The database must be able to hold ten years' worth of daily data (7,300,000,000 rows = 3,650 * 2,000,000).
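To make the layout concrete, here is a minimal MySQL sketch of the table the question implies. The table name measurements and the column types are assumptions, chosen to fit the stated value ranges:

    CREATE TABLE measurements (
        id        BIGINT UNSIGNED    NOT NULL,  -- unique row identifier
        entity_id MEDIUMINT UNSIGNED NOT NULL,  -- 1 .. 2,000,000
        date_id   SMALLINT UNSIGNED  NOT NULL,  -- 1 .. 3,650
        value_1   MEDIUMINT UNSIGNED NOT NULL,  -- 1 .. 1,000,000
        value_2   MEDIUMINT UNSIGNED NOT NULL,  -- 1 .. 1,000,000
        PRIMARY KEY (id),
        -- enforces at most one row per entity and day
        UNIQUE KEY uq_entity_date (entity_id, date_id)
    ) ENGINE=InnoDB;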



What is described above is the write pattern. The read pattern is simple: all queries will be made on a specific entity_id, i.e. retrieve all rows describing entity_id = 12345.



Transactional support is not needed, but the storage solution must be open source. Ideally I'd like to use MySQL, but I'm open to suggestions.



Now - how would you tackle the described problem?



Update: I was asked to elaborate on the read and write patterns. Writes to the table will be done in one batch per day, where the new 2M entries will be added in one go. Reads will be done continuously, with one read every second.
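Since the write pattern is one daily batch, the 2M rows could be bulk-loaded in a single statement rather than inserted row by row. A minimal sketch, assuming the batch arrives as a CSV file (the file path and format, like the measurements table above, are assumptions):

    LOAD DATA INFILE '/data/daily_batch.csv'
    INTO TABLE measurements
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    (id, entity_id, date_id, value_1, value_2);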

Solution

Use partitioning. With your read pattern you'd want to partition by entity_id hash.
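A minimal sketch of what that could look like, reworking the table from above; the partition count of 64 is an arbitrary assumption. Note that MySQL requires every unique key on a partitioned table to include the partitioning column, so (entity_id, date_id) serves as the primary key here instead of id:

    CREATE TABLE measurements (
        id        BIGINT UNSIGNED    NOT NULL,
        entity_id MEDIUMINT UNSIGNED NOT NULL,
        date_id   SMALLINT UNSIGNED  NOT NULL,
        value_1   MEDIUMINT UNSIGNED NOT NULL,
        value_2   MEDIUMINT UNSIGNED NOT NULL,
        -- id alone cannot be a unique key under this partition scheme
        PRIMARY KEY (entity_id, date_id)
    ) ENGINE=InnoDB
    PARTITION BY HASH(entity_id)
    PARTITIONS 64;

    -- The read pattern then prunes to a single partition:
    SELECT date_id, value_1, value_2
    FROM measurements
    WHERE entity_id = 12345;

Since every query filters on entity_id, hashing on that column means each read touches exactly one partition.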



This concludes this article on efficiently storing 7,300,000,000 rows. We hope the answer recommended above is helpful, and thank you for supporting IT屋!
