Database design question

Question

I have accumulated quite a lot of data in raw form (CSV and binary): 4 GB per day for a few months, to be precise.

I decided to join the civilized world and use a database to access the data, and I wondered what the correct layout would be. The format is quite simple: a few rows for every time tick (bid, ask, timestamp, etc.) x up to 0.5 million per day x hundreds of financial instruments x months of data.

There is a MySQL server with MyISAM (which I understood to be the correct engine for this type of usage) running on commodity hardware (2 x 1 GB SATA drives in RAID 0, Core 2 @ 2.7 GHz).

What would be the correct layout of the database? What should the tables and indices look like? What are the general recommendations for this scenario? What pitfalls would you predict along the way?

Edit: my common usage will be simple queries to extract time-series information for a specific date and instrument, e.g.

    SELECT (ask + bid) / 2
      WHERE instrument='GOOG'
      AND date = '01-06-2008'
      ORDER BY timeStamp;

Edit: I tried to stuff all my data into one table indexed by the timestamp, but it was way too slow, so I reckoned it would take a more elaborate scheme.

Solution

You don't really say what your background is or how much you know about programming and database design. It sounds like you should do some reading. Conceptually, though, your design is fairly simple. Your description identifies a mere two entities:

• Financial instrument; and
• Quote.

So you then need to identify the attributes.

Financial instrument:

• Security code;
• Market;
• etc.

Quote:

• Timestamp;
• Financial instrument;
• Bid price; and
• Ask price.

The reference to the financial instrument is what's called a foreign key. Each table also needs a primary key, probably just an auto-increment field.

Conceptually, it's fairly simple:

    CREATE TABLE instrument (
      id BIGINT NOT NULL AUTO_INCREMENT,
      code CHAR(4),
      company_name VARCHAR(100),
      PRIMARY KEY (id)
    );

    CREATE TABLE quote (
      id BIGINT NOT NULL AUTO_INCREMENT,
      instrument_id BIGINT NOT NULL,  -- refers to instrument.id (MyISAM does not enforce foreign keys)
      dt DATETIME NOT NULL,
      bid NUMERIC(8,3),
      ask NUMERIC(8,3),
      PRIMARY KEY (id)
    );

    CREATE INDEX instrument_idx1 ON instrument (code);

    -- Composite index: equality on instrument_id, then a range scan on dt
    CREATE INDEX quote_idx1 ON quote (instrument_id, dt);

    -- Mid price for GOOG on 1 June 2008 (MySQL date literals are 'YYYY-MM-DD')
    SELECT (bid + ask) / 2
    FROM instrument i
    JOIN quote q ON i.id = q.instrument_id
    WHERE i.code = 'GOOG'
    AND q.dt >= '2008-06-01' AND q.dt < '2008-06-02'
    ORDER BY q.dt;
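
The schema and query above can be tried end to end without a MySQL server. The sketch below is a minimal, hedged adaptation: it swaps in SQLite through Python's standard `sqlite3` module (so `BIGINT AUTO_INCREMENT` becomes `INTEGER PRIMARY KEY` and prices become `REAL` rather than fixed-point), adds a hypothetical `load_ticks` helper that resolves each ticker code to its `instrument.id` foreign key while inserting, and uses `EXPLAIN QUERY PLAN` to check that the daily query is served by the composite `(instrument_id, dt)` index. The sample ticks are made up.

```python
import sqlite3

# SQLite stand-in for the MySQL schema above (hypothetical adaptation):
# INTEGER PRIMARY KEY replaces BIGINT AUTO_INCREMENT, REAL replaces NUMERIC(8,3).
SCHEMA = """
CREATE TABLE instrument (
  id INTEGER PRIMARY KEY,
  code TEXT NOT NULL UNIQUE,
  company_name TEXT
);
CREATE TABLE quote (
  id INTEGER PRIMARY KEY,
  instrument_id INTEGER NOT NULL REFERENCES instrument (id),
  dt TEXT NOT NULL,  -- 'YYYY-MM-DD HH:MM:SS' text sorts chronologically
  bid REAL,
  ask REAL
);
CREATE INDEX quote_idx1 ON quote (instrument_id, dt);
"""

# One instrument, one day, in timestamp order; / 2.0 keeps the division in floating point.
DAILY_MID = """
SELECT q.dt, (q.bid + q.ask) / 2.0 AS mid
FROM instrument i
JOIN quote q ON i.id = q.instrument_id
WHERE i.code = ? AND q.dt >= ? AND q.dt < ?
ORDER BY q.dt
"""


def instrument_id(conn, code):
    """Return instrument.id for a ticker code, inserting the instrument if it is new."""
    conn.execute("INSERT OR IGNORE INTO instrument (code) VALUES (?)", (code,))
    return conn.execute("SELECT id FROM instrument WHERE code = ?", (code,)).fetchone()[0]


def load_ticks(conn, ticks):
    """Hypothetical loader: ticks are (code, dt, bid, ask) tuples, e.g. parsed from the CSV files."""
    with conn:  # commit the whole batch as one transaction
        conn.executemany(
            "INSERT INTO quote (instrument_id, dt, bid, ask) VALUES (?, ?, ?, ?)",
            [(instrument_id(conn, code), dt, bid, ask) for code, dt, bid, ask in ticks],
        )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load_ticks(conn, [
    ("GOOG", "2008-06-01 09:30:00", 10.0, 10.5),
    ("GOOG", "2008-06-01 09:30:01", 10.25, 10.75),
    ("MSFT", "2008-06-01 09:30:00", 20.0, 21.0),
    ("GOOG", "2008-06-02 09:30:00", 11.0, 12.0),
])

params = ("GOOG", "2008-06-01", "2008-06-02")
rows = conn.execute(DAILY_MID, params).fetchall()
# The last column of each EXPLAIN QUERY PLAN row is the human-readable step description.
plan = conn.execute("EXPLAIN QUERY PLAN " + DAILY_MID, params).fetchall()
uses_index = any("quote_idx1" in str(step[-1]) for step in plan)
print(rows)
print(uses_index)
```

With the sample ticks, `rows` holds only the two GOOG quotes from 2008-06-01, with mid prices 10.25 and 10.5; the MSFT tick and the 2008-06-02 tick are filtered out.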
    

If your dataset is sufficiently large, you might want to store (bid + ask) / 2 in the table so you don't have to calculate it on the fly.
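
A minimal sketch of that idea, again using SQLite through Python's `sqlite3` as a stand-in for MySQL: the mid price is computed once while loading, stored in a hypothetical `mid` column, and the daily query reads it back instead of computing `(bid + ask) / 2` for every row. The CSV layout (`code,timestamp,bid,ask`) and the `INSTRUMENT_IDS` mapping are assumptions; the question does not describe the raw files' format.

```python
import csv
import io
import sqlite3

# Hypothetical raw CSV layout: code,timestamp,bid,ask (the real files' layout is not given).
RAW = """GOOG,2008-06-01 09:30:00,10.0,10.5
GOOG,2008-06-01 09:30:01,10.25,10.75
"""
# Hypothetical code -> instrument.id mapping, normally read from the instrument table.
INSTRUMENT_IDS = {"GOOG": 1}


def parse_ticks(text):
    """Yield (instrument_id, dt, bid, ask, mid), computing mid once at load time."""
    for code, dt, bid, ask in csv.reader(io.StringIO(text)):
        b, a = float(bid), float(ask)
        yield INSTRUMENT_IDS[code], dt, b, a, (b + a) / 2


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE quote (
  id INTEGER PRIMARY KEY,
  instrument_id INTEGER NOT NULL,
  dt TEXT NOT NULL,
  bid REAL,
  ask REAL,
  mid REAL  -- precomputed (bid + ask) / 2
)""")
conn.execute("CREATE INDEX quote_idx1 ON quote (instrument_id, dt)")
with conn:  # one transaction for the batch
    conn.executemany(
        "INSERT INTO quote (instrument_id, dt, bid, ask, mid) VALUES (?, ?, ?, ?, ?)",
        parse_ticks(RAW),
    )

# The daily query now reads the stored column instead of computing it per row.
mids = [m for (m,) in conn.execute(
    "SELECT mid FROM quote WHERE instrument_id = ? AND dt >= ? AND dt < ? ORDER BY dt",
    (1, "2008-06-01", "2008-06-02"),
)]
print(mids)
```

On MySQL 5.7 and later, a `STORED` generated column defined as `(bid + ask) / 2` is an alternative that keeps the value in sync without any loader code.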

OK, so that's the normalized view. After this, you may need to start making performance optimizations. Consider this question about storing billions of rows in MySQL. Partitioning is a feature of MySQL 5.1+ (fairly new).
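
As a hedged illustration of partitioning for this workload, the Python sketch below only generates MySQL DDL text; it does not connect to a server. It range-partitions the answer's `quote` table by calendar month on `dt` using `TO_DAYS`, which MySQL can use for partition pruning, plus a catch-all `pmax` partition. The `pYYYYMM` partition names and the date range are hypothetical. One prerequisite worth knowing: MySQL requires every unique key, including the primary key, to contain all columns used in the partitioning expression, so the answer's `PRIMARY KEY (id)` would first need to become something like `PRIMARY KEY (id, dt)`.

```python
from datetime import date


def next_month(d):
    """First day of the month after d."""
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)


def monthly_partition_ddl(table, first, last):
    """Build MySQL DDL that range-partitions `table` by calendar month on its dt column."""
    parts = []
    start = date(first.year, first.month, 1)
    while start <= last:
        end = next_month(start)
        # Each partition holds rows with dt before the first day of the following month.
        parts.append(f"  PARTITION p{start:%Y%m} VALUES LESS THAN (TO_DAYS('{end:%Y-%m-%d}'))")
        start = end
    parts.append("  PARTITION pmax VALUES LESS THAN MAXVALUE")  # catch-all for later rows
    return f"ALTER TABLE {table}\nPARTITION BY RANGE (TO_DAYS(dt)) (\n" + ",\n".join(parts) + "\n);"


ddl = monthly_partition_ddl("quote", date(2008, 6, 1), date(2008, 8, 15))
print(ddl)
```

With queries that always filter on a one-day `dt` range, MySQL can then skip every partition except the month containing that day.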

But another question to ask yourself is this: do you need to store all this data? I ask because I used to work in online broking, and we only stored all the trades for a very limited window, even though trades are a smaller set of data than the quotes you seem to want.
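
If only a limited window is needed, monthly range partitions make the retention policy cheap: expiring a month becomes `ALTER TABLE ... DROP PARTITION`, which discards that partition's data in one step instead of running a `DELETE` over billions of rows. The sketch below assumes monthly partitions named `pYYYYMM` plus a catch-all `pmax` (a hypothetical naming convention); it picks the partitions that fall outside a window of `keep_months` months and builds the statement without executing it.

```python
from datetime import date


def partitions_to_drop(partition_names, today, keep_months):
    """Return pYYYYMM partitions older than the newest `keep_months` months (current month included)."""
    # Months are counted as year * 12 + (month - 1), so consecutive months differ by 1.
    oldest_kept = today.year * 12 + (today.month - 1) - (keep_months - 1)
    old = []
    for name in partition_names:
        if name == "pmax":  # the catch-all partition for future rows is never dropped
            continue
        year, month = int(name[1:5]), int(name[5:7])
        if year * 12 + (month - 1) < oldest_kept:
            old.append(name)
    return old


names = ["p200806", "p200807", "p200808", "p200809", "pmax"]
old = partitions_to_drop(names, date(2008, 9, 10), keep_months=2)
if old:
    stmt = f"ALTER TABLE quote DROP PARTITION {', '.join(old)};"
    print(stmt)
```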

Storing billions of rows of data is a serious problem, and one you really need serious help to solve.
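
For a sense of scale, here is a rough estimate from the figures in the question. It assumes one row per tick at the stated upper rate of 0.5 million ticks per instrument per day, 100 instruments (the low end of "hundreds"), and about 21 trading days per month:

$$
0.5\times10^{6}\,\frac{\text{rows}}{\text{instrument}\cdot\text{day}}\times 100\ \text{instruments}\times 21\,\frac{\text{days}}{\text{month}} \approx 1.05\times10^{9}\,\frac{\text{rows}}{\text{month}}
$$

Under these assumptions, a few months of data already means several billion rows.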
