What is the best way of storing trend data?

Question

I am currently building an application that imports statistical data for (currently) around 15,000 products. If I keep one database table for the daily statistics from a single source, it will grow by 15,000 rows per day (say 5-10 fields per row, primarily float and int). That obviously adds up to over 5 million records per year in one table.
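The growth estimate above can be checked with a few lines of arithmetic. This is only a sketch; the helper name `rows_per_year` and the three-source scenario are illustrative assumptions, not part of the original question.

```python
PRODUCTS = 15_000        # product count stated in the question
DAYS_PER_YEAR = 365


def rows_per_year(products: int, sources: int = 1, days: int = DAYS_PER_YEAR) -> int:
    """One row per product, per source, per day."""
    return products * sources * days


print(rows_per_year(PRODUCTS))             # 5475000
print(rows_per_year(PRODUCTS, sources=3))  # 16425000 (hypothetical: three sources)
```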

That doesn't concern me as much as the thought of bringing in data from other sources (and thus increasing the size of the database by 5 million records per year for each new source).

The data is statistical / trend data, with essentially one write per day per record and many reads. For on-the-fly reporting and graphing, however, I need fast access to subsets of the data based on rules (date ranges, value ranges, etc.).
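As a rough sketch of the single-table approach with the access pattern described above (one write per product per day, many range reads), the following uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. The table name `daily_stats`, its columns, and the helper `product_history` are hypothetical; the question does not specify a schema.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # The composite primary key matches the "one row per product, source, day"
    # write pattern and also serves per-product date-range lookups.
    conn.execute("""
        CREATE TABLE daily_stats (
            product_id INTEGER NOT NULL,
            source_id  INTEGER NOT NULL,
            stat_date  TEXT    NOT NULL,  -- ISO date, e.g. 2024-01-31
            price      REAL,
            units_sold INTEGER,
            PRIMARY KEY (product_id, source_id, stat_date)
        )
    """)
    # Secondary index for reports that scan all products over a date range.
    conn.execute("CREATE INDEX idx_stats_date ON daily_stats (stat_date)")


def product_history(conn, product_id, source_id, start, end, min_price=None):
    """Return (date, price, units) rows for one product within a date range."""
    sql = ("SELECT stat_date, price, units_sold FROM daily_stats "
           "WHERE product_id = ? AND source_id = ? AND stat_date BETWEEN ? AND ?")
    params = [product_id, source_id, start, end]
    if min_price is not None:  # optional value-range rule
        sql += " AND price >= ?"
        params.append(min_price)
    sql += " ORDER BY stat_date"
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany("INSERT INTO daily_stats VALUES (?, ?, ?, ?, ?)", [
    (1, 1, "2024-01-01", 9.5, 10),
    (1, 1, "2024-01-02", 10.5, 12),
    (1, 1, "2024-02-01", 11.0, 8),
    (2, 1, "2024-01-01", 3.0, 40),
])
rows = product_history(conn, 1, 1, "2024-01-01", "2024-01-31", min_price=10.0)
print(rows)  # [('2024-01-02', 10.5, 12)]
```

Keeping the source as a column (`source_id`) rather than a separate table or database means a new source adds rows, not schema objects.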

My question is: is this the best way to store the data (MySQL InnoDB tables), or is there a better way to store and handle statistical / trend data?

Other options I have tossed around at this point:

1. Multiple databases (one per product), with a separate table for each data source within (e.g. Database: ProductA; Tables: Source_A, Source_B, Source_C).
2. One database, multiple tables (one for each product/data source combination) (e.g. Database: Products; Tables: ProductA_SourceA, ProductA_SourceB, etc.).
3. All factual or product-specific information in the database, and all statistical data in flat files (CSV, XML, JSON) in separate directories.

So far, none of these options is very manageable; each has its pros and cons. I need a reasonable solution before I move into the alpha stage of development.

Recommended answer

You could try a column-oriented database. These kinds of databases are much better at analytical queries of the kind you're describing. There are several options:

http://en.wikipedia.org/wiki/Column-oriented_DBMS

We've had good experience with InfiniDB:

http://infinidb.org/

Infobright looks good as well:

http://www.infobright.com/

Both InfiniDB and Infobright have free, open-source community editions, so I would recommend using these to benchmark the kind of performance benefit you might get.
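To illustrate why a column-oriented layout tends to suit analytical queries, here is a toy in-memory comparison in plain Python. It is not how InfiniDB or Infobright are implemented; the field names and values are made up for the example.

```python
# Row-oriented: each record keeps all of its fields together.
rows = [
    {"product_id": 1, "stat_date": "2024-01-01", "price": 9.0, "units_sold": 10},
    {"product_id": 1, "stat_date": "2024-01-02", "price": 10.0, "units_sold": 12},
    {"product_id": 1, "stat_date": "2024-01-03", "price": 11.0, "units_sold": 8},
]


def to_columns(records):
    """Pivot a list of records into one list per field (column-oriented)."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one field only needs to touch that column's values;
# the other fields of each record are never read.
average_price = sum(columns["price"]) / len(columns["price"])
print(average_price)  # 10.0
```

A real column store applies the same idea on disk, so a query that aggregates one or two fields over millions of rows reads far less data than a row store would.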

You might also want to look at partitioning your data to improve performance.
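One possible reading of the partitioning suggestion is MySQL range partitioning by year, so that date-range queries only touch the relevant partitions. The sketch below merely builds the DDL string; the table and column names (`daily_stats`, `stat_date`) are hypothetical, and the statement has not been run against a server here. Note that MySQL requires the partitioning column to be part of every unique key on the table, including the primary key.

```python
def yearly_partition_ddl(table: str, date_column: str,
                         first_year: int, last_year: int) -> str:
    """Build an ALTER TABLE statement with one RANGE partition per year.

    Identifiers are interpolated directly, so pass trusted names only.
    """
    parts = [
        f"    PARTITION p{year} VALUES LESS THAN ({year + 1})"
        for year in range(first_year, last_year + 1)
    ]
    # Catch-all partition for rows dated after the last listed year.
    parts.append("    PARTITION pmax VALUES LESS THAN MAXVALUE")
    body = ",\n".join(parts)
    return (
        f"ALTER TABLE {table}\n"
        f"PARTITION BY RANGE (YEAR({date_column})) (\n{body}\n);"
    )


ddl = yearly_partition_ddl("daily_stats", "stat_date", 2023, 2025)
print(ddl)
```

Whether yearly partitions are the right granularity depends on the typical reporting window; monthly partitions follow the same pattern with a different partitioning expression.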
