Best way to store huge log data


Problem description

I need advice on the optimal approach to storing statistical data. There is a project on Django which has a MySQL database of 30,000 online games.

Each game has three statistical parameters:

  • number of views,
  • number of plays,
  • number of likes.
Now I need to store historical data for these three parameters on a daily basis, so I was thinking of creating a single table which will have five columns:

gameid, number of views, plays, likes, date (day-month-year data). 
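
As a concrete sketch of that design (the table and column names below are my own, chosen to match the description, and not part of the question), it could look roughly like this in MySQL, with a composite primary key ensuring one row per game per day:

-- hypothetical sketch of the proposed daily-stats table (names are illustrative)
create table game_stats_daily (
    gameid   integer not null,
    views    integer not null default 0,
    plays    integer not null default 0,
    likes    integer not null default 0,
    log_date date    not null,
    primary key (gameid, log_date)  -- one row per game per day
);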

So in the end, every game will be logged in one row per day. After one day this table will have 30,000 rows, after 10 days 300,000, and after a year about 10,950,000 rows. I'm not a big specialist in DBA matters, but this tells me it will quickly become a performance problem, and I'm not even talking about what happens in 5 years' time. The data collected in this table is needed for simple graphs (daily, weekly, monthly, custom range).
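
As an illustration of how such graphs could be fed from this kind of table, here is a hedged sketch of a weekly aggregation query (it assumes the game_history_log table from the answer below and PostgreSQL's date_trunc; the game id and date range are made up):

-- example: weekly totals for one game over a custom range (illustrative values)
select date_trunc('week', log_date) as week,
       sum(views) as total_views,
       sum(plays) as total_plays,
       sum(likes) as total_likes
from game_history_log
where gameid = 42
  and log_date between date '2012-10-01' and date '2012-12-31'
group by 1
order by 1;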

Maybe you have better ideas on how to store this data? Maybe NoSQL would be more suitable in this case? I really need your advice on this.

Recommended answer

Partitioning in PostgreSQL works great for big logs. First create the parent table:

-- parent table: defines the columns; the monthly partitions inherit from it
create table game_history_log (
    gameid integer,
    views integer,
    plays integer,
    likes integer,
    log_date date
);

Now create the partitions, one for each month. At 30,000 games × roughly 30 days, each monthly partition holds about 900 k rows, which is a comfortable size:

create table game_history_log_201210 (
    check (log_date between '2012-10-01' and '2012-10-31')
) inherits (game_history_log);

create table game_history_log_201211 (
    check (log_date between '2012-11-01' and '2012-11-30')
) inherits (game_history_log);

Notice the check constraint in each partition. If you try to insert into the wrong partition:

insert into game_history_log_201210 (
    gameid, views, plays, likes, log_date
) values (1, 2, 3, 4, '2012-09-30');
ERROR:  new row for relation "game_history_log_201210" violates check constraint "game_history_log_201210_log_date_check"
DETAIL:  Failing row contains (1, 2, 3, 4, 2012-09-30).
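
The answer does not show how rows inserted into the parent table get routed to the right partition; with this inheritance-based scheme that is typically done with a BEFORE INSERT trigger on the parent, roughly as in the sketch below (the function and trigger names are made up):

-- sketch: route rows inserted into the parent table to the matching partition
create or replace function game_history_log_insert()
returns trigger as $$
begin
    if new.log_date between date '2012-10-01' and date '2012-10-31' then
        insert into game_history_log_201210 values (new.*);
    elsif new.log_date between date '2012-11-01' and date '2012-11-30' then
        insert into game_history_log_201211 values (new.*);
    else
        raise exception 'no partition for log_date %', new.log_date;
    end if;
    return null;  -- the row is stored only in the child table, not in the parent
end;
$$ language plpgsql;

create trigger game_history_log_partition
    before insert on game_history_log
    for each row execute procedure game_history_log_insert();

With the trigger in place the application always inserts into game_history_log, and each new month only needs a new partition plus one more branch in the function.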

One of the advantages of partitioning is that queries only search the relevant partition, which drastically and consistently reduces the search size no matter how many years of data there are. Here is the EXPLAIN output for a search on a certain date:

explain
select *
from game_history_log
where log_date = date '2012-10-02';
                                              QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
 Result  (cost=0.00..30.38 rows=9 width=20)
   ->  Append  (cost=0.00..30.38 rows=9 width=20)
         ->  Seq Scan on game_history_log  (cost=0.00..0.00 rows=1 width=20)
               Filter: (log_date = '2012-10-02'::date)
         ->  Seq Scan on game_history_log_201210 game_history_log  (cost=0.00..30.38 rows=8 width=20)
               Filter: (log_date = '2012-10-02'::date)

Notice that apart from the parent table it only scanned the correct partition. Obviously you can have indexes on the partitions to avoid a sequential scan, as sketched below.
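
For example, a sketch (not from the original answer) of per-partition indexes on log_date, plus gameid for per-game graphs; constraint exclusion must be enabled ('partition' is the default setting) so the planner can skip partitions whose CHECK constraint rules out the WHERE clause:

-- hypothetical indexes, created per partition (indexes are not inherited)
create index game_history_log_201210_date_idx on game_history_log_201210 (log_date);
create index game_history_log_201210_game_idx on game_history_log_201210 (gameid, log_date);

create index game_history_log_201211_date_idx on game_history_log_201211 (log_date);
create index game_history_log_201211_game_idx on game_history_log_201211 (gameid, log_date);

-- planner setting that enables skipping partitions based on their CHECK constraints
set constraint_exclusion = partition;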

See the PostgreSQL documentation on Inheritance and Partitioning.
