数据库 - 设计“事件”表 [英] Database - Designing an "Events" Table

查看:210
本文介绍了数据库 - 设计“事件”表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

阅读这个很棒的 Nettuts +文章 我提出了一个表模式,将高度易失性数据与其他需要重读的表分开,同时降低整个数据库模式所需的表数,但是我不确定这是否是一个好主意,因为它不遵循规范化的规则,我想听听你的建议,这里是一般的想法:






我有四种类型的用户在类表继承结构,在主用户表中存储所有用户共享的数据( id username password ,多个标志,...)以及一些 TIMESTAMP 字段( date_created date_updated date_activated date_lastLogin ,...)。



引用上述Nettuts +文章的提示# p>


示例2 :表格中有一个last_login
字段。它更新每个
用户登录到网站的时间。
但是表上的每个更新都会导致该表的
查询缓存被
刷新。您可以将该字段放入
另一个表中,以将您的
用户表更新到最小。


现在它变得更加棘手,我需要跟踪一些用户统计信息,例如




  • 多少独特已查看用户个人资料

  • 点击了特定类型用户广告

  • 发现的特定类型的用户
  • 等等...



在我完全规范化的数据库中,这会增加约8到10个附加表,它不是很多,但我想保持简单,如果我可以,所以我想出了以下 events 表:

  | ------ | ---------------- | ------ ---------- | --------------------- | ----------- | 
| ID | TABLE | EVENT | DATE | IP |
| ------ | ---------------- | ---------------- | ---- ----------------- | ----------- |
| 1 |用户|登录| 2010-04-19 00:30:00 | 127.0.0.1 |
| ------ | ---------------- | ---------------- | ---- ----------------- | ----------- |
| 1 |用户|登录| 2010-04-19 02:30:00 | 127.0.0.1 |
| ------ | ---------------- | ---------------- | ---- ----------------- | ----------- |
| 2 |用户|创建| 2010-04-19 00:31:00 | 127.0.0.2 |
| ------ | ---------------- | ---------------- | ---- ----------------- | ----------- |
| 2 |用户|激活| 2010-04-19 02:34:00 | 127.0.0.2 |
| ------ | ---------------- | ---------------- | ---- ----------------- | ----------- |
| 2 |用户|批准| 2010-04-19 09:30:00 | 217.0.0.1 |
| ------ | ---------------- | ---------------- | ---- ----------------- | ----------- |
| 2 |用户|登录| 2010-04-19 12:00:00 | 127.0.0.2 |
| ------ | ---------------- | ---------------- | ---- ----------------- | ----------- |
| 15 | user_ads |创建| 2010-04-19 12:30:00 | 127.0.0.1 |
| ------ | ---------------- | ---------------- | ---- ----------------- | ----------- |
| 15 | user_ads |印象深刻2010-04-19 12:31:00 | 127.0.0.2 |
| ------ | ---------------- | ---------------- | ---- ----------------- | ----------- |
| 15 | user_ads |点击| 2010-04-19 12:31:01 | 127.0.0.2 |
| ------ | ---------------- | ---------------- | ---- ----------------- | ----------- |
| 15 | user_ads |点击| 2010-04-19 12:31:02 | 127.0.0.2 |
| ------ | ---------------- | ---------------- | ---- ----------------- | ----------- |
| 15 | user_ads |点击| 2010-04-19 12:31:03 | 127.0.0.2 |
| ------ | ---------------- | ---------------- | ---- ----------------- | ----------- |
| 15 | user_ads |点击| 2010-04-19 12:31:04 | 127.0.0.2 |
| ------ | ---------------- | ---------------- | ---- ----------------- | ----------- |
| 15 | user_ads |点击| 2010-04-19 12:31:05 | 127.0.0.2 |
| ------ | ---------------- | ---------------- | ---- ----------------- | ----------- |
| 2 |用户|阻塞| 2010-04-20 03:19:00 | 217.0.0.1 |
| ------ | ---------------- | ---------------- | ---- ----------------- | ----------- |
| 2 |用户|删除| 2010-04-20 03:20:00 | 217.0.0.1 |
| ------ | ---------------- | ---------------- | ---- ----------------- | ----------- |

基本上 ID id )字段中,我相信其余的应该很简单。我在这个设计中喜欢的一件事是,我可以跟踪所有的用户登录,而不是只有最后一个,因此生成一些有趣的指标数据。



由于事件表的性质日益增长,我还考虑进行一些优化,例如:




  • #9 :由于只有有限数量的表和有限(预定)数量的事件,因此 TABLE EVENTS 列可以设置为 ENUM s而不是 VARCHAR s以节省一些空间。

  • #14 :存储 IP c> / code>。

  • DATE 存储为 TIMESTAMP s 而不是 DATETIME

  • 使用 ARCHIVE CSV )引擎而不是 InnoDB / MyISAM


    • 只有 INSERT s和 SELECT



    总的来说,每个事件



    优点:




    • 可存储更详细的数据(例如登录)。

    • 无需设计(代码) (日期和统计信息)。

    • 减少每个表格的几个列,并保持易失性数据分隔。



    < h2>缺点:


    • 非关系型(仍然不如EAV差):


      • SELECT * FROM events WHERE id = 2 AND table ='user'ORDER BY date DESC();

      • < ul
      • 每个事件6个字节的开销( ID TABLE code> EVENT )。






      更倾向于使用这种方法,因为职业选手似乎远远超过利弊,但我还是有点不情愿... 我错过了什么?





      @coolgeek:





      entity_type表,并使用其ID在
      中的object_type列(在您的情况下,
      'TABLE'列)。你想要
      和event_type
      表做同样的事情。


      意味着我应该添加一个附加的表,映射表中允许哪些事件,并使用事件表中的该表的PK,而不是 TABLE / EVENT pair?






      @ben:


      b $ b


      这些都是从
      现有数据派生的统计数据,不是吗?


      这些附加表主要与统计数据相关,但是我的数据不存在,例如:

        user_ad_stats user_post_stats 
      ------------- ---------------
      user_ad_id(FK)user_post_id(FK)
      ip ip
      date date
      type(impressed,clicked)

      表格我无法跟踪谁,什么或什么时候,不知道视图在这里可以如何帮助。


      我同意应该是单独的,
      ,但更多是因为它从根本上是b b不同的数据。什么是什么人和
      什么人做的是两个不同的
      的东西。我不认为波动是如此重要的


      我听说过这两种方式,我找不到任何在MySQL手册中说明任一个是正确的。无论如何,我同意你,他们应该是分开的表,因为它们代表的数据类型(与常规方法相比更具描述性的额外好处)。


      我认为你错失了
      树的森林。



      你的表的谓词是
      用户ID从IP IP在时间DATE
      EVENTed到TABLE这似乎
      合理,但有问题。


      我的意思是不如EAV,所有记录都遵循线性结构,并且它们很容易查询,没有层次结构,所以所有查询都可以用一个简单的 SELECT



      关于你的第二个声明,我想你在这里理解错了; IP地址不一定与用户相关联。表格结构应该是这样:


      IP地址( IP )将
      表(<$ c $>)的PK( ID )中的
      EVENT c $ c> TABLE
      )( DATE )。


      例如,在我上面的例子的最后一行它应该读取IP 217.0.0.1(一些管理员),删除用户#2(其最后知道的IP是127.0.0.2)在2010-04-20 03:20:00。


      您仍然可以将用户活动
      加入用户,但您无法a
      外键约束。


      确实,这是我的主要关注。然而,我不完全确定这个设计可能会出错,不能与传统的关系设计错误。我可以发现一些警告,但只要应用程序搞砸与数据库知道它在做什么我想应该不会有任何问题。



      另一件事在这个论点是我将存储更多的事件,并且每个事件将比原始设计多一倍,使用 ARCHIVE 存储引擎是非常有意义的,唯一的是它不支持 FK s( UPDATE s或 DELETE

      解决方案

      我强烈推荐这种方法。由于您可能在OLTP和OLAP使用相同的数据库,因此您可以通过添加一些星星和雪花来获得显着的性能优势。



      我有一个社交网络应用程序目前在65张表。我维护一个表来跟踪对象(博客/帖子,论坛/线程,图库/相册/图像等)视图,另一个用于对象推荐,以及第三个表,用于总结十几个其他表中的插入/更新活动。 p>

      我做一点不同的是维护一个entity_type表,并在object_type列(在你的情况下,'TABLE'列)使用它的ID。



      澄清Alix - 是的,您为对象维护一个参考表,事件的参考表(这些将是您的维度表)。您的事实表将有以下字段:

        id 
      object_id
      event_id
      event_time
      ip_address


      After reading the tips from this great Nettuts+ article I've come up with a table schema that would separate highly volatile data from other tables subjected to heavy reads and at the same time lower the number of tables needed in the whole database schema, however I'm not sure if this is a good idea since it doesn't follow the rules of normalization and I would like to hear your advice, here is the general idea:


      I've four types of users modeled in a Class Table Inheritance structure, in the main "user" table I store data common to all the users (id, username, password, several flags, ...) along with some TIMESTAMP fields (date_created, date_updated, date_activated, date_lastLogin, ...).

      To quote the tip #16 from the Nettuts+ article mentioned above:

      Example 2: You have a "last_login" field in your table. It updates every time a user logs in to the website. But every update on a table causes the query cache for that table to be flushed. You can put that field into another table to keep updates to your users table to a minimum.

      Now it gets even trickier, I need to keep track of some user statistics like

      • how many unique times a user profile was seen
      • how many unique times a ad from a specific type of user was clicked
      • how many unique times a post from a specific type of user was seen
      • and so on...

      In my fully normalized database this adds up to about 8 to 10 additional tables, it's not a lot but I would like to keep things simple if I could, so I've come up with the following "events" table:

      |------|----------------|----------------|---------------------|-----------|
      | ID   | TABLE          | EVENT          | DATE                | IP        | 
      |------|----------------|----------------|---------------------|-----------|
      | 1    | user           | login          | 2010-04-19 00:30:00 | 127.0.0.1 |
      |------|----------------|----------------|---------------------|-----------|
      | 1    | user           | login          | 2010-04-19 02:30:00 | 127.0.0.1 |
      |------|----------------|----------------|---------------------|-----------|
      | 2    | user           | created        | 2010-04-19 00:31:00 | 127.0.0.2 |
      |------|----------------|----------------|---------------------|-----------|
      | 2    | user           | activated      | 2010-04-19 02:34:00 | 127.0.0.2 |
      |------|----------------|----------------|---------------------|-----------|
      | 2    | user           | approved       | 2010-04-19 09:30:00 | 217.0.0.1 |
      |------|----------------|----------------|---------------------|-----------|
      | 2    | user           | login          | 2010-04-19 12:00:00 | 127.0.0.2 |
      |------|----------------|----------------|---------------------|-----------|
      | 15   | user_ads       | created        | 2010-04-19 12:30:00 | 127.0.0.1 |
      |------|----------------|----------------|---------------------|-----------|
      | 15   | user_ads       | impressed      | 2010-04-19 12:31:00 | 127.0.0.2 |
      |------|----------------|----------------|---------------------|-----------|
      | 15   | user_ads       | clicked        | 2010-04-19 12:31:01 | 127.0.0.2 |
      |------|----------------|----------------|---------------------|-----------|
      | 15   | user_ads       | clicked        | 2010-04-19 12:31:02 | 127.0.0.2 |
      |------|----------------|----------------|---------------------|-----------|
      | 15   | user_ads       | clicked        | 2010-04-19 12:31:03 | 127.0.0.2 |
      |------|----------------|----------------|---------------------|-----------|
      | 15   | user_ads       | clicked        | 2010-04-19 12:31:04 | 127.0.0.2 |
      |------|----------------|----------------|---------------------|-----------|
      | 15   | user_ads       | clicked        | 2010-04-19 12:31:05 | 127.0.0.2 |
      |------|----------------|----------------|---------------------|-----------|
      | 2    | user           | blocked        | 2010-04-20 03:19:00 | 217.0.0.1 |
      |------|----------------|----------------|---------------------|-----------|
      | 2    | user           | deleted        | 2010-04-20 03:20:00 | 217.0.0.1 |
      |------|----------------|----------------|---------------------|-----------|
      

      Basically the ID refers to the primary key (id) field in the TABLE table, I believe the rest should be pretty straightforward. One thing that I've come to like in this design is that I can keep track of all the user logins instead of just the last one, and thus generate some interesting metrics with that data.

      Due to the growing nature of the events table I also thought of making some optimizations, such as:

      • #9: Since there is only a finite number of tables and a finite (and predetermined) number of events, the TABLE and EVENTS columns could be setup as ENUMs instead of VARCHARs to save some space.
      • #14: Store IPs as UNSIGNED INTs with INET_ATON() instead of VARCHARs.
      • Store DATEs as TIMESTAMPs instead of DATETIMEs.
      • Use the ARCHIVE (or the CSV?) engine instead of InnoDB / MyISAM.
        • Only INSERTs and SELECTs are supported, and data is compressed on the fly.

      Overall, each event would only consume 14 (uncompressed) bytes which is okay for my traffic I guess.

      Pros:

      • Ability to store more detailed data (such as logins).
      • No need to design (and code for) almost a dozen additional tables (dates and statistics).
      • Reduces a few columns per table and keeps volatile data separated.

      Cons:

      • Non-relational (still not as bad as EAV):
        • SELECT * FROM events WHERE id = 2 AND table = 'user' ORDER BY date DESC();
      • 6 bytes overhead per event (ID, TABLE and EVENT).

      I'm more inclined to go with this approach since the pros seem to far outweigh the cons, but I'm still a little bit reluctant... Am I missing something? What are your thoughts on this?

      Thanks!


      @coolgeek:

      One thing that I do slightly differently is to maintain an entity_type table, and use its ID in the object_type column (in your case, the 'TABLE' column). You would want to do the same thing with an event_type table.

      Just to be clear, you mean I should add an additional table that maps which events are allowed in a table and use the PK of that table in the events table instead of having a TABLE / EVENT pair?


      @ben:

      These are all statistics derived from existing data, aren't they?

      The additional tables are mostly related to statistics but I the data doesn't already exists, some examples:

      user_ad_stats                          user_post_stats
      -------------                          ---------------
      user_ad_id (FK)                        user_post_id (FK)
      ip                                     ip
      date                                   date
      type (impressed, clicked)
      

      If I drop these tables I've no way to keep track of who, what or when, not sure how views can help here.

      I agree that it ought to be separate, but more because it's fundamentally different data. What someone is and what someone does are two different things. I don't think volatility is so important.

      I've heard it both ways and I couldn't find anything in the MySQL manual that states that either one is right. Anyway, I agree with you that they should be separated tables because they represent kinds of data (with the added benefit of being more descriptive than a regular approach).

      I think you're missing the forest for the trees, so to speak.

      The predicate for your table would be "User ID from IP IP at time DATE EVENTed to TABLE" which seems reasonable, but there are issues.

      What I meant for "not as bad as EAV" is that all records follow a linear structure and they are pretty easy to query, there is no hierarchical structure so all queries can be done with a simple SELECT.

      Regarding your second statement, I think you understood me wrong here; the IP address is not necessarily associated with the user. The table structure should read something like this:

      IP address (IP) did something (EVENT) to the PK (ID) of the table (TABLE) on date (DATE).

      For instance, in the last row of my example above it should read that IP 217.0.0.1 (some admin), deleted the user #2 (whose last known IP is 127.0.0.2) at 2010-04-20 03:20:00.

      You can still join, say, user events to users, but you can't implement a foreign key constraint.

      Indeed, that's my main concern. However I'm not totally sure what can go wrong with this design that couldn't go wrong with a traditional relational design. I can spot some caveats but as long as the app messing with the database knows what it is doing I guess there shouldn't be any problems.

      One other thing that counts in this argument is that I will be storing much more events, and each event will more than double compared to the original design, it makes perfect sense to use the ARCHIVE storage engine here, the only thing is it doesn't support FKs (neither UPDATEs or DELETEs).

      解决方案

      I highly recommend this approach. Since you're presumably using the same database for OLTP and OLAP, you can gain significant performance benefits by adding in some stars and snowflakes.

      I have a social networking app that is currently at 65 tables. I maintain a single table to track object (blog/post, forum/thread, gallery/album/image, etc) views, another for object recommends, and a third table to summarize insert/update activity in a dozen other tables.

      One thing that I do slightly differently is to maintain an entity_type table, and use its ID in the object_type column (in your case, the 'TABLE' column). You would want to do the same thing with an event_type table.

      Clarifying for Alix - Yes, you maintain a reference table for objects, and a reference table for events (these would be your dimension tables). Your fact table would have the following fields:

      id
      object_id
      event_id
      event_time
      ip_address
      

      这篇关于数据库 - 设计“事件”表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆