为历史数据正确设计 EAV 数据库 [英] Designing an EAV database correctly for historical data

查看:16
本文介绍了为历史数据正确设计 EAV 数据库的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在阅读 EAV 数据库 和大多数缺点似乎是与非常非常糟糕的 EAV 设计或困难相关从数据.

I have been reading about EAV database and most of the short comings seem to be related to really, really, bad EAV designs or difficulty generating reports from the data.

通常当您看到人们抱怨 EAV 时,他们使用少于三个表来尝试复制 RDBMS 中单独表 + 列的功能.有时这意味着将从小数到字符串的所有内容存储在单个 TEXT 值列中.EAV 还会破坏数据完整性的保护措施,如果您不小心,这可能会非常糟糕.

Usually when you see people complaining about EAV they are using less than three tables to try to replicate the functionally of separate tables + columns in a RDBMS. Sometimes that means storing everything from decimals to strings in a single TEXT value column. EAV also messes with the safe-guards over data integrity which can be very bad if you are not careful.

不过,EAV 确实提供了一种简单的方法来跟踪历史数据,并允许我们在 SQL 和键值存储系统之间来回移动系统的各个部分.

However, EAV does provide an easy way to track historical data and allows us to move parts of the system back and forth between SQL and key-value store systems.

如果我们根据类型区分不同的实体属性会怎样.这将使我们仍然可以处理除了与特定属性和实体相关联的正确索引值之外,还有 belongsTo、Has、HasMany 和 HasManyThrough 关系.

What if we separate different entity attributes based on their type. This would allow us to still handle belongsTo, Has, HasMany, and HasManyThrough relations in addition to properly indexed values tied to specific attributes and entities.

考虑以下两个基础实体

products (price -> decimal, title -> string, desc -> text, etc...)
    attributes
        options
            [...]
        int
        datetime
        string
        text
        decimal
        relation
            [id,foreign_key]

users (gender -> options, age -> int, username -> string, etc...)
    attributes
        options
            [...]
        int
        datetime
        string
        text
        decimal
        relation
            [id,foreign_key]

RDBMS 架构设计

众所周知,用户个人资料和产品是世界上最多样化的项目之一.每家公司对它们的处理方式不同,并且针对它们的需求有不同的列"或属性".

RDBMS Schema Design

As we all know, users profiles and products are some of the most diverse items in the world. Each company handles them differently and has different "columns" or "attributes" for their needs.

以下是如何处理多个(嵌套和/或关系)实体的视图.

The following is a view of how to handle multiple (nested and/or relational) entities.

这个想法是每个实体都有这个主属性表,然后指定如何查找和解释这些值.这使我们能够处理特殊情况,例如其他实体的外键以及选项"或十进制数字等.

The idea is that for each entity has this master attribute table that then specifies how to find and interpret those values. This allows us to handle special cases like foreign keys to other entities and things like "options" or decimal numbers.

实体类型 {ID,type,//即博客"、用户"、产品"等.created_at}

entity_type { id, type, // i.e. "blog", "user", "product", etc.. created_at }

entity {
    id,
    entity_type_id, 
    created_at
}

    attr {
        id,
        entity_id,
        type,
        name,
        created_at
    }

        option {
            id,
            attr_id,
            entity_id,
            multiple, // multiple values allowed?
            name,
            created_at
        }

        attr_option {
            id
            attr_id,
            entity_id,
            option_id
            option,
            created_at
        }

        attr_int {
            attr_id,
            entity_id,
            int,
            created_at
        }

        attr_relation {
            attr_id,
            entity_id,
            entity_fk_id,
            created_at
        }

        attr_datetime {
            attr_id,
            entity_id,
            datetime,
            created_at
        }

        attr_string {
            attr_id,
            entity_id,
            var_char,
            created_at
        }

        attr_text {
            attr_id,
            entity_id,
            text,
            created_at
        }

        attr_decimal {
            attr_id,
            entity_id,
            decimal,
            created_at
        }

这样的表将允许我们永远不必UPDATE ...,因为我们可以为每个更改值的新属性INSERT INTO ...添加created_at 以了解最新的值是什么.这非常适合保存历史数据记录(当然也可以例外).

A table like this would allow us to never have to UPDATE ... since we could just INSERT INTO ... for each new attribute that changes value and add the created_at to know what the most recent value is. This is perfect for keeping records of historical data (Exceptions could still be made of course).

首先,它是什么类型"的实体?(用户、帖子、评论等)

First, what "type" of entity is it? (user, post, comment, etc..)

SELECT * FROM entity_type et LEFT JOIN entity e ON e.entity_type_id = et.id WHERE e.id = ?

接下来,这个实体的属性是什么?(表属性)

Next, what are the attributes of this entity? (TABLE attr)

SELECT * FROM attr WHERE entity_id = ?

接下来,该实体的属性中存在哪些值?(attr_### 表)

Next, what values exist in the attributes for this entity? (attr_### tables)

SELECT * FROM attr_option, attr_int, attr_relation, attr_text, ... WHERE entity_id = ?
vs
SELECT * FROM attr_option WHERE entity_id = ? if( ! multiple) ORDER BY created_at DESC LIMIT 1
SELECT * FROM attr_int WHERE entity_id = ? ORDER BY created_at DESC LIMIT 1
SELECT * FROM attr_relation WHERE entity_id = ? ORDER BY created_at DESC LIMIT 1
SELECT * FROM attr_text WHERE entity_id = ? ORDER BY created_at DESC LIMIT 1
...

此实体存在哪些关系?

假设我们有一个 ID 为 34 的帖子"实体,并且我们想要它的评论"(entity_type = 2),这可以让我们获取产品实体上的评论实体 ID:

Assuming we have a "post" entity with an ID of 34 and we want the "comments" for it (entity_type = 2), this could allow us to fetch comment entity ids on a product entity:

SELECT * FROM entity AS e
LEFT JOIN attr_relation AS ar ON ar.entity_id = e.id
WHERE ar.entity_id = 34 AND e.entity_type = 2;

除了多个查询(无论如何都需要键值存储),这种方法会存在什么问题?

Apart from multiple queries, (which are needed with key-value stores anyway), what problems would exist with this approach?

推荐答案

一个 EAV '数据库' [原文如此] 字面意思 数学上 直截了当数据库及其元数据的三元组中未记录的描述,没有将关系列表化、查询关系、查询元数据、类型检查、保持完整性、优化、自动处理或控制并发的功能.

An EAV 'database' [sic] is literally mathematically straightforwardly an undocumented description in triples of a database and its metadata, with no functionality to tablulate relationships, or query relationships, or query metadata, or type check, or maintain integrity, or optimize, or transact atomically, or control concurrency.

软件工程原则规定,健全的 EAV 数据库 [原文如此] 的使用完全包括定义适当的抽象(类型、运算符、进程、解释器、模块)来重构 DBMS 的功能.

Software engineering principles dictate that sound EAV database [sic] use consist entirely of defining appropriate abstractions (types, operators, processes, interpreters, modules) reconstructing functionality of a DBMS.

从一个人的 EAV 三元组及其含义到(碎片)数据库描述的映射的机械性质使得这很容易显示.

The mechanical nature of the mapping from one's EAV triples and their meanings to a (fragmented) database description makes this easy to show.

套用 Greenspun,任何足够复杂的 EAV 项目都包含一个临时的、非正式的- 半个 DBMS 的指定的、充满错误的、缓慢的实现.

To paraphrase Greenspun, any sufficiently complex EAV project contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of a DBMS.

我再说一遍:EAV 是数据库及其元数据的三元组中的未记录描述,没有 DBMS.仅将 EAV 用于已证明 DDL 解决方案无法满足性能要求并且 EAV 解决方案可以而且值得的部分数据库.

I repeat: EAV is an undocumented description in triples of a database and its metadata, with no DBMS. Use EAV only for parts of a database where you have demonstrated that a DDL solution cannot meet performance requirements and that an EAV solution can and is worth it.

这篇关于为历史数据正确设计 EAV 数据库的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆