Designing an EAV database correctly for historical data

Question

Intro

I have been reading about EAV databases, and most of the shortcomings seem to be related to really, really bad EAV designs or to difficulty generating reports from the data (https://stackoverflow.com/questions/2523741/how-to-overcome-shortcomings-in-reporting-from-eav-database).

Usually when you see people complaining about EAV, they are using fewer than three tables to try to replicate the functionality of separate tables + columns in an RDBMS. Sometimes that means storing everything from decimals to strings in a single TEXT value column. EAV also weakens the safeguards over data integrity, which can be very bad if you are not careful.

However, EAV does provide an easy way to track historical data and allows us to move parts of the system back and forth between SQL and key-value store systems.

What if we separate different entity attributes based on their type? This would still let us handle belongsTo, Has, HasMany, and HasManyThrough relations, in addition to properly indexed values tied to specific attributes and entities.

Consider the following two base entities:

products (price -> decimal, title -> string, desc -> text, etc...)
    attributes
        options
            [...]
        int
        datetime
        string
        text
        decimal
        relation
            [id,foreign_key]

users (gender -> options, age -> int, username -> string, etc...)
    attributes
        options
            [...]
        int
        datetime
        string
        text
        decimal
        relation
            [id,foreign_key]

RDBMS Schema Design

As we all know, user profiles and products are some of the most diverse items in the world. Each company handles them differently and has different "columns" or "attributes" for its needs.

The following is a view of how to handle multiple (nested and/or relational) entities.

The idea is that each entity has a master attribute table that specifies how to find and interpret its values. This allows us to handle special cases like foreign keys to other entities and things like "options" or decimal numbers.

entity_type {
    id,
    type, // i.e. "blog", "user", "product", etc.
    created_at
}

entity {
    id,
    entity_type_id, 
    created_at
}

    attr {
        id,
        entity_id,
        type,
        name,
        created_at
    }

        option {
            id,
            attr_id,
            entity_id,
            multiple, // multiple values allowed?
            name,
            created_at
        }

        attr_option {
            id,
            attr_id,
            entity_id,
            option_id,
            option,
            created_at
        }

        attr_int {
            attr_id,
            entity_id,
            int,
            created_at
        }

        attr_relation {
            attr_id,
            entity_id,
            entity_fk_id,
            created_at
        }

        attr_datetime {
            attr_id,
            entity_id,
            datetime,
            created_at
        }

        attr_string {
            attr_id,
            entity_id,
            var_char,
            created_at
        }

        attr_text {
            attr_id,
            entity_id,
            text,
            created_at
        }

        attr_decimal {
            attr_id,
            entity_id,
            decimal,
            created_at
        }

Tables like these mean we never have to UPDATE ...: whenever an attribute changes value we just INSERT INTO ... a new row, and created_at tells us which value is the most recent. This is well suited to keeping records of historical data (exceptions could still be made, of course).
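The insert-only idea can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions that are not in the schema above: the value column is named value, decimals are stored as text, created_at is an ISO-8601 string, and the helper names record_decimal, current_decimal, and decimal_history are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE attr_decimal (
        attr_id    INTEGER NOT NULL,
        entity_id  INTEGER NOT NULL,
        value      TEXT    NOT NULL,  -- assumption: decimal kept as text to avoid float rounding
        created_at TEXT    NOT NULL   -- assumption: ISO-8601 string, so text order is time order
    )
    """
)
# Index chosen so the "latest value" lookup does not scan the whole history.
conn.execute(
    "CREATE INDEX attr_decimal_latest "
    "ON attr_decimal (entity_id, attr_id, created_at)"
)


def record_decimal(entity_id, attr_id, value, created_at):
    """Record a change by inserting a new row; existing rows are never updated."""
    conn.execute(
        "INSERT INTO attr_decimal (attr_id, entity_id, value, created_at) "
        "VALUES (?, ?, ?, ?)",
        (attr_id, entity_id, value, created_at),
    )


def current_decimal(entity_id, attr_id):
    """Return the most recently recorded value, or None if there is none."""
    row = conn.execute(
        "SELECT value FROM attr_decimal "
        "WHERE entity_id = ? AND attr_id = ? "
        "ORDER BY created_at DESC LIMIT 1",
        (entity_id, attr_id),
    ).fetchone()
    return row[0] if row else None


def decimal_history(entity_id, attr_id):
    """Return every recorded (value, created_at) pair, oldest first."""
    return conn.execute(
        "SELECT value, created_at FROM attr_decimal "
        "WHERE entity_id = ? AND attr_id = ? ORDER BY created_at",
        (entity_id, attr_id),
    ).fetchall()


record_decimal(1, 10, "9.99", "2024-01-01T00:00:00")
record_decimal(1, 10, "12.50", "2024-02-01T00:00:00")
print(current_decimal(1, 10))
print(decimal_history(1, 10))
```

One caveat of this sketch: two rows with an identical created_at have no defined order, so a real design would probably need a tie-breaker such as an auto-incrementing id.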

Sample queries

First, what "type" of entity is it? (user, post, comment, etc..)

SELECT * FROM entity_type et LEFT JOIN entity e ON e.entity_type_id = et.id WHERE e.id = ?

Next, what are the attributes of this entity? (TABLE attr)

SELECT * FROM attr WHERE entity_id = ?

Next, what values exist in the attributes for this entity? (attr_### tables)

SELECT * FROM attr_option, attr_int, attr_relation, attr_text, ... WHERE entity_id = ?
vs
SELECT * FROM attr_option WHERE entity_id = ? if( ! multiple) ORDER BY created_at DESC LIMIT 1
SELECT * FROM attr_int WHERE entity_id = ? ORDER BY created_at DESC LIMIT 1
SELECT * FROM attr_relation WHERE entity_id = ? ORDER BY created_at DESC LIMIT 1
SELECT * FROM attr_text WHERE entity_id = ? ORDER BY created_at DESC LIMIT 1
...
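The per-table queries above return one latest row per entity, not one per attribute. A possible way to get the latest value of every attribute is a correlated MAX(created_at) subquery, run once per typed table. The sketch below uses Python's sqlite3 module; the value column name, the sample data, and the latest_values helper are assumptions made for illustration, and only two of the typed tables are included.

```python
import sqlite3

# Fixed list of typed tables, so interpolating the name into SQL is safe here.
TYPED_TABLES = ("attr_int", "attr_string")

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE attr (id INTEGER PRIMARY KEY, entity_id INTEGER, type TEXT, name TEXT);
    CREATE TABLE attr_int (attr_id INTEGER, entity_id INTEGER, value INTEGER, created_at TEXT);
    CREATE TABLE attr_string (attr_id INTEGER, entity_id INTEGER, value TEXT, created_at TEXT);

    -- Hypothetical sample data: entity 1 has an age (changed once) and a username.
    INSERT INTO attr VALUES (1, 1, 'int', 'age'), (2, 1, 'string', 'username');
    INSERT INTO attr_int VALUES (1, 1, 30, '2023-01-01'), (1, 1, 31, '2024-01-01');
    INSERT INTO attr_string VALUES (2, 1, 'alice', '2023-01-01');
    """
)


def latest_values(entity_id):
    """Return {attribute name: most recent value} for one entity."""
    values = {}
    for table in TYPED_TABLES:
        sql = f"""
            SELECT a.name, v.value
            FROM attr AS a
            JOIN {table} AS v
              ON v.attr_id = a.id AND v.entity_id = a.entity_id
            WHERE a.entity_id = ?
              AND v.created_at = (
                  SELECT MAX(m.created_at) FROM {table} AS m
                  WHERE m.attr_id = v.attr_id AND m.entity_id = v.entity_id
              )
        """
        values.update(conn.execute(sql, (entity_id,)).fetchall())
    return values


print(latest_values(1))
```

This still costs one query per typed table, which is the "multiple queries" trade-off the question mentions.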

What relations exist for this entity?

Assuming we have a "post" entity with an ID of 34 and we want the "comments" for it (entity_type = 2), this could allow us to fetch the ids of the comment entities attached to that post:

SELECT e.* FROM entity AS e
JOIN attr_relation AS ar ON ar.entity_fk_id = e.id
WHERE ar.entity_id = 34 AND e.entity_type_id = 2;
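A runnable sketch of such a relation lookup, using Python's sqlite3 module, might look like the following. It assumes that attr_relation.entity_id holds the owning entity (the post) and entity_fk_id holds the related entity (the comment); the sample ids, the attr_id of 7, and the related_ids helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entity (id INTEGER PRIMARY KEY, entity_type_id INTEGER, created_at TEXT);
    CREATE TABLE attr_relation (attr_id INTEGER, entity_id INTEGER,
                                entity_fk_id INTEGER, created_at TEXT);

    -- Hypothetical type ids: 1 = post, 2 = comment.
    INSERT INTO entity VALUES (34, 1, '2024-01-01'), (35, 1, '2024-01-01'),
                              (50, 2, '2024-01-02'), (51, 2, '2024-01-03'),
                              (52, 2, '2024-01-04');
    -- Post 34 has comments 50 and 51; post 35 has comment 52.
    INSERT INTO attr_relation VALUES (7, 34, 50, '2024-01-02'),
                                     (7, 34, 51, '2024-01-03'),
                                     (7, 35, 52, '2024-01-04');
    """
)


def related_ids(owner_id, related_type_id):
    """Return ids of entities of the given type that owner_id points to."""
    rows = conn.execute(
        "SELECT e.id FROM entity AS e "
        "JOIN attr_relation AS ar ON ar.entity_fk_id = e.id "
        "WHERE ar.entity_id = ? AND e.entity_type_id = ? "
        "ORDER BY e.id",
        (owner_id, related_type_id),
    ).fetchall()
    return [row[0] for row in rows]


print(related_ids(34, 2))
```

Note that nothing in these tables stops entity_fk_id from pointing at a missing entity or at one of the wrong type unless the application checks it.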

Apart from multiple queries (which are needed with key-value stores anyway), what problems would exist with this approach?

Answer

An EAV 'database' [sic] is literally, mathematically, straightforwardly an undocumented description in triples of a database and its metadata, with no functionality to tabulate relationships, or query relationships, or query metadata, or type check, or maintain integrity, or optimize, or transact atomically, or control concurrency.

Software engineering principles dictate that sound EAV database [sic] use consists entirely of defining appropriate abstractions (types, operators, processes, interpreters, modules) reconstructing functionality of a DBMS.

The mechanical nature of the mapping from one's EAV triples and their meanings to a (fragmented) database description makes this easy to show.
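As a rough illustration of how mechanical that mapping is, the Python sketch below turns attribute metadata into a CREATE TABLE statement and pivots triples into ordinary rows. Everything here is hypothetical: the ATTRS and TRIPLES sample data, the users table name, and the helpers ddl_for and rows_from_triples. It also assumes all triples belong to a single entity type.

```python
import sqlite3

# Hypothetical attribute metadata: (attribute name, SQL type).
ATTRS = [("age", "INTEGER"), ("username", "TEXT")]
# Hypothetical current-value triples: (entity id, attribute name, value).
TRIPLES = [(1, "age", 31), (1, "username", "alice"), (2, "username", "bob")]


def ddl_for(table):
    """Build the conventional table definition that the metadata describes."""
    columns = ", ".join(f"{name} {sql_type}" for name, sql_type in ATTRS)
    return f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, {columns})"


def rows_from_triples():
    """Pivot the triples into one tuple per entity; missing attributes become None."""
    names = [name for name, _ in ATTRS]
    rows = {}
    for entity_id, name, value in TRIPLES:
        rows.setdefault(entity_id, dict.fromkeys(names))[name] = value
    return [(entity_id, *(row[n] for n in names))
            for entity_id, row in sorted(rows.items())]


conn = sqlite3.connect(":memory:")
conn.execute(ddl_for("users"))
placeholders = ", ".join("?" * (len(ATTRS) + 1))
conn.executemany(f"INSERT INTO users VALUES ({placeholders})", rows_from_triples())
print(conn.execute("SELECT id, age, username FROM users ORDER BY id").fetchall())
```

Once the data sits in the generated table, column types, constraints, and ordinary queries are handled by the DBMS instead of by application code.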

To paraphrase Greenspun, any sufficiently complex EAV project contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of a DBMS.

I repeat: EAV is an undocumented description in triples of a database and its metadata, with no DBMS. Use EAV only for parts of a database where you have demonstrated that a DDL solution cannot meet performance requirements and that an EAV solution can and is worth it.
