How to overcome shortcomings in reporting from an EAV database?


Problem description

The major shortcomings of Entity-Attribute-Value database designs in SQL all seem to be related to querying and reporting on the data efficiently and quickly. Most of the information I have read on the subject warns against implementing EAV because of these problems and because almost every application needs querying and reporting.

I am currently designing a system where the fields for one of the entities are not known at design/compile time and are defined by the end-user of the system. EAV seems like a good fit for this requirement but due to the problems I've read about, I am hesitant in implementing it as there are also some pretty heavy reporting requirements for this system as well. I think I've come up with a way around this but would like to pose the question to the SO community.

Given that a typical normalized database (OLTP) still isn't always the best option for running reports, a good practice seems to be having a "reporting" database (OLAP) into which the data from the normalized database is copied, indexed extensively, and possibly denormalized for easier querying. Could the same idea be used to work around the shortcomings of an EAV design?

The main downside I see is the increased complexity of transferring the data from the EAV database to the reporting database, as you may end up having to alter the tables in the reporting database whenever new fields are defined in the EAV database. But that is hardly impossible and seems to be an acceptable tradeoff for the increased flexibility given by the EAV design. This downside also exists if I use a non-SQL data store (e.g. CouchDB or similar) for the main data storage, since all the standard reporting tools expect a SQL backend to query against.
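The transfer step described above can be sketched as follows. This is only a minimal illustration using Python's built-in sqlite3 module, not a recommended ETL design. It assumes both the EAV tables and the reporting table live in one SQLite database, that field names are safe to use as column names, and that "most recent EffectiveDate wins" is an acceptable stand-in for the real selection logic. The function names `make_sample_db` and `refresh_reporting_table` are made up for this example.

```python
import sqlite3


def make_sample_db():
    """In-memory copy of the sample tables from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Person (Id INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Person_Value (
            PersonId INTEGER, Source TEXT, Field TEXT,
            Value TEXT, EffectiveDate TEXT);
        -- The reporting table starts with only the fixed columns.
        CREATE TABLE Person_Denormalized (Id INTEGER PRIMARY KEY, Name TEXT);
        INSERT INTO Person VALUES (123, 'Joe Smith');
        INSERT INTO Person_Value VALUES
            (123, 'CIA', 'HomeAddress', '123 Cherry Ln', '2010-03-26'),
            (123, 'DMV', 'HomeAddress', '561 Stoney Rd', '2010-02-15'),
            (123, 'FBI', 'HomeAddress', '676 Lancas Dr', '2010-03-01');
    """)
    return conn


def refresh_reporting_table(conn):
    """Add a reporting column for each new EAV field, then refill the values."""
    cur = conn.cursor()
    fields = [row[0] for row in cur.execute(
        "SELECT DISTINCT Field FROM Person_Value ORDER BY Field")]
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    existing = {row[1] for row in cur.execute(
        "PRAGMA table_info(Person_Denormalized)")}

    for field in fields:
        # Column names cannot be bound as parameters, so validate them first.
        if not field.isidentifier():
            raise ValueError(f"unsafe field name: {field!r}")
        if field not in existing:
            cur.execute(
                f'ALTER TABLE Person_Denormalized ADD COLUMN "{field}" TEXT')

    # Make sure every person has a row in the reporting table.
    cur.execute("INSERT OR IGNORE INTO Person_Denormalized (Id, Name) "
                "SELECT Id, Name FROM Person")

    # Assumption: the newest value wins (placeholder for the real logic).
    for field in fields:
        cur.execute(
            f'''UPDATE Person_Denormalized SET "{field}" = (
                    SELECT v.Value FROM Person_Value v
                    WHERE v.PersonId = Person_Denormalized.Id
                      AND v.Field = ?
                    ORDER BY v.EffectiveDate DESC LIMIT 1)''',
            (field,))
    conn.commit()
```

Calling `refresh_reporting_table(make_sample_db())` adds a `HomeAddress` column to `Person_Denormalized` and fills it. Running it again after a new field appears in `Person_Value` adds the matching column, which is the "alter the tables as new fields are defined" cost mentioned above.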

Do the issues with EAV systems mostly go away if you have a separate reporting database for querying?

EDIT: Thanks for the comments so far. One of the important things about the system I'm working on is that I'm really only talking about using EAV for one of the entities, not everything in the system.

The whole gist of the system is to be able to pull data from multiple disparate sources that are not known ahead of time and crunch the data to come up with some "best known" data about a particular entity. So every "field" I'm dealing with is multi-valued and I'm also required to track history for each. The normalized design for this ends up being 1 table per field which makes querying it kind of painful anyway.

Here are the table schemas and sample data I'm looking at (obviously changed from what I'm working on but I think it illustrates the point well):

EAV Tables

Person
-------------------
-  Id - Name      -
-------------------
- 123 - Joe Smith -
-------------------

Person_Value
-------------------------------------------------------------------
- PersonId - Source - Field       - Value         - EffectiveDate -
-------------------------------------------------------------------
-      123 -    CIA - HomeAddress - 123 Cherry Ln -    2010-03-26 -
-      123 -    DMV - HomeAddress - 561 Stoney Rd -    2010-02-15 -
-      123 -    FBI - HomeAddress - 676 Lancas Dr -    2010-03-01 -
-------------------------------------------------------------------

Reporting Table

Person_Denormalized
----------------------------------------------------------------------------------------
-  Id - Name      - HomeAddress   - HomeAddress_Confidence - HomeAddress_EffectiveDate - 
----------------------------------------------------------------------------------------
- 123 - Joe Smith - 123 Cherry Ln -                  0.713 -                2010-03-26 -
----------------------------------------------------------------------------------------

Normalized Design

Person
-------------------
-  Id - Name      -
-------------------
- 123 - Joe Smith -
-------------------

Person_HomeAddress
------------------------------------------------------
- PersonId - Source - Value         - Effective Date - 
------------------------------------------------------
-      123 -    CIA - 123 Cherry Ln -     2010-03-26 -
-      123 -    DMV - 561 Stoney Rd -     2010-02-15 -
-      123 -    FBI - 676 Lancas Dr -     2010-03-01 -
------------------------------------------------------

The "Confidence" field here is generated using logic that cannot be expressed easily (if at all) in SQL, so my most common operation besides inserting new values will be pulling ALL data about a person for all fields so that I can generate the record for the reporting table. This is actually easier in the EAV model, as I can do a single query. In the normalized design, I end up having to do 1 query per field to avoid a massive Cartesian product from joining them all together.
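To make the single-query point concrete, here is a rough sketch in Python with sqlite3. It pulls every observation for one person in one query, groups the rows by field in application code, and hands each group to a selection function. The real confidence logic is not described in this post, so `newest_wins` is only a placeholder that leaves the confidence as `None`. All function names here are invented for the example.

```python
import sqlite3
from collections import defaultdict


def sample_connection():
    """In-memory Person_Value table holding the sample rows from above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Person_Value (PersonId INTEGER, Source TEXT, "
        "Field TEXT, Value TEXT, EffectiveDate TEXT)")
    conn.executemany(
        "INSERT INTO Person_Value VALUES (?, ?, ?, ?, ?)",
        [(123, "CIA", "HomeAddress", "123 Cherry Ln", "2010-03-26"),
         (123, "DMV", "HomeAddress", "561 Stoney Rd", "2010-02-15"),
         (123, "FBI", "HomeAddress", "676 Lancas Dr", "2010-03-01")])
    return conn


def load_person_values(conn, person_id):
    """One query for all fields; rows are grouped by field, newest first."""
    rows = conn.execute(
        "SELECT Field, Source, Value, EffectiveDate FROM Person_Value "
        "WHERE PersonId = ? ORDER BY Field, EffectiveDate DESC",
        (person_id,))
    by_field = defaultdict(list)
    for field, source, value, effective_date in rows:
        by_field[field].append((source, value, effective_date))
    return dict(by_field)


def newest_wins(observations):
    """Placeholder selection rule: take the newest observation.

    The confidence is left as None because the actual scoring logic
    is not given in the question.
    """
    _source, value, effective_date = observations[0]
    return value, None, effective_date


def build_report_record(conn, person_id, choose):
    """Build one Person_Denormalized-style record as a dict."""
    record = {"Id": person_id}
    for field, observations in load_person_values(conn, person_id).items():
        value, confidence, effective_date = choose(observations)
        record[field] = value
        record[f"{field}_Confidence"] = confidence
        record[f"{field}_EffectiveDate"] = effective_date
    return record
```

With the normalized design, `load_person_values` would instead need one query per `Person_<Field>` table, which is the extra work described above.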

Solution

Short answer - yes, a reporting database is a reasonable approach to solving the problems of reporting from an EAV data model.

I spent a number of years working with an information management solution which allowed end users complete freedom to define their own data model, with both the schema and the data stored using an EAV model. Interestingly, this product provided meta-schema objects used to fulfill reporting requirements (e.g. graphs to provide object navigation, views to perform projection, etc.). This meant that the end user was free to define queries using the same terms and concepts that they'd used to build the data model in the first instance. The act of reporting was essentially to compute the data set by navigating these definitions, and hand the result over to a traditional report writing tool as if it were relational data.

One of the strengths of this approach was that the same mechanism that was already in place to transform the EAV model to something the user could work with could be reused and applied to the reporting function.
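As a rough illustration of the "views to perform projection" idea (not the mechanism of the product mentioned above, whose internals are not described here), a projection can be stored as plain data, i.e. a view name plus a list of field names, and compiled into an ordinary SQL view that a report tool can read as if it were a relational table. The sketch below uses Python with sqlite3 against the `Person` and `Person_Value` tables from the question. `compile_view` is a hypothetical helper, and taking the newest value for each field is an assumption.

```python
import sqlite3


def compile_view(view_name, fields):
    """Turn a user-defined projection (a list of field names) into CREATE VIEW SQL."""
    # Names are inlined into the SQL text, so only plain identifiers are allowed.
    for name in [view_name, *fields]:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")

    select_list = ["p.Id", "p.Name"]
    for field in fields:
        # Assumption: the newest value of a multi-valued field is reported.
        select_list.append(
            "(SELECT v.Value FROM Person_Value v "
            f"WHERE v.PersonId = p.Id AND v.Field = '{field}' "
            f'ORDER BY v.EffectiveDate DESC LIMIT 1) AS "{field}"')

    columns = ",\n       ".join(select_list)
    return f'CREATE VIEW "{view_name}" AS\nSELECT {columns}\nFROM Person p'


def demo():
    """Create the sample tables, compile a view, and read it like a table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Person (Id INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Person_Value (
            PersonId INTEGER, Source TEXT, Field TEXT,
            Value TEXT, EffectiveDate TEXT);
        INSERT INTO Person VALUES (123, 'Joe Smith');
        INSERT INTO Person_Value VALUES
            (123, 'CIA', 'HomeAddress', '123 Cherry Ln', '2010-03-26'),
            (123, 'DMV', 'HomeAddress', '561 Stoney Rd', '2010-02-15'),
            (123, 'FBI', 'HomeAddress', '676 Lancas Dr', '2010-03-01');
    """)
    conn.execute(compile_view("PersonReport", ["HomeAddress"]))
    return conn.execute("SELECT * FROM PersonReport").fetchall()
```

Because the view definition is just data, the same list of field names that drives the user-facing screens could also drive reporting, which is the reuse described above.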
