How to overcome shortcomings in reporting from an EAV database?

Question

The major shortcomings of Entity-Attribute-Value (EAV) database designs in SQL all seem to relate to querying and reporting on the data efficiently and quickly. Most of the information I have read on the subject warns against implementing EAV because of these problems and because querying and reporting are common to almost all applications.

I am currently designing a system where the fields for one of the entities are not known at design/compile time and are defined by the end user of the system. EAV seems like a good fit for this requirement, but because of the problems I've read about, I am hesitant to implement it, as this system also has some pretty heavy reporting requirements. I think I've come up with a way around this but would like to pose the question to the SO community.

Given that a typical normalized database (OLTP) still isn't always the best option for running reports, a good practice seems to be having a "reporting" database (OLAP) to which the data from the normalized database is copied, indexed extensively, and possibly denormalized for easier querying. Could the same idea be used to work around the shortcomings of an EAV design?

The main downside I see is the increased complexity of transferring the data from the EAV database to the reporting database, as you may end up having to alter the tables in the reporting database as new fields are defined in the EAV database. But that is hardly impossible and seems to be an acceptable tradeoff for the increased flexibility given by the EAV design. This downside also exists if I use a non-SQL data store (e.g. CouchDB or similar) for the main data storage, since all the standard reporting tools expect a SQL backend to query against.
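The schema-maintenance step described here can be kept fairly mechanical. The following is a minimal sketch, assuming SQLite via Python's standard `sqlite3` module and a reporting table named `Person_Denormalized` with three columns per field, as in the sample further down. The helper name `sync_report_columns` and the `WorkAddress` field are hypothetical, introduced only for illustration.

```python
import sqlite3


def sync_report_columns(conn, fields):
    """Add any missing per-field columns to the reporting table.

    Hypothetical helper: it assumes each EAV field maps to three
    reporting columns (value, confidence, effective date).
    """
    existing = {row[1] for row in conn.execute("PRAGMA table_info(Person_Denormalized)")}
    for field in fields:
        # Field names come from end users, so refuse anything that is
        # not a plain identifier before splicing it into DDL.
        if not field.isidentifier():
            raise ValueError(f"unsafe field name: {field!r}")
        wanted = (
            (field, "TEXT"),
            (f"{field}_Confidence", "REAL"),
            (f"{field}_EffectiveDate", "TEXT"),
        )
        for column, sql_type in wanted:
            if column not in existing:
                conn.execute(
                    f'ALTER TABLE Person_Denormalized ADD COLUMN "{column}" {sql_type}'
                )
                existing.add(column)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person_Denormalized (Id INTEGER PRIMARY KEY, Name TEXT)")

sync_report_columns(conn, ["HomeAddress"])
# Running it again with a newly defined field only adds what is missing.
sync_report_columns(conn, ["HomeAddress", "WorkAddress"])

columns = [row[1] for row in conn.execute("PRAGMA table_info(Person_Denormalized)")]
print(columns)
```

Running the sync before each transfer keeps the reporting schema in step with whatever fields users have defined, at the cost of an ever-widening table.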

Do the issues with EAV systems mostly go away if you have a separate reporting database for querying?

Thanks for the comments so far. One of the important things about the system I'm working on is that I'm really only talking about using EAV for one of the entities, not everything in the system.

The whole gist of the system is to be able to pull data from multiple disparate sources that are not known ahead of time and crunch the data to come up with some "best known" data about a particular entity. So every "field" I'm dealing with is multi-valued, and I'm also required to track history for each. The normalized design for this ends up being one table per field, which makes querying it kind of painful anyway.

Here are the table schemas and sample data I'm looking at (obviously changed from what I'm working on, but I think it illustrates the point well):

EAV tables

Person
-------------------
-  Id - Name      -
-------------------
- 123 - Joe Smith -
-------------------

Person_Value
-------------------------------------------------------------------
- PersonId - Source - Field       - Value         - EffectiveDate -
-------------------------------------------------------------------
-      123 -    CIA - HomeAddress - 123 Cherry Ln -    2010-03-26 -
-      123 -    DMV - HomeAddress - 561 Stoney Rd -    2010-02-15 -
-      123 -    FBI - HomeAddress - 676 Lancas Dr -    2010-03-01 -
-------------------------------------------------------------------

Reporting table

Person_Denormalized
----------------------------------------------------------------------------------------
-  Id - Name      - HomeAddress   - HomeAddress_Confidence - HomeAddress_EffectiveDate - 
----------------------------------------------------------------------------------------
- 123 - Joe Smith - 123 Cherry Ln -                  0.713 -                2010-03-26 -
----------------------------------------------------------------------------------------

Normalized design

Person
-------------------
-  Id - Name      -
-------------------
- 123 - Joe Smith -
-------------------

Person_HomeAddress
------------------------------------------------------
- PersonId - Source - Value         - Effective Date - 
------------------------------------------------------
-      123 -    CIA - 123 Cherry Ln -     2010-03-26 -
-      123 -    DMV - 561 Stoney Rd -     2010-02-15 -
-      123 -    FBI - 676 Lancas Dr -     2010-03-01 -
------------------------------------------------------
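To make the difference between the two designs concrete, here is a minimal sketch of both schemas, assuming SQLite via Python's standard `sqlite3` module. The column types, the index, and the `EyeColor` field are assumptions added for illustration; only the table and column names come from the samples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);

-- EAV design: one table holds every user-defined field.
CREATE TABLE Person_Value (
    PersonId      INTEGER NOT NULL REFERENCES Person(Id),
    Source        TEXT NOT NULL,
    Field         TEXT NOT NULL,
    Value         TEXT,
    EffectiveDate TEXT NOT NULL  -- ISO-8601 dates sort correctly as text
);
CREATE INDEX idx_person_value ON Person_Value (PersonId, Field, EffectiveDate);

-- Normalized design: one table per field.
CREATE TABLE Person_HomeAddress (
    PersonId      INTEGER NOT NULL REFERENCES Person(Id),
    Source        TEXT NOT NULL,
    Value         TEXT,
    EffectiveDate TEXT NOT NULL
);
""")

conn.execute("INSERT INTO Person (Id, Name) VALUES (123, 'Joe Smith')")
history = [
    ("CIA", "123 Cherry Ln", "2010-03-26"),
    ("DMV", "561 Stoney Rd", "2010-02-15"),
    ("FBI", "676 Lancas Dr", "2010-03-01"),
]
conn.executemany(
    "INSERT INTO Person_Value VALUES (123, ?, 'HomeAddress', ?, ?)", history
)
conn.executemany("INSERT INTO Person_HomeAddress VALUES (123, ?, ?, ?)", history)

# A new user-defined field (hypothetical 'EyeColor') is just another row
# in the EAV table; the normalized design would need a new table instead.
conn.execute(
    "INSERT INTO Person_Value VALUES (123, 'DMV', 'EyeColor', 'Brown', '2010-02-15')"
)

eav_fields = [
    row[0]
    for row in conn.execute("SELECT DISTINCT Field FROM Person_Value ORDER BY Field")
]
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(eav_fields, tables)
```

The set of tables is unchanged after the new field is added, which is the flexibility the EAV design buys.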

The "Confidence" field here is generated using logic that cannot be expressed easily (if at all) in SQL, so my most common operation besides inserting new values will be pulling ALL data about a person for all fields so that I can generate the record for the reporting table. This is actually easier in the EAV model, as I can do a single query. In the normalized design, I end up having to do one query per field to avoid a massive Cartesian product from joining them all together.
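The single-query pull described above might look like the following sketch, again assuming SQLite via Python's `sqlite3`. The function `placeholder_confidence` is a hypothetical stand-in: the real confidence logic is not described here, so this one returns an arbitrary number and will not reproduce the 0.713 in the sample. Picking the most recent value as the "best known" one is likewise an assumption.

```python
import sqlite3
from collections import defaultdict


def placeholder_confidence(candidates):
    # HYPOTHETICAL: stands in for the application logic that cannot be
    # expressed in SQL. It simply returns 1 / (number of candidates).
    return round(1.0 / len(candidates), 3)


def build_report_record(conn, person_id):
    """Build one reporting row from a single query over the EAV table."""
    (name,) = conn.execute(
        "SELECT Name FROM Person WHERE Id = ?", (person_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT Field, Source, Value, EffectiveDate FROM Person_Value "
        "WHERE PersonId = ? ORDER BY Field, EffectiveDate DESC",
        (person_id,),
    )
    # Group the full history by field; newest entry comes first per field.
    by_field = defaultdict(list)
    for field, source, value, effective_date in rows:
        by_field[field].append((source, value, effective_date))

    record = {"Id": person_id, "Name": name}
    for field, candidates in by_field.items():
        _source, value, effective_date = candidates[0]  # assumption: newest wins
        record[field] = value
        record[f"{field}_Confidence"] = placeholder_confidence(candidates)
        record[f"{field}_EffectiveDate"] = effective_date
    return record


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Person_Value (
    PersonId INTEGER NOT NULL, Source TEXT NOT NULL, Field TEXT NOT NULL,
    Value TEXT, EffectiveDate TEXT NOT NULL
);
INSERT INTO Person VALUES (123, 'Joe Smith');
INSERT INTO Person_Value VALUES
    (123, 'CIA', 'HomeAddress', '123 Cherry Ln', '2010-03-26'),
    (123, 'DMV', 'HomeAddress', '561 Stoney Rd', '2010-02-15'),
    (123, 'FBI', 'HomeAddress', '676 Lancas Dr', '2010-03-01');
""")

record = build_report_record(conn, 123)
print(record)
```

Because every field lives in one table, the number of queries stays constant as users define more fields, whereas the normalized design needs one query per field table.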

Recommended answer

Short answer: yes, a reporting database is a reasonable approach to solving the problems of reporting from an EAV data model.

I spent a number of years working with an information management solution that allowed end users complete freedom to define their own data model, with both the schema and the data stored using an EAV model. Interestingly, this product provided meta-schema objects used to fulfill reporting requirements (e.g. graphs to provide object navigation, views to perform projection, etc.). This meant that the end user was free to define queries using the same terms and concepts that they'd used to build the data model in the first instance. The act of reporting was essentially to compute the data set by navigating these definitions and hand the result over to a traditional report-writing tool as if it were relational data.

One of the strengths of this approach was that the same mechanism that was already in place to transform the EAV model into something the user could work with could be reused and applied to the reporting function.
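As a rough illustration of the idea (not the product's actual mechanism, which is not described in any detail here), field definitions can drive the generation of a relational view that a conventional reporting tool can query. This sketch assumes SQLite via Python's `sqlite3` and the `Person` / `Person_Value` tables from the question. The function name `latest_value_view_sql`, the view name `Person_Report`, and the `WorkAddress` field are hypothetical.

```python
import sqlite3


def latest_value_view_sql(view_name, fields):
    """Return DDL for a view exposing the newest value of each EAV field."""
    # Names are spliced into the SQL text, so accept plain identifiers only.
    for name in [view_name, *fields]:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    columns = [
        "(SELECT v.Value FROM Person_Value v"
        f" WHERE v.PersonId = p.Id AND v.Field = '{field}'"
        f' ORDER BY v.EffectiveDate DESC LIMIT 1) AS "{field}"'
        for field in fields
    ]
    select_list = ", ".join(["p.Id", "p.Name", *columns])
    return f'CREATE VIEW "{view_name}" AS SELECT {select_list} FROM Person p'


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Person_Value (
    PersonId INTEGER NOT NULL, Source TEXT NOT NULL, Field TEXT NOT NULL,
    Value TEXT, EffectiveDate TEXT NOT NULL
);
INSERT INTO Person VALUES (123, 'Joe Smith');
INSERT INTO Person_Value VALUES
    (123, 'CIA', 'HomeAddress', '123 Cherry Ln', '2010-03-26'),
    (123, 'DMV', 'HomeAddress', '561 Stoney Rd', '2010-02-15'),
    (123, 'FBI', 'HomeAddress', '676 Lancas Dr', '2010-03-01');
""")

# The field list would come from the user-maintained field definitions.
conn.execute(latest_value_view_sql("Person_Report", ["HomeAddress", "WorkAddress"]))
row = conn.execute("SELECT * FROM Person_Report").fetchone()
print(row)
```

A field with no stored values simply comes back as NULL, so the view can be regenerated whenever the field definitions change without touching the underlying data.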
