How to store meta-data on columns


Question

Let's say you're collecting insider info on upcoming superhero movie releases and your main Movie table looks something like this:

Table 1

Title           Director     Leading Male     Leading Female   Villain
------------------------------------------------------------------------------
Green Lantern   Kubrick      Robert Redford   Miley Cyrus      Hugh Grant
The Tick        Mel Gibson   Kevin Sorbo      Linda Hunt       Anthony Hopkins

This should work very well in general and allow very easy queries as well as comparisons between rows.

However, you'd like to track the source of each data fact, as well as the name of the journalist who discovered the fact. This seems to suggest some sort of an EAV table like this:

Table 2

Movie           Attribute        Value            Source          Journalist
----------------------------------------------------------------------------
Green Lantern   Director         Kubrick          CHUD            Sarah
Green Lantern   Leading Male     Robert Redford   CHUD            James
Green Lantern   Leading Female   Miley Cyrus      Dark Horizons   James
Green Lantern   Villain          Hugh Grant       CHUD            Sarah
The Tick        Director         Mel Gibson       Yahoo           Cameron
...

While this easily captures the meta-data we wanted, it makes queries harder. It takes a bit more work simply to get all the basic data of a single movie. More specifically, you have to deal with four rows here to get the four important tidbits of information on Green Lantern, while in Table 1 it is a single, nicely encapsulated row.

So my question is: in light of the complications I just described, and because I know that EAV tables are generally to be avoided, is EAV still the best solution? It does seem like the only reasonable way to represent this data. The only other alternative I see is to use Table 1 in conjunction with another table that houses only the meta-data, like this:

Table 3

Movie           Attribute        Source          Journalist
-----------------------------------------------------------
Green Lantern   Director         CHUD            Sarah
Green Lantern   Leading Male     CHUD            James
Green Lantern   Leading Female   Dark Horizons   James
Green Lantern   Villain          CHUD            Sarah
The Tick        Director         Yahoo           Cameron
...

But this is very dangerous, because if someone changes a column name in Table 1, say "Villain" to "Primary Villain," the row in Table 3 will still simply say "Villain," and the related data will be silently decoupled. This could be helped if the "Attribute" column were linked to another table that served as an enumeration of the columns of Table 1. Of course, the DBA would then be responsible for keeping this enumeration table in sync with the actual columns of Table 1. It might even be possible to improve on this: instead of creating the enumeration table by hand, use a system view in SQL Server that holds the names of the columns in Table 1. I'm not sure, though, that you can have relationships that involve system views.
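To make the enumeration-table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table names MovieAttribute and MovieMeta and the space-free column names are assumptions made up for the illustration. It only shows that a foreign key can reject meta-data rows naming an attribute that is not in the enumeration; keeping the enumeration in sync with the real columns of Table 1 is still a manual job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
-- Table 1, with space-free column names (an assumption for this sketch)
CREATE TABLE Movie (
    Title         TEXT PRIMARY KEY,
    Director      TEXT,
    LeadingMale   TEXT,
    LeadingFemale TEXT,
    Villain       TEXT
);

-- Hypothetical enumeration table: one row per Movie column that carries meta-data
CREATE TABLE MovieAttribute (
    Attribute TEXT PRIMARY KEY
);

-- Table 3, with Attribute constrained to the enumeration
CREATE TABLE MovieMeta (
    Title      TEXT NOT NULL REFERENCES Movie(Title),
    Attribute  TEXT NOT NULL REFERENCES MovieAttribute(Attribute),
    Source     TEXT,
    Journalist TEXT,
    PRIMARY KEY (Title, Attribute)
);
""")

conn.execute(
    "INSERT INTO Movie VALUES "
    "('Green Lantern', 'Kubrick', 'Robert Redford', 'Miley Cyrus', 'Hugh Grant')"
)
conn.executemany(
    "INSERT INTO MovieAttribute VALUES (?)",
    [("Director",), ("LeadingMale",), ("LeadingFemale",), ("Villain",)],
)


def try_insert_meta(attribute):
    """Return True if the meta-data row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO MovieMeta VALUES ('Green Lantern', ?, 'CHUD', 'Sarah')",
            (attribute,),
        )
        return True
    except sqlite3.IntegrityError:
        return False


accepted_known = try_insert_meta("Director")           # listed in the enumeration
accepted_unknown = try_insert_meta("Primary Villain")  # not listed, so the FK rejects it
print(accepted_known, accepted_unknown)
```

Note what the constraint does not do: if the Villain column of Movie is renamed, nothing here notices, which is exactly the decoupling risk described above.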

What do you suggest? Is the EAV the only way to go?

And what if it was only one meta-data column (just "Source" without "Journalist") - is it still necessary to go the EAV route? You could have columns "Director," "Director_Source," "Leading Male," "Leading Male_Source," etc., but that gets ugly very quickly. Is there some better solution I'm not thinking of?

If I haven't clarified any point please comment and I'll add more as necessary. Oh yeah, and the movie data I used is fabricated :)

Edit: To restate my primary question concisely, I would like to have the simplicity and the true RDBMS design of table 1, which really describes a movie entry well, while still storing the meta data on the attributes in a safe and accessible manner. Is this possible? Or is EAV the only way?

Edit 2: After doing some more web research, I have yet to find a discussion of EAVs that centers on the desire to store meta-data on the columns. The primary reason given for implementing an EAV is almost always dynamic and unpredictable columns, which is not the case in my example. In my example, there are always the same four columns: director, leading male, leading female, villain. However, I want to store certain facts (source and journalist) about each column for each row. An EAV would facilitate this, but I would like to avoid resorting to that.

Update

Using the Table 2 design except for renaming the column "Movie" to "Name" and calling the whole table "Movie," here is the pivot operation in SQL Server 2008 to get back Table 1:

SELECT Name, [Director], [Leading Male], [Leading Female], [Villain]
FROM (SELECT Name, Attribute, Value FROM Movie) AS src
PIVOT
(
    MAX(Value)
    FOR Attribute IN ([Director], [Leading Male], [Leading Female], [Villain])
) AS PivotTable
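For readers without SQL Server at hand, the same reshaping can be sketched with conditional aggregation, which is a different technique from the PIVOT operator but should yield the same shape of result. The sketch below runs on SQLite through Python's sqlite3 module and loads only the five rows shown in Table 2, so the missing attributes of The Tick come back as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Movie (
    Name TEXT, Attribute TEXT, Value TEXT, Source TEXT, Journalist TEXT,
    PRIMARY KEY (Name, Attribute)
)""")

# Only the rows listed in Table 2; the elided rows are not invented here.
rows = [
    ("Green Lantern", "Director", "Kubrick", "CHUD", "Sarah"),
    ("Green Lantern", "Leading Male", "Robert Redford", "CHUD", "James"),
    ("Green Lantern", "Leading Female", "Miley Cyrus", "Dark Horizons", "James"),
    ("Green Lantern", "Villain", "Hugh Grant", "CHUD", "Sarah"),
    ("The Tick", "Director", "Mel Gibson", "Yahoo", "Cameron"),
]
conn.executemany("INSERT INTO Movie VALUES (?, ?, ?, ?, ?)", rows)

# One MAX(CASE ...) per output column stands in for PIVOT ... FOR Attribute IN (...).
pivot_sql = """
SELECT Name,
       MAX(CASE WHEN Attribute = 'Director'       THEN Value END) AS Director,
       MAX(CASE WHEN Attribute = 'Leading Male'   THEN Value END) AS LeadingMale,
       MAX(CASE WHEN Attribute = 'Leading Female' THEN Value END) AS LeadingFemale,
       MAX(CASE WHEN Attribute = 'Villain'        THEN Value END) AS Villain
FROM Movie
GROUP BY Name
ORDER BY Name
"""
pivoted = conn.execute(pivot_sql).fetchall()
for row in pivoted:
    print(row)
```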

Solution

You can change what you consider a fact value in your design. It seems that a fact in your data model could be expressed as the following N-tuple:

Movie | FactType | FactValue | FactSource | FactJournalist

The following table structures should support the data model you want, and can relatively easily be indexed and joined. You can also create a view that pivots out just the fact value and fact type so that you can create the following perspective:

MovieID | Movie Name | Director | LeadingMale | LeadingFemale | PrimaryVillain | etc

Interestingly, you could consider this to be the logical extension of fully applying an EAV model to the data, decomposing an individual movie (with its intuitive attribution of director, lead, villain, etc.) into a pivoted structure where the attributes focus on the source of the information instead.

The benefits of the proposed data model are:

  • it is well-normalized (though you should probably normalize the FactType field into a reference table for completeness)
  • it is possible to create a view that pivots fact types efficiently out into a tabular structure
  • it is relatively extensible and allows the database to enforce referential integrity and (if desired) cardinality constraints
  • the MovieFact table can be subclassed to support different kinds of movie facts, not just those that are simple text fields
  • simple queries against the data are relatively efficient

Some of the disadvantages of the data model are:

  • Composite, conditional queries are harder (but not impossible) to write (e.g. find all movies where Director is A and Leading Male is B, etc...)
  • The model is somewhat less obvious than the more traditional approach, or one involving EAV structures
  • Inserts and updates are a little trickier because updating multiple facts requires updating multiple rows, not multiple columns
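The first and third points can be illustrated with a short sketch. It assumes the Movie and MovieFact tables defined further down, uses SQLite through Python's sqlite3 module instead of SQL Server, and the helper name movies_with and the placeholder values in the update are made up for the example. A composite condition needs one join to MovieFact per fact being tested, and changing two facts means touching two rows, so the updates are wrapped in a single transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie (MovieID INTEGER PRIMARY KEY, MovieName TEXT NOT NULL);
CREATE TABLE MovieFact (
    MovieID        INTEGER NOT NULL REFERENCES Movie(MovieID),
    FactType       TEXT NOT NULL,
    FactValue      TEXT,
    FactSource     TEXT,
    FactJournalist TEXT,
    PRIMARY KEY (MovieID, FactType)
);
""")
conn.executemany("INSERT INTO Movie VALUES (?, ?)",
                 [(1, "Green Lantern"), (2, "The Tick")])
conn.executemany("INSERT INTO MovieFact VALUES (?, ?, ?, ?, ?)", [
    (1, "Director", "Kubrick", "CHUD", "Sarah"),
    (1, "Leading Male", "Robert Redford", "CHUD", "James"),
    (1, "Leading Female", "Miley Cyrus", "Dark Horizons", "James"),
    (1, "Villain", "Hugh Grant", "CHUD", "Sarah"),
    (2, "Director", "Mel Gibson", "Yahoo", "Cameron"),
    (2, "Leading Male", "John Lambert", "Yahoo", "Erica"),
])
conn.commit()


def movies_with(director, leading_male):
    """Hypothetical helper: movies whose Director AND Leading Male both match."""
    sql = """
    SELECT m.MovieName
    FROM Movie m
    JOIN MovieFact d ON d.MovieID = m.MovieID AND d.FactType = 'Director'
    JOIN MovieFact l ON l.MovieID = m.MovieID AND l.FactType = 'Leading Male'
    WHERE d.FactValue = ? AND l.FactValue = ?
    ORDER BY m.MovieName
    """
    return [name for (name,) in conn.execute(sql, (director, leading_male))]


matches = movies_with("Kubrick", "Robert Redford")
no_matches = movies_with("Kubrick", "John Lambert")

# Updating two facts touches two rows; the values are placeholders for illustration.
changes = [
    ("Placeholder Director", 1, "Director"),
    ("Placeholder Villain", 1, "Villain"),
]
with conn:  # commits if the block succeeds, rolls back if it raises
    conn.executemany(
        "UPDATE MovieFact SET FactValue = ? WHERE MovieID = ? AND FactType = ?",
        changes,
    )

updated = dict(conn.execute(
    "SELECT FactType, FactValue FROM MovieFact WHERE MovieID = 1"))
print(matches, no_matches, updated)
```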

I've lifted the Movie data up a level to normalize the structure, and you could push the movie name down into the MovieFact structure for consistency (since for some movies I can imagine that even the name is something you may want to track source information for).

Table Movie
========================
MovieID   NUMBER, PrimaryKey
MovieName VARCHAR

Table MovieFact
========================
MovieID          NUMBER,  PrimaryKeyCol1
FactType         VARCHAR, PrimaryKeyCol2
FactValue        VARCHAR
FactSource       VARCHAR
FactJournalist   VARCHAR
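As a rough sketch of how these two tables might be declared, the following uses SQLite through Python's sqlite3 module; the concrete column types and the separate FactType reference table are assumptions, the latter following the normalization suggestion in the benefits list. The composite primary key gives a simple cardinality constraint (at most one value per fact type per movie), and the foreign keys give referential integrity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE Movie (
    MovieID   INTEGER PRIMARY KEY,
    MovieName TEXT NOT NULL
);

-- Assumed reference table for the allowed fact types
CREATE TABLE FactType (
    FactType TEXT PRIMARY KEY
);

CREATE TABLE MovieFact (
    MovieID        INTEGER NOT NULL REFERENCES Movie(MovieID),
    FactType       TEXT    NOT NULL REFERENCES FactType(FactType),
    FactValue      TEXT,
    FactSource     TEXT,
    FactJournalist TEXT,
    PRIMARY KEY (MovieID, FactType)  -- one value per fact type per movie
);
""")

conn.execute("INSERT INTO Movie VALUES (1, 'Green Lantern')")
conn.execute("INSERT INTO FactType VALUES ('Director')")
conn.execute("INSERT INTO MovieFact VALUES (1, 'Director', 'Kubrick', 'CHUD', 'Sarah')")


def rejected(params):
    """Return True if inserting this MovieFact row violates a constraint."""
    try:
        conn.execute("INSERT INTO MovieFact VALUES (?, ?, ?, ?, ?)", params)
        return False
    except sqlite3.IntegrityError:
        return True


# A second Director for movie 1 collides with the composite primary key.
duplicate_rejected = rejected((1, "Director", "Mel Gibson", "Yahoo", "Cameron"))
# A fact for a movie that does not exist violates the foreign key to Movie.
orphan_rejected = rejected((99, "Director", "Mel Gibson", "Yahoo", "Cameron"))
print(duplicate_rejected, orphan_rejected)
```

If several competing values per fact type should be allowed (for example, two sources disagreeing about the director), the primary key would need to include FactSource or a surrogate key instead; that is a design choice the original tables leave open.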

Your fictional movie data would then look like the following:

Movie Table
====================================================================================
MovieID  MovieName
====================================================================================
1        Green Lantern
2        The Tick

MovieFact Table
====================================================================================
MovieID  FactType       FactValue         FactSource       FactJournalist
====================================================================================
1        Director       Kubrick           CHUD             Sarah
1        Leading Male   Robert Redford    CHUD             James
1        Leading Female Miley Cyrus       Dark Horizons    James
1        Villain        Hugh Grant        CHUD             Sarah
2        Director       Mel Gibson        Yahoo            Cameron
2        Leading Male   John Lambert      Yahoo            Erica
...
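The view mentioned earlier could look roughly like the sketch below, again on SQLite through Python's sqlite3 module. The view name MovieSummary is made up for the example, and only the six fact rows listed above are loaded. The view gives back the one-row-per-movie shape of Table 1, while the source and journalist for any single cell remain one lookup away in MovieFact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie (MovieID INTEGER PRIMARY KEY, MovieName TEXT NOT NULL);
CREATE TABLE MovieFact (
    MovieID        INTEGER NOT NULL REFERENCES Movie(MovieID),
    FactType       TEXT NOT NULL,
    FactValue      TEXT,
    FactSource     TEXT,
    FactJournalist TEXT,
    PRIMARY KEY (MovieID, FactType)
);

-- Hypothetical view name; pivots fact types back into columns
CREATE VIEW MovieSummary AS
SELECT m.MovieID,
       m.MovieName,
       MAX(CASE WHEN f.FactType = 'Director'       THEN f.FactValue END) AS Director,
       MAX(CASE WHEN f.FactType = 'Leading Male'   THEN f.FactValue END) AS LeadingMale,
       MAX(CASE WHEN f.FactType = 'Leading Female' THEN f.FactValue END) AS LeadingFemale,
       MAX(CASE WHEN f.FactType = 'Villain'        THEN f.FactValue END) AS PrimaryVillain
FROM Movie m
LEFT JOIN MovieFact f ON f.MovieID = m.MovieID
GROUP BY m.MovieID, m.MovieName;
""")

conn.executemany("INSERT INTO Movie VALUES (?, ?)",
                 [(1, "Green Lantern"), (2, "The Tick")])
conn.executemany("INSERT INTO MovieFact VALUES (?, ?, ?, ?, ?)", [
    (1, "Director", "Kubrick", "CHUD", "Sarah"),
    (1, "Leading Male", "Robert Redford", "CHUD", "James"),
    (1, "Leading Female", "Miley Cyrus", "Dark Horizons", "James"),
    (1, "Villain", "Hugh Grant", "CHUD", "Sarah"),
    (2, "Director", "Mel Gibson", "Yahoo", "Cameron"),
    (2, "Leading Male", "John Lambert", "Yahoo", "Erica"),
])

# The Table 1 shape, read straight from the view
summary = conn.execute("SELECT * FROM MovieSummary ORDER BY MovieID").fetchall()

# Provenance of a single cell: who reported Green Lantern's villain, and where?
villain_source = conn.execute(
    "SELECT FactSource, FactJournalist FROM MovieFact "
    "WHERE MovieID = ? AND FactType = ?",
    (1, "Villain"),
).fetchone()

for row in summary:
    print(row)
print(villain_source)
```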
