How to design a database schema to support tagging with categories?


Question



    I am trying to do something like Database Design for Tagging, except that each of my tags is grouped into a category.

    For example, let's say I have a database about vehicles. Let's say we actually don't know very much about vehicles, so we can't specify the columns all vehicles will have. Therefore we shall "tag" vehicles with information.

    1. manufacturer: Mercedes
       model: SLK32 AMG
       convertible: hardtop
    
    2. manufacturer: Ford
       model: GT90
       production phase: prototype
    
    3. manufacturer: Mazda
       model: MX-5
       convertible: softtop
    

    Now, as you can see, all cars are tagged with their manufacturer and model, but the other categories don't all match. Note that a car can have only one tag in each category, i.e., a car can have only one manufacturer.

    I want to design a database that supports searching for all Mercedes vehicles, or listing all manufacturers.

    My current design is something like this:

    vehicles
      int vid
      String vin
    
    vehicleTags
      int vid
      int tid
    
    tags
      int tid
      String tag
      int cid
    
    categories
      int cid
      String category
    

    I have all the right primary and foreign keys in place, except I can't handle the case where each car can only have one manufacturer. Or can I?
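    A minimal sketch of the four-table design above may help show where the gap is. It assumes SQLite driven from Python's sqlite3 module, maps "String" to TEXT, and uses made-up sample rows. The two lookups from the question work through joins. However, the (vid, tid) primary key only rejects the same tag twice, so a second manufacturer slips in.

```python
import sqlite3

# Sketch of the question's four-table design (types approximated for SQLite).
# vehicleTags is keyed on (vid, tid): it blocks duplicate tags, not two tags
# from the same category.
SCHEMA = """
CREATE TABLE vehicles    (vid INTEGER PRIMARY KEY, vin TEXT);
CREATE TABLE categories  (cid INTEGER PRIMARY KEY, category TEXT UNIQUE);
CREATE TABLE tags        (tid INTEGER PRIMARY KEY, tag TEXT,
                          cid INTEGER REFERENCES categories(cid));
CREATE TABLE vehicleTags (vid INTEGER REFERENCES vehicles(vid),
                          tid INTEGER REFERENCES tags(tid),
                          PRIMARY KEY (vid, tid));
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO categories VALUES (?, ?)",
                 [(1, "manufacturer"), (2, "model")])
conn.executemany("INSERT INTO tags VALUES (?, ?, ?)",
                 [(10, "Mercedes", 1), (11, "Ford", 1), (12, "Mazda", 1),
                  (20, "SLK32 AMG", 2)])
conn.executemany("INSERT INTO vehicles (vid) VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO vehicleTags VALUES (?, ?)",
                 [(1, 10), (1, 20), (2, 11), (3, 12)])

def vehicles_with(category, tag):
    """Search, e.g. all vehicles whose manufacturer tag is 'Mercedes'."""
    return [r[0] for r in conn.execute("""
        SELECT vt.vid FROM vehicleTags vt
        JOIN tags t       ON t.tid = vt.tid
        JOIN categories c ON c.cid = t.cid
        WHERE c.category = ? AND t.tag = ?
        ORDER BY vt.vid""", (category, tag))]

def tags_in(category):
    """List, e.g. all manufacturers."""
    return [r[0] for r in conn.execute("""
        SELECT t.tag FROM tags t JOIN categories c ON c.cid = t.cid
        WHERE c.category = ? ORDER BY t.tag""", (category,))]

# The gap the question describes: (1, 11) is a new (vid, tid) pair, so the
# primary key accepts a second manufacturer ("Ford") for the Mercedes.
conn.execute("INSERT INTO vehicleTags VALUES (1, 11)")
```

    After the last insert, vehicles_with("manufacturer", "Ford") returns [1, 2], so vehicle 1 now has two manufacturers.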

    Can I add a foreign key constraint to the composite primary key in vehicleTags? That is, could I add a constraint so that a pair (vid, tid) can be inserted into vehicleTags only if no existing row for the same vid already has a tid in the same cid?

    My guess is no. I think the solution to this problem is to add a cid column to vehicleTags and make (vid, cid) the new composite primary key. It would look like:

    vehicleTags
      int vid
      int cid
      int tid
    

    This would prevent a car from having two manufacturers, but now I have duplicated the information about which category (cid) each tag (tid) belongs to.
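    The proposed (vid, cid, tid) table can be tried out in a small sketch. Again this assumes SQLite via Python's sqlite3 and hypothetical sample rows. The sketch also adds a technique that is not in the original post: a UNIQUE (tid, cid) on tags plus a composite foreign key (tid, cid) from vehicleTags. With that key in place, the duplicated cid is checked against the tag's real category instead of being allowed to drift.

```python
import sqlite3

# The question's proposal, keyed on (vid, cid).
# Assumption (not in the original post): tags carries UNIQUE (tid, cid) so that
# vehicleTags can reference the pair with a composite foreign key.
SCHEMA = """
CREATE TABLE vehicles   (vid INTEGER PRIMARY KEY, vin TEXT);
CREATE TABLE categories (cid INTEGER PRIMARY KEY, category TEXT UNIQUE);
CREATE TABLE tags (
  tid INTEGER PRIMARY KEY,
  tag TEXT,
  cid INTEGER NOT NULL REFERENCES categories(cid),
  UNIQUE (tid, cid)
);
CREATE TABLE vehicleTags (
  vid INTEGER REFERENCES vehicles(vid),
  cid INTEGER,
  tid INTEGER,
  PRIMARY KEY (vid, cid),
  FOREIGN KEY (tid, cid) REFERENCES tags(tid, cid)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO categories VALUES (?, ?)",
                 [(1, "manufacturer"), (2, "model")])
conn.executemany("INSERT INTO tags VALUES (?, ?, ?)",
                 [(10, "Mercedes", 1), (11, "Ford", 1), (20, "SLK32 AMG", 2)])
conn.execute("INSERT INTO vehicles (vid) VALUES (1)")
conn.execute("INSERT INTO vehicleTags VALUES (1, 1, 10)")  # manufacturer = Mercedes
conn.commit()  # commit setup so a failed insert's rollback cannot undo it

def try_insert(vid, cid, tid):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO vehicleTags VALUES (?, ?, ?)", (vid, cid, tid))
        return True
    except sqlite3.IntegrityError:
        return False

second_manufacturer = try_insert(1, 1, 11)  # rejected by PRIMARY KEY (vid, cid)
wrong_category = try_insert(1, 2, 10)       # "Mercedes" filed as a model: composite FK rejects it
valid_model = try_insert(1, 2, 20)          # accepted
```

    With only a plain FOREIGN KEY (tid), the wrong_category row would have been accepted. The composite key is what ties the copied cid back to tags.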

    What should my schema be?

    Tom noticed this problem with my database schema in my previous question, "How do you do many to many table outer joins?"

    EDIT
    I know that in this example, manufacturer should really be a column in the vehicles table, but let's say you can't do that. The example is just an example.

    Solution

    This is yet another variation on the Entity-Attribute-Value design.

    A more recognizable EAV table looks like the following:

    CREATE TABLE vehicleEAV (
      vid        INTEGER,
      attr_name  VARCHAR(20),
      attr_value VARCHAR(100),
      PRIMARY KEY (vid, attr_name),
      FOREIGN KEY (vid) REFERENCES vehicles (vid)
    );
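    To see that this single table already covers the question's requirements, here is a minimal sketch that loads the three example cars into vehicleEAV. It assumes SQLite via Python's sqlite3, and the rows are just the question's sample data. PRIMARY KEY (vid, attr_name) gives "one value per attribute per vehicle" without any extra tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE vehicles (vid INTEGER PRIMARY KEY, vin TEXT);
CREATE TABLE vehicleEAV (
  vid        INTEGER,
  attr_name  VARCHAR(20),
  attr_value VARCHAR(100),
  PRIMARY KEY (vid, attr_name),
  FOREIGN KEY (vid) REFERENCES vehicles (vid)
);
""")
conn.executemany("INSERT INTO vehicles (vid) VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO vehicleEAV VALUES (?, ?, ?)", [
    (1, "manufacturer", "Mercedes"), (1, "model", "SLK32 AMG"), (1, "convertible", "hardtop"),
    (2, "manufacturer", "Ford"), (2, "model", "GT90"), (2, "production phase", "prototype"),
    (3, "manufacturer", "Mazda"), (3, "model", "MX-5"), (3, "convertible", "softtop"),
])
conn.commit()

# "Search for all Mercedes"
mercedes = [r[0] for r in conn.execute(
    "SELECT vid FROM vehicleEAV "
    "WHERE attr_name = 'manufacturer' AND attr_value = 'Mercedes'")]

# "List all manufacturers"
manufacturers = [r[0] for r in conn.execute(
    "SELECT DISTINCT attr_value FROM vehicleEAV "
    "WHERE attr_name = 'manufacturer' ORDER BY attr_value")]

# A second manufacturer for vehicle 1 collides with PRIMARY KEY (vid, attr_name).
try:
    conn.execute("INSERT INTO vehicleEAV VALUES (1, 'manufacturer', 'Ford')")
    second_manufacturer_rejected = False
except sqlite3.IntegrityError:
    second_manufacturer_rejected = True
```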
    

    Some people force attr_name to reference a lookup table of predefined attribute names, to limit the chaos.
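    A minimal sketch of that lookup-table variant follows, assuming SQLite via Python's sqlite3. The table name attributes and the helper add() are hypothetical. A misspelled attribute name is rejected by the foreign key, but the lookup constrains only names, so any value is still accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE vehicles   (vid INTEGER PRIMARY KEY, vin TEXT);
-- Hypothetical lookup table of predefined attribute names.
CREATE TABLE attributes (attr_name VARCHAR(20) PRIMARY KEY);
CREATE TABLE vehicleEAV (
  vid        INTEGER REFERENCES vehicles (vid),
  attr_name  VARCHAR(20) REFERENCES attributes (attr_name),
  attr_value VARCHAR(100),
  PRIMARY KEY (vid, attr_name)
);
INSERT INTO attributes VALUES ('manufacturer'), ('model'), ('convertible');
INSERT INTO vehicles (vid) VALUES (1);
""")

def add(vid, name, value):
    """Insert one attribute value; return False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO vehicleEAV VALUES (?, ?, ?)", (vid, name, value))
        return True
    except sqlite3.IntegrityError:
        return False

ok = add(1, "manufacturer", "Mercedes")
typo = add(1, "manufacture", "Mercedes")     # name not in the lookup table
free_value = add(1, "convertible", "soft top")  # values remain unconstrained
```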

    What you've done is simply spread an EAV table over three tables, but without improving the order of your metadata:

    CREATE TABLE vehicleTag (
      vid         INTEGER,
      cid         INTEGER,
      tid         INTEGER,
      PRIMARY KEY (vid, cid),
      FOREIGN KEY (vid) REFERENCES vehicles(vid),
      FOREIGN KEY (cid) REFERENCES categories(cid),
      FOREIGN KEY (tid) REFERENCES tags(tid)
    );
    
    CREATE TABLE categories (
      cid        INTEGER PRIMARY KEY,
      category   VARCHAR(20) -- "attr_name"
    );
    
    CREATE TABLE tags (
      tid        INTEGER PRIMARY KEY,
      tag        VARCHAR(100) -- "attr_value"
    );
    

    If you're going to use the EAV design, you only need the vehicleTag and categories tables.

    CREATE TABLE vehicleTag (
      vid         INTEGER,
      cid         INTEGER,     -- reference to "attr_name" lookup table
      tag         VARCHAR(100), -- "attr_value"
      PRIMARY KEY (vid, cid),
      FOREIGN KEY (vid) REFERENCES vehicles(vid),
      FOREIGN KEY (cid) REFERENCES categories(cid)
    );
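    Reading this two-table design back into one row per vehicle shows why querying gets awkward. Here is a sketch under the same assumptions (SQLite via Python's sqlite3, sample rows from the question), using conditional aggregation as the pivot technique. Every category you want as a column must be spelled out in the query, and any category left out is silently dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE vehicles   (vid INTEGER PRIMARY KEY, vin TEXT);
CREATE TABLE categories (cid INTEGER PRIMARY KEY, category VARCHAR(20) UNIQUE);
CREATE TABLE vehicleTag (
  vid INTEGER,
  cid INTEGER,       -- reference to "attr_name" lookup table
  tag VARCHAR(100),  -- "attr_value"
  PRIMARY KEY (vid, cid),
  FOREIGN KEY (vid) REFERENCES vehicles(vid),
  FOREIGN KEY (cid) REFERENCES categories(cid)
);
INSERT INTO categories VALUES
  (1, 'manufacturer'), (2, 'model'), (3, 'convertible'), (4, 'production phase');
INSERT INTO vehicles (vid) VALUES (1), (2), (3);
INSERT INTO vehicleTag VALUES
  (1, 1, 'Mercedes'), (1, 2, 'SLK32 AMG'), (1, 3, 'hardtop'),
  (2, 1, 'Ford'),     (2, 2, 'GT90'),      (2, 4, 'prototype'),
  (3, 1, 'Mazda'),    (3, 2, 'MX-5'),      (3, 3, 'softtop');
""")

# One conditional aggregate per category; 'production phase' is not listed,
# so it does not appear in the result.
PIVOT = """
SELECT v.vid,
       MAX(CASE WHEN c.category = 'manufacturer' THEN t.tag END) AS manufacturer,
       MAX(CASE WHEN c.category = 'model'        THEN t.tag END) AS model,
       MAX(CASE WHEN c.category = 'convertible'  THEN t.tag END) AS convertible
FROM vehicles v
LEFT JOIN vehicleTag t ON t.vid = v.vid
LEFT JOIN categories c ON c.cid = t.cid
GROUP BY v.vid
ORDER BY v.vid
"""
rows = conn.execute(PIVOT).fetchall()
```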
    

    But keep in mind that you're mixing data with metadata. You lose the ability to apply certain constraints to your data model.

    • How can you make one of the categories mandatory (a conventional column uses a NOT NULL constraint)?
    • How can you use SQL data types to validate some of your tag values? You can't, because you're using a long string for every tag value. Is this string long enough for every tag you'll need in the future? You can't tell.
    • How can you constrain some of your tags to a set of permitted values (a conventional table uses a foreign key to a lookup table)? This is your "softtop" vs. "soft top" example. But you can't make a constraint on the tag column because that constraint would apply to all other tag values for other categories. You'd effectively restrict engine size and paint color to "soft top" as well.
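    The first and third points can be demonstrated with a sketch, assuming SQLite via Python's sqlite3 and hypothetical sample rows. A CHECK meant only for the "convertible" category has to sit on the shared tag column, so it also rejects an ordinary manufacturer. Meanwhile nothing forces a vehicle to have a manufacturer row at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE vehicles   (vid INTEGER PRIMARY KEY, vin TEXT);
CREATE TABLE categories (cid INTEGER PRIMARY KEY, category VARCHAR(20) UNIQUE);
CREATE TABLE vehicleTag (
  vid INTEGER REFERENCES vehicles(vid),
  cid INTEGER REFERENCES categories(cid),
  tag VARCHAR(100)
      -- Intended for "convertible" only, but a column CHECK sees every row.
      CHECK (tag IN ('hardtop', 'softtop')),
  PRIMARY KEY (vid, cid)
);
INSERT INTO categories VALUES (1, 'manufacturer'), (2, 'model'), (3, 'convertible');
INSERT INTO vehicles (vid) VALUES (1), (2);
""")

def add(vid, cid, tag):
    """Insert one tag; return False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO vehicleTag VALUES (?, ?, ?)", (vid, cid, tag))
        return True
    except sqlite3.IntegrityError:
        return False

softtop = add(1, 3, "softtop")    # accepted, as intended
soft_top = add(2, 3, "soft top")  # rejected, as intended
mazda = add(2, 1, "Mazda")        # also rejected: the CHECK applies to every category

# Vehicle 2 exists with no manufacturer row, and no constraint objected.
manufacturer_rows_for_2 = conn.execute(
    "SELECT COUNT(*) FROM vehicleTag WHERE vid = 2 AND cid = 1").fetchone()[0]
```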

    SQL databases don't work well with this model. It's extremely difficult to get right, and querying it becomes very complex. If you do continue to use SQL, you will be better off modeling the tables conventionally, with one column per attribute. If you need "subtypes", define a subordinate table per subtype (Class-Table Inheritance), or else use Single-Table Inheritance. If you have unlimited variation in the attributes per entity, use Serialized LOB.
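    For comparison, here is a minimal sketch of the conventional approach with one Class-Table Inheritance subtype, assuming SQLite via Python's sqlite3. The table and column names (manufacturers, convertibles, roof) are hypothetical. Each rule from the bullet list becomes an ordinary NOT NULL, CHECK, or foreign key constraint, and the Mercedes search is a plain WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE manufacturers (name VARCHAR(40) PRIMARY KEY);
CREATE TABLE vehicles (
  vid          INTEGER PRIMARY KEY,
  vin          VARCHAR(17),
  manufacturer VARCHAR(40) NOT NULL REFERENCES manufacturers(name),
  model        VARCHAR(40) NOT NULL
);
-- Subordinate table for the hypothetical "convertible" subtype.
CREATE TABLE convertibles (
  vid  INTEGER PRIMARY KEY REFERENCES vehicles(vid),
  roof VARCHAR(8) NOT NULL CHECK (roof IN ('hardtop', 'softtop'))
);
INSERT INTO manufacturers VALUES ('Ford'), ('Mazda'), ('Mercedes');
INSERT INTO vehicles (vid, manufacturer, model) VALUES
  (1, 'Mercedes', 'SLK32 AMG'), (2, 'Ford', 'GT90'), (3, 'Mazda', 'MX-5');
INSERT INTO convertibles VALUES (1, 'hardtop'), (3, 'softtop');
""")

def accepted(sql):
    """Run one statement; return False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

no_manufacturer = accepted("INSERT INTO vehicles (vid, model) VALUES (4, 'Mystery')")  # NOT NULL
bad_roof = accepted("INSERT INTO convertibles VALUES (2, 'soft top')")                  # CHECK
unknown_make = accepted("INSERT INTO vehicles VALUES (5, NULL, 'Fjord', 'X')")          # FK

mercedes = [r[0] for r in conn.execute(
    "SELECT vid FROM vehicles WHERE manufacturer = 'Mercedes'")]
```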

    Another technology that is designed for these kinds of fluid, non-relational data models is a Semantic Database, storing data in RDF and queried with SPARQL. One free solution is Sesame.
