数据库模式混淆(索引和约束) [英] Database schema confusing (Index and Constraints)

查看:97
本文介绍了数据库模式混淆(索引和约束)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我对模式的设计有一些混淆,但在开始之前,让我先向您展示模式,

  CREATE TABLE Person 

PersonID INT NOT NULL PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50),
- 一些列在这里..
CONSTRAINT tb_idF INDEX(FirstName),
CONSTRAINT tb_idL INDEX(LastName)
- 或
- CONSTRAINT tb_idL INDEX(FirstName,LastName )
- 其他约束...
);

CREATE TABLE JobDescription

JobDescriptionID INT NOT NULL PRIMARY KEY,
JobDescriptionName VARCHAR(50)UNIQUE
- 这里有一些列。
- constraints ...
);

这里的混淆是表格的映射表 Person JobDescription 。目前,我有这种设计,

  CREATE TABLE Person_JobDescription 

RECID INT AUTO_INCREMENT PRIMARY KEY, - 由于某些特殊原因
- 我需要映射到其他表
PersonID INT,
JobDescriptionID INT,
StartYear INT, - 年JobDescription被指定
EndYear INT,
CONSTRAINT tb_fk1 FOREIGN KEY(personID)
参考人(PersonID),
CONSTRAINT tb_fk2 FOREIGN KEY(JobDescriptionID)
参考JobDescription(JobDescriptionID),
约束tb_uq UNIQUE(PersonID,JobDescriptionID)
);

但我有另一个想法,映射表的结构将是这样

  CREATE TABLE Person_JobDescription 

PersonID INT, - 映射另一个表上的这两列
JobDescriptionID INT , - ^^
StartYear INT, - 年JobDescription被指定
EndYear INT,
CONSTRAINT tb_fk1 FOREIGN KEY(personID)
REFERENCES Person(PersonID),
CONSTRAINT tb_fk2 FOREIGN KEY(JobDescriptionID)
REFERENCES JobDescription(JobDescriptionID),
CONSTRAINT tb_pk PRIMARY KEY(PersonID,JobDescriptionID)
);

当我根据上面的表创建和测试查询时,两者都返回相同的结果,性能也与我在小型数据库上测试一样(有50k条记录)。


$ b

问题


  • 您希望在大型数据库上映射表的两个模式中的哪一个( Person_JobDescription )?



  • 根据指示,我不允许在 FirstName UNIQUE c>和 LastName




    • 我将在表上使用什么类型的索引 Person ?每个列的索引或 FirstName LastName ?的复合索引

    • 什么时候使用单索引 INDEX(Col1) INDEX(Col2) over (Col1,Col2)



    感谢您花时间阅读这个问题。 >

    最好的问候,



    Derek Floss

    解决方案

    我更喜欢第二种方法。通过使用代理ID号,当它们不是逻辑上需要识别时,您引入更多的强制性联接。这需要您遍历数据库追踪ID号,这是相当于追踪数据库上的指针的SQL。跟踪指针是IMS的特征,IMS是要替换的关系模型的数据库体系结构之一。 (IMS使用分层体系结构。)今天没有必要重新发明它。



    如果您拥有例如五个级别的代理ID号码,并且您希望一个人的名字,你必须做四个联接才能得到它。使用第二种方法,你只需要一个连接。如果您不想编写多列联接,请使用CREATE VIEW,然后只执行一次。



    性能测试非常简单。只需使用您喜欢的脚本语言生成几百万个随机行,并将它们加载到测试服务器。你不仅会发现你的性能问题隐藏在哪里,你会发现你的CREATE TABLE代码中的所有错误。 (您的代码将无法按原样工作。)了解 EXPLAIN

    对于索引,您可以在生成的random-ish行上测试和负载。如果用户始终提供名字,则(first_name,last_name)上的多列索引将最有效。但是很多用户不会这样做,宁愿用姓氏搜索。 (first_name,last_name)上的多列索引对于喜欢按姓氏搜索的用户无效。



    由于这个原因,如果有两个独立的索引,对名字和姓氏的索引通常更有效,

    是什么意思?



    这个问题所隐含的设计模式是每一行都必须有一个id号,所有的外键都必须引用id号。在SQL数据库中,它实际上是一个反模式。作为一个经验法则,任何允许你设计表格而不考虑钥匙的模式应该被推定为有罪直到被证明是无害的 - 它应该被认为是一种反模式,直到被证明不是。

      create table A(
    a_id integer primary key,
    a_1 varchar(15)not null unique,
    a_2 varchar )not null
    );

    create table B(
    b_id integer primary key
    a_id integer not null references A(a_id),
    b_1 varchar(10)not null,
    unique(a_id,b_1),
    );

    创建表C(
    c_id整数主键,
    b_id整数不为空引用B(b_id),
    c_1 char(3)不为null,
    c_2 varchar(20)not null,
    unique(b_id,c_1)
    );

    创建表D(
    d_id整数主键,
    c_id整数不为空引用C(c_id),
    d_1 integer not null,
    d_2 varchar (15),
    unique(c_id,d_1)
    );

    如果您需要关于表D的报告,并且报告需要




    • 列D.d_1和D.d_2以及

    • 列A.a_1和A.a_2,



    您需要3个连接才能访问它。 (尝试一下。)你在追踪身份证号码。下面的结构是不同的。

      create table A(
    a_1 varchar )primary key,
    a_2 varchar(15)not null
    );

    创建表B(
    a_1 varchar(15)not null references A(a_1),
    b_1 varchar(10)not null,
    主键(a_1, b_1),
    );

    创建表C(
    a_1 varchar(15)not null,
    b_1 varchar(10)not null,
    c_1 char(3)not null,
    c_2 varchar(20)not null,
    主键(a_1,b_1,c_1),
    外键(a_1,b_1)引用B(a_1,b_1)
    );

    创建表D(
    a_1 varchar(15)not null,
    b_1 varchar(10)not null,
    c_1 char(3)not null,
    d_1 integer not null,
    d_2 varchar(15),
    主键(a_1,b_1,c_1,d_1),
    外键(a_1,b_1,c_1) ,b_1,c_1)
    );

    有了这个结构,同一个报表需要一个连接。

     选择D.d_1,D.d_2,A.a_1,A.a_2 
    从D
    内部连接A on D.a_1 = A .a_1;


    I have a little confusion about the designing of the schema, but before i start, let me show you the schema first,

    CREATE TABLE Person
    (
        PersonID INT NOT NULL PRIMARY KEY,
        FirstName VARCHAR(50),
        LastName VARCHAR(50),
        -- some columns here..
        CONSTRAINT tb_idF INDEX (FirstName),
        CONSTRAINT tb_idL INDEX (LastName)
        -- or 
        -- CONSTRAINT tb_idL INDEX (FirstName, LastName)
        -- other constraints ...
    );
    
    CREATE TABLE JobDescription
    (
        JobDescriptionID INT NOT NULL PRIMARY KEY,
        JobDescriptionName VARCHAR(50) UNIQUE
        -- some columns here..
        -- constraints ...
    );
    

    and the confusion is here, the mapping table for tables: Person and JobDescription. Currently, I have this design,

    CREATE TABLE Person_JobDescription
    (
        RECID INT AUTO_INCREMENT PRIMARY KEY,   -- for some special reasons
                                                -- I need to map to other table
        PersonID INT,
        JobDescriptionID INT,
        StartYear INT,                          -- year JobDescription was Appointed
        EndYear INT,
        CONSTRAINT tb_fk1 FOREIGN KEY (PersonID) 
            REFERENCES Person(PersonID),
        CONSTRAINT tb_fk2 FOREIGN KEY (JobDescriptionID) 
            REFERENCES JobDescription(JobDescriptionID),
        CONSTRAINT tb_uq UNIQUE (PersonID, JobDescriptionID)
    );
    

    but i have this other idea which structure of the mapping table will be like this

    CREATE TABLE Person_JobDescription
    (
        PersonID INT,           -- map these two columns on the other table
        JobDescriptionID INT,   -- ^^
        StartYear INT,          -- year JobDescription was Appointed
        EndYear INT,
        CONSTRAINT tb_fk1 FOREIGN KEY (PersonID) 
            REFERENCES Person(PersonID),
        CONSTRAINT tb_fk2 FOREIGN KEY (JobDescriptionID) 
            REFERENCES JobDescription(JobDescriptionID),
        CONSTRAINT tb_pk PRIMARY KEY (PersonID, JobDescriptionID)
    );
    

    when I created and tested the query against the tables above, both of them returns the same results and the performance is also the same as I am testing on a small database (having 50k records). I wonder if how the two queries behave on large database.

    QUESTIONS

    • Which of the two schemas of the mapping table (Person_JobDescription) will you prefer on large database?

    As instructed, I am not allowed to create a UNIQUE constraint on FirstName and LastName. But i have supplied an index on the two columns.

    • What type of index will I use on table Person? An index for each column or a compound index for FirstName and LastName?
    • When will I use single index INDEX (Col1) and INDEX (Col2) over INDEX (Col1, Col2)?

    Thank you for taking time to read on this question.

    Best Regards,

    Derek Floss

    解决方案

    I'd prefer the second approach. By using surrogate ID numbers when they're not logically necessary for identification, you introduce more mandatory joins. This requires you to "chase ID numbers all over the database", which is the SQL equivalent to "chasing pointers all over the database". Chasing pointers was characteristic of IMS, one of the database architectures the relational model intended to replace. (IMS uses a hierarchical architecture.) There's no point reinventing it today. (Although a lot of people do just that.)

    If you have, for example, five levels of surrogate ID numbers, and you want a person's name, you have to do four joins to get it. Using the second approach, you just need one join. If you don't want to write multi-column joins, use CREATE VIEW and do it just once.

    Performance is simple to test. Just generate a few million random-ish rows using your favorite scripting language, and load them into a test server. You'll not only find where your performance problems are hiding, you'll find all the errors in your CREATE TABLE code. (Your code won't work as-is.) Learn about EXPLAIN if you don't already know about it.

    As for indexing, you can test that on the random-ish rows you generate and load. A multi-column index on (first_name, last_name) will work best if users always supply a first name. But a lot of users won't do that, preferring to search by last name instead. A multi-column index on (first_name, last_name) isn't effective for users who prefer to search by last name. You can test that.

    For that reason alone, indexing of first names and last names is usually more effective if there are two separate indexes, one for the first name, and one for the last name.


    What does chasing id numbers mean?

    The unspoken design pattern underlying this question is "Every row must have an id number, and all foreign keys must reference the id number." In a SQL database, it's actually an anti-pattern. As a rule of thumb, any pattern that lets you design tables without thinking about keys should be presumed guilty until proven innocent--it should be presumed to be an anti-pattern until proven not to be.

    create table A (
     a_id integer primary key,
     a_1 varchar(15) not null unique,
     a_2 varchar(15) not null
    );
    
    create table B (
      b_id integer primary key
      a_id integer not null references A (a_id),
      b_1  varchar(10) not null,
      unique (a_id, b_1),
    );
    
    create table C (
      c_id integer primary key,
      b_id integer not null references B (b_id),
      c_1 char(3) not null,
      c_2 varchar(20) not null,
      unique (b_id, c_1)
    );
    
    create table D (
      d_id integer primary key,
      c_id integer not null references C (c_id),
      d_1 integer not null,
      d_2 varchar(15),
      unique (c_id, d_1)
    );
    

    If you need a report on table "D", and the report needs

    • columns D.d_1 and D.d_2, and
    • columns A.a_1 and A.a_2,

    you need 3 joins to get to it. (Try it.) You're chasing ID numbers. (Like chasing pointers in IMS.) The following structure is different.

    create table A (
     a_1 varchar(15) primary key,
     a_2 varchar(15) not null
    );
    
    create table B (
      a_1 varchar(15) not null references A (a_1),
      b_1  varchar(10) not null,
      primary key (a_1, b_1),
    );
    
    create table C (
      a_1 varchar(15) not null,
      b_1 varchar(10) not null,
      c_1 char(3) not null,
      c_2 varchar(20) not null,
      primary key (a_1, b_1, c_1),
      foreign key (a_1, b_1) references B (a_1, b_1)
    );
    
    create table D (
      a_1 varchar(15) not null,
      b_1 varchar(10) not null,
      c_1 char(3) not null,
      d_1 integer not null,
      d_2 varchar(15),
      primary key (a_1, b_1, c_1, d_1),
      foreign key (a_1, b_1, c_1) references C (a_1, b_1, c_1)
    );
    

    With this structure, the same report needs a single join.

    select D.d_1, D.d_2, A.a_1, A.a_2
    from D
    inner join A on D.a_1 = A.a_1;
    

    这篇关于数据库模式混淆(索引和约束)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

    查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆