How to handle a composite key on the front side in C#


Problem description

How to handle a composite key on the front side in C#?

Recommended answer

Tables that define entities are tables that define customers, or sales people, or even sales transactions. The primary key of these tables is not what I am here to discuss. You can use GUID columns, identity columns, long descriptive text columns, or whatever it is you feel comfortable to use as primary keys on tables that define entities. It’s all fine by me, whatever floats your boat as they say. There are lots of discussions and ideas about the best way to determine what the best primary key of these tables should be, and pros and cons of all of the various approaches, but overall, that is not really what I am addressing.

Tables that relate entities, however, are a different story.

Suppose we have a system that tracks customers, and allows you to assign multiple products to multiple customers to indicate what they are eligible to order. This is called a many-to-many or N:N relation between Customers and Products. We already have a table of Products, and a table of Customers. The primary key of the Products table is ProductID, and that of the Customers table is CustomerID. Whether these "ID" columns are natural or surrogate, identity or GUID, numerical or text or codes, is irrelevant at this point.

What is relevant and important, and what I am here to discuss, is how we define our CustomerProducts table. This table relates customers to products, so the purpose of the table is to relate two entities that have already been defined in our database. Let’s also add a simple "OrderLimit" column which indicates how many of that product they are allowed to order. (This is just a simple example, any attribute will do). How should we define this table?

For some reason, a very common answer is that we simply create a table with 4 columns: One that stores the CustomerID, one that stores the ProductID we are relating it to, the Order Limit, and of course the primary key column which is an identity:

Create table CustomerProducts
(
    Customer_ProductID int identity primary key,
    CustomerID int references Customers(CustomerID) not null,
    ProductID int references Products(ProductID) not null,
    OrderLimit int not null
)

This is what I see in perhaps most of the databases that I’ve worked with over the years. The reason for designing a table in this manner? Honestly, I don’t know! I can only surmise that it comes from a lack of understanding of what a primary key of a table really is: that it can be something other than an identity, and that it can consist of more than a single column. As I mentioned, it seems that many database architects are simply not aware of this fact.

So then, what is the problem here? The primary issue is data integrity. This table allows me to enter the following data:

Customer_ProductID  CustomerID  ProductID  OrderLimit
1                   1           100        25
2                   1           100        30

In the above data, what is the order limit for CustomerID #1, ProductID #100? Is it 25 or 30? There is no way to know for sure. Nothing in the database constrains this table so that we have exactly one row per CustomerID/ProductID combination. Remember, our primary key is just an identity, which constrains nothing except its own value.
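The problem is easy to reproduce. The following is a minimal sketch in Python, assuming an in-memory SQLite database as a stand-in for whatever database engine you use; SQLite's AUTOINCREMENT plays the role of the identity column, and the foreign key references are left out for brevity.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the database described in the text.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CustomerProducts (
        Customer_ProductID INTEGER PRIMARY KEY AUTOINCREMENT,
        CustomerID INTEGER NOT NULL,
        ProductID  INTEGER NOT NULL,
        OrderLimit INTEGER NOT NULL
    )
""")

# Both inserts succeed: the identity value is unique, the pair is not.
insert_sql = (
    "INSERT INTO CustomerProducts (CustomerID, ProductID, OrderLimit) "
    "VALUES (?, ?, ?)"
)
conn.execute(insert_sql, (1, 100, 25))
conn.execute(insert_sql, (1, 100, 30))

# Two conflicting order limits now exist for the same customer/product pair.
limits = [
    row[0]
    for row in conn.execute(
        "SELECT OrderLimit FROM CustomerProducts "
        "WHERE CustomerID = 1 AND ProductID = 100 "
        "ORDER BY Customer_ProductID"
    )
]
print(limits)  # [25, 30]
```

Any query that expects a single order limit per customer and product now has to decide which row wins.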

Most database designs like this just assume (hope?) that the data will always be OK and there will be no duplicates. The UI will handle this, of course! But even if you think that only one single form on one single application ever updates this table, you have to remember that data will always get in and out of your system in different ways. What happens if you upgrade your system and have to move the data over? What if you need certain transactions restored from a backup? What if you ever need to do a batch import to save valuable data entry time? Or to convert data from a new system that you are absorbing or integrating?

If you ever write a report or an application against a system and simply assume that the data will be constrained a certain way, but the database itself does not guarantee that, you are either a) greatly over-engineering what should be a simple SQL statement to deal with the possibility of bad data or b) ignoring the possibility of bad data completely and setting yourself up for issues down the road. It's possible to constrain data properly, it's efficient, it's easy to do, and it simply must be done or you should not really be working with a database in the first place -- you are forgoing a very important advantage it provides.

So, to handle that issue with this table design, we need to create a unique constraint on our CustomerID/ProductID columns:

create unique index cust_products_unique on CustomerProducts (CustomerID, ProductID)

Now, we are guaranteed that there will only be exactly one row per combination of CustomerID and ProductID. That handles that problem, our data now has integrity, so we seem to be all set, right?
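To see the constraint at work, here is a small sketch under the same assumptions as before (Python with an in-memory SQLite database, foreign keys omitted). The index name matches the statement above; everything else is illustrative.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the database described in the text.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CustomerProducts (
        Customer_ProductID INTEGER PRIMARY KEY AUTOINCREMENT,
        CustomerID INTEGER NOT NULL,
        ProductID  INTEGER NOT NULL,
        OrderLimit INTEGER NOT NULL
    )
""")
conn.execute(
    "CREATE UNIQUE INDEX cust_products_unique "
    "ON CustomerProducts (CustomerID, ProductID)"
)

insert_sql = (
    "INSERT INTO CustomerProducts (CustomerID, ProductID, OrderLimit) "
    "VALUES (?, ?, ?)"
)
conn.execute(insert_sql, (1, 100, 25))

# The second row for the same pair is now refused by the database itself.
try:
    conn.execute(insert_sql, (1, 100, 30))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM CustomerProducts").fetchone()[0]
print(duplicate_rejected, row_count)
```

The duplicate is rejected no matter which form, import job, or script tries to insert it.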

Well, let’s remember the definition of what a primary key really is. It is the set of columns in a table that uniquely identifies each row of data. Also, for a table to be normalized, all non-primary key columns in a table should be fully dependent on the primary key of that table.

Consider instead the following design:

Create table CustomerProducts
(
    CustomerID int references Customers(CustomerID) not null,
    ProductID int references Products(ProductID) not null,
    OrderLimit int not null,
    Primary key (CustomerID, ProductID)
)

Notice here that we have eliminated the identity column, and have instead defined a composite (multi-column) primary key as the combination of the CustomerID and ProductID columns. Therefore, we do not have to create an additional unique constraint. We also do not need an additional identity column that really serves no purpose. We have not only simplified our data model physically, but we’ve also made it more logically sound and the primary key of this table accurately explains what it is this table is modeling – the relationship of a CustomerID to a ProductID.
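The composite-key design can be tried out the same way. This is a sketch only, assuming Python with an in-memory SQLite database; the Customers and Products tables are reduced to their key columns, which is a simplification made here and not part of the design above.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the database described in the text.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY);
    CREATE TABLE Products  (ProductID  INTEGER PRIMARY KEY);
    CREATE TABLE CustomerProducts (
        CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID),
        ProductID  INTEGER NOT NULL REFERENCES Products(ProductID),
        OrderLimit INTEGER NOT NULL,
        PRIMARY KEY (CustomerID, ProductID)
    );
    INSERT INTO Customers VALUES (1);
    INSERT INTO Products  VALUES (100);
    INSERT INTO CustomerProducts VALUES (1, 100, 25);
""")

# The primary key alone rejects a second row for the same pair.
try:
    conn.execute("INSERT INTO CustomerProducts VALUES (1, 100, 30)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A row is addressed by both key columns together.
conn.execute(
    "UPDATE CustomerProducts SET OrderLimit = ? "
    "WHERE CustomerID = ? AND ProductID = ?",
    (30, 1, 100),
)
order_limit = conn.execute(
    "SELECT OrderLimit FROM CustomerProducts "
    "WHERE CustomerID = ? AND ProductID = ?",
    (1, 100),
).fetchone()[0]
print(duplicate_rejected, order_limit)
```

No extra identity column or unique index is needed: the key that identifies the row is the same one that keeps the data consistent.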

Going back to normalization, we also know that our OrderLimit column should be dependent on our primary key columns. Logically, our OrderLimit is determined based on the combination of a CustomerID and a ProductID, so physically this table design makes sense and is fully normalized. If our primary key is just a meaningless auto-generated identity column, it doesn’t make logical sense since our OrderLimit is not dependent on that.

Some people argue that having more than one column in a primary key "complicates things" or "makes things less efficient" rather than always using identity columns. This is simply not the case. We’ve already established that you must add additional unique constraints to your data to have integrity, so instead of just:

- A single indexed composite primary key that uniquely constrains our data

we instead need:

- An additional identity column
- A primary key index on that identity column
- An additional unique constraint on the columns that logically define the data

So we are actually adding complexity and overhead to our design, not simplifying! And we are requiring more memory and resources to store and manipulate data in our table.

In addition, let's remember that a data model can be a complicated thing. We have all kinds of tables that have primary keys defined that let us identify what they are modeling, and we have relations and constraints and data types and the rest. Ideally, you should be able to look at a table's primary key and understand what it is all about, and how it relates to other tables, and not need to basically ignore the primary key of a table and instead investigate unique constraints on that table to really determine what is going on! It simply makes no sense and adds unnecessary confusion and complication to your schema that is so easily avoided.

Some people will claim that being able to quickly label and identify the relation of a Product to a Customer with a single integer value makes things easier, but again we are over-complicating things. If we only know we are editing Customer_ProductID #452 in our user interface, what does that tell us? Nothing! We need to select from the CustomerProducts table every time just to get the CustomerID and the ProductID that we are dealing with in order to display labels or descriptions or to get any related data from those tables. If, instead, we know that we are editing CustomerID #1 and ProductID #6 because we are using a true, natural primary key of our table, we don’t need to select from that table at all to get those two very important attributes.
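On the front side, one way to carry such a key around is a small immutable value object that holds both columns. The sketch below uses Python; the class name, the "customer:product" token format, and the pending_limits dictionary are all hypothetical choices made for illustration, and the same idea maps to an immutable key type in C#.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerProductKey:
    """Hypothetical front-end representation of the composite key."""

    customer_id: int
    product_id: int

    def to_token(self) -> str:
        # Single string a form field or URL can carry (format is an assumption).
        return f"{self.customer_id}:{self.product_id}"

    @classmethod
    def from_token(cls, token: str) -> "CustomerProductKey":
        customer, product = token.split(":")
        return cls(int(customer), int(product))


# Frozen dataclasses are hashable, so the key can index UI-side state directly.
pending_limits = {CustomerProductKey(1, 6): 40}

# A token coming back from the UI yields both IDs without querying the table.
key = CustomerProductKey.from_token("1:6")
new_limit = pending_limits[key]
print(key.customer_id, key.product_id, new_limit)
```

Because the key itself contains the CustomerID and ProductID, the UI can look up labels or related data from the Customers and Products tables without first reading CustomerProducts.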

There are lots of complexities and many ways to model things, and there are many complicated situations that I did not discuss here. I am really only scratching the surface. But my overall point is to at least be aware of composite primary keys, and the fact that a primary key is not always a single auto-generated column. There are pros and cons to many different approaches, from both a logical design and physical performance perspective, but please consider carefully the idea of making your primary keys count for something and don’t automatically assume that just tacking on identity columns to all of your tables will give you the best possible database design.

And, remember -- when it comes to defining your entities, I understand that using an identity or GUID or whatever you like instead of real-world data has advantages. It is when we relate entities that we should consider using the existing primary key columns from our entity tables (however you have defined them) to construct an intelligent, logical and accurate primary key for our entity relation table, which avoids the need for additional identity columns and unique constraints.

