需要数据库结构建议 [英] Database Structure Advice Needed

查看:189
本文介绍了需要数据库结构建议的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

目前正在一个包含产品目录的网站上工作。我对数据库设计有点新,所以我正在寻找如何最好地做的建议。我熟悉关系数据库设计,所以我理解多对多或一对多等(在大学采取了一个良好的数据库类)。下面是一个项目可以分类为:

的示例。

 飞机 - 木 - >品牌 - >产品。 

而不是试图写我到目前为止,只是快速看看我创建的这个图像来自phpmyadmin设计师功能。



alt text http:// www .usfultimate.com / temp / db_design.jpg



现在,这一切似乎都很好,dandy,直到我意识到类别木材用于螺旋桨 - > airboat - >(木材)。这意味着,木头将不得不重新创建每次我想在不同的父母使用它。这不是世界的尽头,但我想知道是否有一个更好的方法去做这个。



此外,我想尽可能保持这个动态,以便客户可以根据需求变化组织他的目录。



*编辑。正在考虑只是创建一个标签表。所以我可以分配标签木材或金属或50英寸1到许多项目。我仍然会为主类别保留一个父母类型的东西,但是这样的类别不会那么深,不会有重复。

解决方案

首先,用户界面作为用户I 仇恨在以严格分级的方式组织的目录中搜索产品。我从来不记得什么是次级次级子类别 - 一个异国情调的产品,这迫使我浪费时间探索有希望的类别,只是为了发现它被分类在一个(对我来说,至少)奇怪的方式。



什么 Kevin Peno 建议是一个很好的建议,被称为 分面浏览 。由于 Marcia Bates 在Dot-Bomb之后:此时获取Web信息检索



本质上,分面搜索允许用户搜索您的目录从他们喜欢的任何方面开始,并让他们过滤搜索选择其他方面的信息。请注意,与通常构思标签系统的方式相反,没有什么能阻止您按层次组织其中的一些方面。



为了快速了解多面体搜索, 一些演示 ,可以在 Flamenco搜索界面项目 - 搜索流动的界面



其次,应用程序逻辑:什么 Manitra 建议也是一个好建议(如我理解它),即分离不同关系中的树/图的节点链接。他称为祖先表(这是一个更好的直观名称,但是)被称为传递闭包的有向无环图(DAG)(可达性关系)。除了性能,它大大简化了查询,因为Manitra说。



我建议一个视图,以便更新是实时和增量,不是由批处理作业定期。有一些SQL代码(但我认为它需要适应一些特定的DBMS)在我在回答图形集的查询语言:数据建模问题。具体来说,请参阅维护SQL中图形的传递闭包(.ps - 后记)。



产品分类关系



Manitra的第一点也值得强调。



他所说的是,在产品和类别之间有一个多对多的关系。



给定关系变量(relvars)产品和类别之间的关系可以是一个或多个类别,每个类别可以有零个或多个产品。例如被表示为至少具有属性P#和C#的相对PC,即,与相应的产品和类别编号的外键关系中的产品和类别编号(标识符)。



这是对类别层次结构管理的补充。



p>实施分面浏览的一个有用概念是关系分区,或者甚至, 关系比较(参见链接页面底部)。也就是说通过从用户(面导航)中选择的类别的(增长的)列表来划分PC(产品类别),仅获得这些类别中的产品(当然,类别被假定为



基于SQL的DBMS通常缺少这个运算符(除法和比较),所以我在下面给出一些有趣的文章,实现/讨论他们:





等等...





关于平面度的深度



我简要地查看了 Pras 链接的文章, 在MySQL中管理分层数据,但我在这几行后停止阅读在简介中:




简介用户在一个时间或另一个有
处理在SQL
数据库中的分层数据,毫无疑问获知,分层数据的
管理不是
什么是关系数据库的目的
。关系
数据库的表不是层次结构(如
XML),而只是一个平面列表
分层数据具有父 - 子
关系,它在关系数据库
表中不自然地表示
。 ...


要理解为什么这种坚持平面性的关系是只是废话,想象一个立方体a 三维笛卡尔坐标系:它将由8个坐标(三元组)标识,例如P1(x1,y1,z1),P2(x2,y2,z2),...,P8(x8,y8,z8)[这里我们不关心这些坐标的约束,



现在,我们将这些坐标(点)放入关系变量中,我们将这个变量命名为 Points 。我们将表示 Points 的关系值,如下表所示:

 
Points | x | y | z |
======= + ==== + ==== + ==== +
| x1 | y1 | z1 |
+ ---- + ---- + ---- +
| x2 | y2 | z2 |
+ ---- + ---- + ---- +
| .. | .. | .. |
| .. | .. | .. |
+ ---- + ---- + ---- +
| x8 | y8 | z8 |
+ ---- + ---- + ---- +

这个多维数据集是否只是以表格的方式表示它的行为?是一个关系(值)与其表格表示相同的东西?



关系变量假设为n维离散空间中的点集合,其中n是关系属性的数量(列)。对于一个n维的离散空间,它是什么意思,是平的?只是废话,正如我上面写的。



不要误会,SQL是一个设计错误的语言,并且基于SQL的DBMS是充满特殊和缺点(NULL,冗余,...),特别是坏的,DBMS-as-dumb-store类型(没有引用约束,没有完整性约束,...)。但是,这与关系数据模型幻想的限制毫无关系,相反:更多的他们转向它,更糟糕的是结果。

特别是,关系数据模型,一旦你理解它,在表示任何结构,甚至层次和图形,没有问题,正如我详细参考上面提到的发表论文。



在嵌套集模型



我忽略了该文章我对这样的逻辑设计没有特别的印象:它建议把两个不同的实体节点链接混为一个关系,这可能会导致尴尬。但我不想更彻底地分析这个设计。对不起。






Stephan Eggermont在下面的评论中反对:平面列表模型是一个问题,它是一个实现的抽象,使得性能难以实现... 。



现在,我的观点是:


  1. 这个平面列表模型 strong> fantasy :只是因为一个表格(平面列表)将关系布局(表示)并不意味着关系是平面列表(一个对象,它的表示不是同一个东西)。

  2. 逻辑表示(关系)和物理存储详细信息(水平或垂直分解,压缩,索引(散列,b +树,r-树,...)等等)是不同的;关系数据模型( RDM )的一个要点是将逻辑与物理模型(对DBMS的用户和实现者都有优势)解耦;

  3. 性能是物理存储细节(实现)和逻辑表示的直接后果(Eggermont的评论是 logical-physical confusion )。

以任何方式;一个是自由地实现元组和关系一个看合适。关系不一定的文件和元组不一定是文件的记录。



不幸的是,基于SQL的DBMS实现 太频繁了,哑的直接图像实现,并且在各种情况下性能不佳 - OLAP / ETL 产品以弥补这些缺点。



这是慢慢变化。有商业和自由软件/开源实现,最终避免这个根本的陷阱:





当然,必须存在一个最优的物理存储设计,但是任何物理存储设计都可以通过基于关系代数/结构​​的好的声明性语言来抽象化(而SQL是一个坏的示例)或更直接的逻辑编程语言(如Prolog,例如 - 请参阅我的答案 prolog到SQL转换器问题)。一个好的DBMS应该基于数据访问统计(和/或用户提示)来实时改变物理存储设计。



最后,在Eggermont的评论中,语句是另一个废话,但我不能在这里反驳,这个评论已经太长了。


Im currently working on a site which will contain a products catalog. I am a little new to database design so I'm looking for advice on how best to do this. I am familiar with relational database design so I understand "many to many" or "one to many" etc (took a good db class in college). Here is an example of what an item might be categorized as:

Propeller -> aircraft -> wood -> brand -> product.

Instead of trying to write what I have so far, just take a quick look at this image I created from the phpmyadmin designer feature.

alt text http://www.usfultimate.com/temp/db_design.jpg

Now, this all seemed fine and dandy, until I realized that the category "wood" would also be used under propeller -> airboat -> (wood). This would mean, that "wood" would have to be recreated every time I want to use it under a different parent. This isn't the end of the world, but I wanted to know if there is a more optimal way to go about this.

Also, I am trying to keep this thing as dynamic as possible so the client can organize his catalog as his needs change.

*Edit. Was thinking about just creating a "tags" table. So I could assign the tag "wood" or "metal" or "50inch" to 1 to many items. I would still keep a parenting type thing for the main categories, but this way the categories wouldnt have to go so deep and there wouldnt be the repetition.

解决方案

First, the user interface: as user I hate to search a product in a catalog organized in a strictly hierarchical way. I never remember in what sub-sub-sub-sub...-category an "exotic" product is in and this force me to waste time exploring "promising" categories just to discover it is categorized in a (for me, at least) strange way.

What Kevin Peno suggests is a good advice and is known as faceted browsing. As Marcia Bates wrote in After the Dot-Bomb: Getting Web Information Retrieval Right This Time, " .. faceted classification is to hierarchical classification as relational databases are to hierarchical databases. .. ".

In essence, faceted search allows users to search your catalog starting from whatever "facet" they prefer and let them filter information choosing other facets along the search. Note that, contrary to how tag systems are usually conceived, nothing prevents you to organize some of these facets hierarchically.

To quickly understand what faceted search is all about, there are some demos to explore at The Flamenco Search Interface Project - Search Interfaces that Flow.

Second, the application logic: what Manitra proposes is also a good advice (as I understand it), i.e. separating nodes and links of a tree/graph in different relations. What he calls "ancestor table" (which is a much better intuitive name, however) is known as transitive closure of a directed acyclic graph (DAG) (reachability relation). Beyond performance, it simplify queries greatly, as Manitra said.

But I suggest a view for such "ancestor table" (transitive closure), so that updates are in real-time and incremental, not periodical by a batch job. There is SQL code (but I think it needs to be adapted a little to specific DBMSes) in papers I mentioned in my answer to query language for graph sets: data modeling question. In particular, look at Maintaining Transitive Closure of Graphs in SQL (.ps - postscript).

Products-Categories relationship

The first point of Manitra is worth of emphasis, also.

What he is saying is that between products and categories there is a many-to-many relationship. I.e.: each product can be in one or more categories and in each category there can be zero or more products.

Given relation variables (relvars) Products and Categories such relationship can be represented, for example, as a relvar PC with at least attributes P# and C#, i.e. product and category numbers (identifiers) in a foreign-key relationships with corresponding Products and Categories numbers.

This is complementary to management of categories' hierarchies. Of course, this is only a design sketch.

On faceted browsing in SQL

A useful concept to implement "faceted browsing" is relational division, or, even, relational comparisons (see bottom of linked page). I.e. dividing PC (Products-Categories) by a (growing) list of categories chosen from a user (facet navigation) one obtains only products in such categories (of course, categories are presumed not all mutually exclusive, otherwise choosing two categories one will obtain zero products).

SQL-based DBMS usually lack this operators (division and comparisons), so I give below some interesting papers that implement/discuss them:

and so on...

I will not go into details here but interaction between categories hierarchies and facet browsing needs special care.

A digression on "flatness"

I briefly looked at the article linked by Pras, Managing Hierarchical Data in MySQL, but I stopped reading after these few lines in the introduction:

Introduction

Most users at one time or another have dealt with hierarchical data in a SQL database and no doubt learned that the management of hierarchical data is not what a relational database is intended for. The tables of a relational database are not hierarchical (like XML), but are simply a flat list. Hierarchical data has a parent-child relationship that is not naturally represented in a relational database table. ...

To understand why this insistence on flatness of relations is just nonsense, imagine a cube in a three dimensional Cartesian coordinate system: it will be identified by 8 coordinates (triplets), say P1(x1,y1,z1), P2(x2,y2,z2), ..., P8(x8, y8, z8) [here we are not concerned with constraints on these coordinates so that they represent really a cube].

Now, we will put these set of coordinates (points) into a relation variable and we will name this variable Points. We will represent the relation value of Points as a table below:

Points|  x |  y |  z |
=======+====+====+====+
       | x1 | y1 | z1 |
       +----+----+----+
       | x2 | y2 | z2 |
       +----+----+----+
       | .. | .. | .. |
       | .. | .. | .. |
       +----+----+----+
       | x8 | y8 | z8 |
       +----+----+----+

Does this cube is being "flattened" by the mere act of representing it in a tabular way? Is a relation (value) the same thing as its tabular representation?

A relation variable assumes as values sets of points in a n-dimensional discrete space, where n is the number of relation attributes ("columns"). What does it mean, for a n-dimensional discrete space, to be "flat"? Just nonsense, as I wrote above.

Don't get me wrong, It is certainly true that SQL is a badly designed language and that SQL-based DBMSes are full of idiosyncrasies and shortcomings (NULLs, redundancy, ...), especially the bad ones, the DBMS-as-dumb-store type (no referential constraints, no integrity constrains, ...). But that has nothing to do with relational data model fantasized limitations, on the contrary: more they turn away from it and worse is the outcome.

In particular, the relational data model, once you understand it, poses no problem in representing whatever structure, even hierarchies and graphs, as I detailed with references to published papers mentioned above. Even SQL can, if you gloss over its deficiencies, missing something better.

On the "The Nested Set Model"

I skimmed the rest of that article and I'm not particularly impressed by such logical design: it suggests to muddle two different entities, nodes and links, into one relation and this will probably cause awkwardness. But I'm not inclined to analyze that design more thoroughly, sorry.


EDIT: Stephan Eggermont objected, in comments below, that " The flat list model is a problem. It is an abstraction of the implementation that makes performance difficult to achieve. ... ".

Now, my point is, precisely, that:

  1. this "flat list model" is a fantasy: just because one lay out (represents) relations as tables ("flat lists") does not mean that relations are "flat lists" (an "object" and its representations are not the same thing);
  2. a logical representation (relation) and physical storage details (horizontal or vertical decompositions, compression, indexes (hashes, b+tree, r-tree, ...), clustering, partitioning, etc.) are distinct; one of the points of relational data model (RDM) is to decouple logical from "physical" model (with advantages to both users and implementors of DBMSes);
  3. performance is a direct consequence of physical storage details (implementation) and not of logical representation (Eggermont's comment is a classic example of logical-physical confusion).

RDM model does not constraint implementations in any way; one is free to implement tuples and relations as one see fit. Relations are not necessarily files and tuples are not necessarily records of a file. Such correspondence is a dumb direct-image implementation.

Unfortunately SQL-based DBMS implementations are, too often, dumb direct-image implementations and they suffer poor performance in a variety of scenarios - OLAP/ETL products exist to cover these shortcomings.

This is slowly changing. There are commercial and free software/open source implementations that finally avoid this fundamental pitfall:

Of course, the point is not that there must exist an "optimal" physical storage design, but that whatever physical storage design can be abstracted away by a nice declarative language based on relational algebra/calculi (and SQL is a bad example) or more directly on a logic programming language (like Prolog, for example - see my answer to "prolog to SQL converter" question). A good DBMS should be change physical storage design on-the-fly, based on data access statistics (and/or user hints).

Finally, in Eggermont's comment the statement " The relational model is getting squeeezed between the cloud and prevayler. " is another nonsense but I cannot give a rebuttal here, this comment is already too long.

这篇关于需要数据库结构建议的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆