How do I store this hierarchical data using MySQL?


Question

I am currently designing a web application which will be used by many businesses. However, I am having trouble deciding how to store the data. The general structure of the data is shown in this tree: http://i.imgur.com/lpYwqya.png

So there will be a table that lists every client. Each client has its own users and projects. Each project has two children: users and tasks. Here, users refers to the users registered under the client who are allowed to access that project (the table will store the id of each such user and their permission [read/write]). For each level of the tree, I need to store data. For instance, a task has the following fields (WBS, Name, Start Date, Finish Date, Duration, Work, Cost, Fixed Cost, Vendor, ...).

I am having difficulty deciding how best to structure the data. Note that the data will always be accessed from the top of the tree down (parents to children), and I never have to move across children or back up the tree. Here are two solutions I have come up with:

Solution 1: Have an unlimited number of tables. Every time a client is created, two tables are also created: 1_projects and 1_users (where 1 is the id of the client in the first table). When a project is created, a table 1_1_tasks will be created, and so on. So the plans table for a risk with id 5, task id 3895, project id 19, and client id 57658 would be: 57658_19_3895_5_plans.

Solution 2: Have 9 tables: clients, users, projects, project_users, tasks, risks, risk_updates, plans, plan_updates. In the risks table, in addition to the fields that every risk has, there will also be the following: client_id, project_id, task_id. So, for example, if I want to return every risk that a client has for a particular task, I search the risks table for rows where client_id = #, project_id = #, task_id = #. Of course, these fields would form a composite/compound key for the risks table. So, the risks table would store the risks for every task, from every project, from every client. The last table, plan_updates, would obviously be massive.

I believe solution 1 to be strong because it allows me to easily navigate down the tree: nodes that do not belong to the same parent are not stored in the same table. However, this solution is also very bad because there will be a massive number of tables, and so any later modifications to the database would be very difficult.

Solution 2 is strong because all risks are centralized in one table. However, I wonder whether it will be very inefficient when searching, say, the plan_updates table, because I will have to search the entire table (which will be massive) for rows that match the ids of all parent elements.

To put this all into perspective, I anticipate the following:

Users: 1-20 per client. Usually fewer than 5.

Projects: 1-100 per client. Most will have fewer than 20.

Tasks: 100-10,000 per project.

Risks: 0-10 per task. Only around 30% of tasks will have risks though, and the majority of these will only have 1-4 risks.

Risk Updates: 1-10 per risk.

Plans: 1-5 per risk.

Plan Updates: 1-10 per plan.
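To get a feel for how large the shared tables of solution 2 could become, the figures above can be multiplied out. The following is only a rough upper bound per client: it takes the top of every quoted range and applies the 30% share of tasks that have risks, so real clients should fall well below it. The constant and function names are made up for this sketch.

```python
# Upper ends of the ranges quoted in the question (per client).
MAX_PROJECTS_PER_CLIENT = 100
MAX_TASKS_PER_PROJECT = 10_000
PERCENT_OF_TASKS_WITH_RISKS = 30   # "only around 30% of tasks will have risks"
MAX_RISKS_PER_TASK = 10
MAX_PLANS_PER_RISK = 5
MAX_UPDATES_PER_PLAN = 10


def worst_case_rows_per_client():
    """Return a rough upper bound on rows per table for a single client."""
    tasks = MAX_PROJECTS_PER_CLIENT * MAX_TASKS_PER_PROJECT
    # Integer arithmetic avoids floating-point rounding in the 30% step.
    tasks_with_risks = tasks * PERCENT_OF_TASKS_WITH_RISKS // 100
    risks = tasks_with_risks * MAX_RISKS_PER_TASK
    plans = risks * MAX_PLANS_PER_RISK
    plan_updates = plans * MAX_UPDATES_PER_PLAN
    return {
        "tasks": tasks,
        "risks": risks,
        "plans": plans,
        "plan_updates": plan_updates,
    }


rows = worst_case_rows_per_client()
for table, count in rows.items():
    print(f"{table}: {count:,}")
```

Even this pessimistic bound is per client, so the total size of plan_updates depends mostly on how many clients sit near the top of every range at once.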

If anyone could shed some light on how I could best solve this problem, that would be very helpful.

Recommended answer

The second solution seems much more reasonable to me. The biggest flaw in the first solution is the poor manageability of the whole structure. You will very soon end up with a massive number of tables, and any structural change (adding an extra field or an extra constraint) would have to be repeated on every one of them.

Your concerns about compound keys, on the other hand, are not that serious.

Tasks, for example, can be assigned to individual projects alone. There is no need for them to reference the client directly as well, because the client can be reached through the project. On the other hand, it is very likely that you will at some point introduce another n-n link table connecting users and tasks directly, in order to define who is to carry out a particular task.
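A minimal sketch of that idea, in which each table references only its direct parent. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; the column names and sample rows are assumptions, not taken from the question, and MySQL DDL would differ in details such as column types and AUTO_INCREMENT.

```python
import sqlite3

# Hypothetical schema: every child row stores only its direct parent's id.
SCHEMA = """
CREATE TABLE clients  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE users    (id INTEGER PRIMARY KEY,
                       client_id INTEGER NOT NULL REFERENCES clients(id),
                       name TEXT NOT NULL);
CREATE TABLE projects (id INTEGER PRIMARY KEY,
                       client_id INTEGER NOT NULL REFERENCES clients(id),
                       name TEXT NOT NULL);
CREATE TABLE project_users (
    project_id INTEGER NOT NULL REFERENCES projects(id),
    user_id    INTEGER NOT NULL REFERENCES users(id),
    permission TEXT NOT NULL CHECK (permission IN ('read', 'write')),
    PRIMARY KEY (project_id, user_id));
CREATE TABLE tasks    (id INTEGER PRIMARY KEY,
                       project_id INTEGER NOT NULL REFERENCES projects(id),
                       wbs TEXT,
                       name TEXT NOT NULL);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript(SCHEMA)

conn.execute("INSERT INTO clients (id, name) VALUES (1, 'Acme')")
conn.execute("INSERT INTO projects (id, client_id, name) VALUES (10, 1, 'Bridge')")
conn.execute(
    "INSERT INTO tasks (id, project_id, wbs, name) VALUES (100, 10, '1.1', 'Survey')"
)

# The client of a task is recovered through its project, not stored on the task.
client_name = conn.execute(
    """
    SELECT c.name
    FROM tasks t
    JOIN projects p ON p.id = t.project_id
    JOIN clients  c ON c.id = p.client_id
    WHERE t.id = ?
    """,
    (100,),
).fetchone()[0]
print(client_name)
```

Because access always runs from parent to child, the join back up the tree is rarely needed; it is shown here only to make the point that no information is lost by leaving client_id off the tasks table.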

So, if you want to list all the risks of a task, you will first have to find the task at hand and then use a single key (the task id) to look up its rows in the risks table. That remains the same whether you have one table or many.
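The lookup described above can be sketched as follows, again with sqlite3 standing in for MySQL. The table layout, the index name idx_risks_task and the sample rows are hypothetical. With an index on task_id the database can jump straight to the matching rows rather than reading the whole table; MySQL's own EXPLAIN statement serves the same purpose as the EXPLAIN QUERY PLAN call shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE risks (id INTEGER PRIMARY KEY,
                        task_id INTEGER NOT NULL,
                        name TEXT NOT NULL);
    -- Index on the parent key so lookups by task do not scan every row.
    CREATE INDEX idx_risks_task ON risks (task_id);
    """
)
conn.executemany(
    "INSERT INTO risks (task_id, name) VALUES (?, ?)",
    [(3895, "Vendor delay"), (3895, "Cost overrun"), (4000, "Permit denied")],
)


def risks_for_task(task_id):
    """Return the names of all risks attached to one task, oldest first."""
    rows = conn.execute(
        "SELECT name FROM risks WHERE task_id = ? ORDER BY id", (task_id,)
    )
    return [name for (name,) in rows]


print(risks_for_task(3895))

# Ask the engine how it would run the lookup; the last column describes the step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM risks WHERE task_id = ?", (3895,)
).fetchall()
for step in plan:
    print(step[-1])
```

The same pattern repeats one level down: plans are found by risk_id and plan_updates by plan_id, each through a single indexed column.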

I strongly suggest you choose solution #2 and make sure you identify all the relevant primary keys and indexes (and unique columns where applicable). That will keep the database fast and efficient.

Edit

As @MSW mentions, there is a whole lot more to be said about the subject. There is endless literature on database design (with principles like normalization, atomicity ...) that covers it.

One further point against solution #1 is that you will not easily be able to run analyses across projects later on, since their data will be spread over a large number of different tables.
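As an illustration of that point, with solution 2 a question such as "how many risks does each project have?" is a single query over the shared tables. The sketch below uses sqlite3 in place of MySQL, and the table contents are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE projects (id INTEGER PRIMARY KEY,
                           client_id INTEGER NOT NULL,
                           name TEXT NOT NULL);
    CREATE TABLE tasks (id INTEGER PRIMARY KEY,
                        project_id INTEGER NOT NULL REFERENCES projects(id));
    CREATE TABLE risks (id INTEGER PRIMARY KEY,
                        task_id INTEGER NOT NULL REFERENCES tasks(id));

    -- Hypothetical sample data: two clients, three projects.
    INSERT INTO projects VALUES (1, 57658, 'Bridge'), (2, 57658, 'Tunnel'), (3, 9, 'Depot');
    INSERT INTO tasks VALUES (10, 1), (11, 1), (20, 2), (30, 3);
    INSERT INTO risks (task_id) VALUES (10), (10), (11), (30);
    """
)

# One query covers every project of every client; LEFT JOIN keeps
# projects that currently have no risks at all.
risks_per_project = conn.execute(
    """
    SELECT p.name, COUNT(r.id) AS risk_count
    FROM projects p
    LEFT JOIN tasks t ON t.project_id = p.id
    LEFT JOIN risks r ON r.task_id = t.id
    GROUP BY p.id, p.name
    ORDER BY p.id
    """
).fetchall()

for name, risk_count in risks_per_project:
    print(name, risk_count)
```

Under solution 1 the same report would need the list of table names to be built dynamically and the results combined table by table.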
