Best way to create nested array from tables: multiple queries/loops VS single query/loop style

Question

Say I have 2 tables, which I can "merge" and represent in a single nested array.

I'm wondering what would be the best way to do that, considering:

  • efficiency
  • best-practice
  • DB/server-side usage trade-off
  • what you should do in real life
  • same case for 3, 4 or more tables that can be "merged" that way

The question is about ANY server-side/relational-db.

2 simple ways I was thinking about (if you have others, please suggest! notice I'm asking for a simple SERVER-SIDE and RELATIONAL-DB, so please don't waste your time explaining why I shouldn't use this kind of DB, use MVC design, etc., etc. ...):

  1. 2 loops, 5 simple "SELECT" queries
  2. 1 loop, 1 "JOIN" query

I've tried to give a simple and detailed example, in order to explain myself and to better understand your answers (though how to write the code and finding possible mistakes is not the issue here, so try not to focus on that...)

SQL SCRIPTS FOR CREATING AND INSERTING DATA TO TABLES

CREATE TABLE persons
(
    id int NOT NULL AUTO_INCREMENT,
    fullName varchar(255),
    PRIMARY KEY (id)
);

INSERT INTO persons (fullName) VALUES ('Alice'), ('Bob'), ('Carl'), ('Dan');

CREATE TABLE phoneNumbers
(
    id int NOT NULL AUTO_INCREMENT,
    personId int,
    phoneNumber varchar(255),
    PRIMARY KEY (id)
);

INSERT INTO phoneNumbers (personId, phoneNumber) VALUES ( 1, '123-456'), ( 1, '234-567'), (1, '345-678'), (2, '456-789'), (2, '567-890'), (3, '678-901'), (4, '789-012');  

A JSON REPRESENTATION OF THE TABLES AFTER I "MERGED" THEM:

[
  {
    "id": 1,
    "fullName": "Alice",
    "phoneNumbers": [
      "123-456",
      "234-567",
      "345-678"
    ]
  },
  {
    "id": 2,
    "fullName": "Bob",
    "phoneNumbers": [
      "456-789",
      "567-890"
    ]
  },
  {
    "id": 3,
    "fullName": "Carl",
    "phoneNumbers": [
      "678-901"
    ]
  },
  {
    "id": 4,
    "fullName": "Dan",
    "phoneNumbers": [
      "789-012"
    ]
  }
]

PSEUDO CODE FOR 2 WAYS:

1.

query: "SELECT id, fullName FROM persons"
personList = new List<Person>()
foreach row x in query result:
    current = new Person(x.fullName)
    "SELECT phoneNumber FROM phoneNumbers WHERE personId = x.id"
    foreach row y in query result:
        current.phoneNumbers.Push(y.phoneNumber)
    personList.Push(current)        
print personList         

2.

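query: "SELECT persons.id, fullName, phoneNumber FROM persons
            LEFT JOIN phoneNumbers ON persons.id = phoneNumbers.personId
            ORDER BY persons.id"
personList = new List<Person>()
current = null
previousId = null
foreach row x in query result:
    if ( x.id != previousId )
        // a new person starts: save the previous one, if any
        if ( current != null )
            personList.Push(current)
        current = new Person(x.fullName)
        previousId = x.id
    // a LEFT JOIN yields a null phoneNumber for a person with no numbers
    if ( x.phoneNumber != null )
        current.phoneNumbers.Push(x.phoneNumber)
// save the last person
if ( current != null )
    personList.Push(current)
print personList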

CODE IMPLEMENTATION IN PHP/MYSQL:

1.

/* get all persons */
// note: the mysql_* functions used below were removed in PHP 7.0; mysqli or PDO offer equivalents
$result = mysql_query("SELECT id, fullName FROM persons"); 
$personsArray = array(); //Create an array
//loop all persons
while ($row = mysql_fetch_assoc($result))
{
    //add new person
    $current = array();
    $current['id'] = $row['id'];
    $current['fullName'] = $row['fullName'];

    /* add all person phone-numbers */
    $id = $current['id'];
    $sub_result = mysql_query("SELECT phoneNumber FROM phoneNumbers WHERE personId = {$id}");
    $phoneNumbers = array();
    while ($sub_row = mysql_fetch_assoc($sub_result))
    {
        $phoneNumbers[] = $sub_row['phoneNumber'];
    }
    //add phoneNumbers array to person
    $current['phoneNumbers'] = $phoneNumbers;

    //add person to final result array
    $personsArray[] = $current;
}

echo json_encode($personsArray);

2.

/* get all persons and their phone-numbers in a single query */
// ORDER BY keeps each person's rows adjacent, which the loop below relies on
$sql = "SELECT persons.id, fullName, phoneNumber FROM persons
            LEFT JOIN phoneNumbers ON persons.id = phoneNumbers.personId
            ORDER BY persons.id";
$result = mysql_query($sql);

$personsArray = array();
/* init temp vars to save current person's data */
$current = null;
$previousId = null;
$phoneNumbers = array();
while ($row = mysql_fetch_assoc($result))
{
    /*
       if the current id is different from the previous id:
       you've got to a new person.
       save the previous person (if such exists),
       and create a new one
    */
    if ($row['id'] != $previousId)
    {
        // in the first iteration,
        // current (previous person) is null,
        // don't add it
        if (!is_null($current))
        {
            $current['phoneNumbers'] = $phoneNumbers;
            $personsArray[] = $current;
            $phoneNumbers = array();
        }

        // create a new person
        $current = array();
        $current['id'] = $row['id'];
        $current['fullName'] = $row['fullName'];
        // set current as previous id
        $previousId = $current['id'];
    }

    // add the phone-number to the current phone-number list;
    // a LEFT JOIN yields NULL for a person with no phone-numbers
    if (!is_null($row['phoneNumber']))
    {
        $phoneNumbers[] = $row['phoneNumber'];
    }
}

// don't forget to add the last person (saved in "current"),
// together with the phone-numbers collected for it
if (!is_null($current))
{
    $current['phoneNumbers'] = $phoneNumbers;
    $personsArray[] = $current;
}

echo json_encode($personsArray);

P.S. this link is an example of a different question here, where I tried to suggest the second way: tables to single json

Solution

Preliminary

First, thank you for putting that much effort into explaining the problem, and for the formatting. It is great to see someone who is clear about what they are doing, and what they are asking.

But it must be noted that that, in itself, forms a limitation: you are fixed on the notion that this is the correct solution, and that with some small correction or guidance, this will work. That is incorrect. So I must ask you to give that notion up, to take a big step back, and to view (a) the whole problem and (b) my answer without that notion.

The context of this answer is:

  • all the explicit considerations you have given, which are very important, which I will not repeat

  • the two most important of which are best practice and what I would do in real life

This answer is rooted in Standards, the higher order of, or frame of reference for, best practice. This is what the commercial Client/Server world does, or should be doing.

This issue, this whole problem space, is becoming a common problem. I will give a full consideration here, and thus answer another SO question as well. Therefore it might contain a tiny bit more detail than you require. If it does, please forgive this.

Consideration

  1. The database is a server-based resource, shared by many users. In an online system, the database is constantly changing. It contains the One Version of the Truth of each Fact (as distinct from One Fact in One Place, which is a separate, Normalisation issue).

    • the fact that some database systems do not have a server architecture, and that therefore the notion of server in such software is false and misleading, are separate but noted points.
  2. As I understand it, JSON and JSON-like structures are required for "performance reasons", precisely because the "server" doesn't, cannot, perform as a server. The concept is to cache the data on each (every) client, such that you are not fetching it from the "server" all the time.

    • This opens up a can of worms. If you do not design and implement this properly, the worms will overrun the app.

    • Such an implementation is a gross violation of the Client/Server Architecture, which allows simple code on both sides, and appropriate deployment of software and data components, such that implementation times are small, and efficiency is high.

    • Further, such an implementation requires a substantial implementation effort, and it is complex, consisting of many parts. Each of those parts must be appropriately designed.

    • The web, and the many books written in this subject area, provide a confusing mix of methods, marketed on the basis of supposed simplicity; ease; anyone-can-do-anything; freeware-can-do-anything; etc. There is no scientific basis for any of those proposals.

Non-architecture & Sub-standard

As evidenced, you have learned that some approaches to database design are incorrect. You have encountered one problem, one instance in which that advice is false. As soon as you solve this one problem, the next problem, which is not apparent to you right now, will be exposed. The notions are a never-ending set of problems.

I will not enumerate all the false notions that are sometimes advocated. I trust that as you progress through my answer, you will notice that one after the other marketed notion is false.

The two bottom lines are:

  1. The notions violate Architecture and Design Standards, namely Client/Server Architecture; Open Architecture; Engineering Principles; and, to a lesser extent in this particular problem, Database Design Principles.

  2. Which leads to people like you, who are trying to do an honest job, being tricked into implementing simple notions, which turn into massive implementations. Implementations that will never quite work, so they require substantial ongoing maintenance, and will eventually be replaced, wholesale.

Architecture

The central principle being violated is, never duplicate anything. The moment you have a location where data is duplicated (due to caching or replication or two separate monolithic apps, etc), you create a duplicate that will go out of synch in an online situation. So the principle is to avoid doing that.

  • Sure, for serious third-party software, such as a gruntly report tool, by design, they may well cache server-based data in the client. But note that they have put hundreds of man-years into implementing it correctly, with due consideration to the above. Yours is not such a piece of software.

Rather than providing a lecture on the principles that must be understood, or the evils and costs of each error, the rest of this answer provides the requested what would you do in real life, using the correct architectural method (a step above best practice).

Architecture 1

Do not confuse

  • the data which must be Normalised

with

  • the result set, which, by definition, is the flattened ("de-normalised" is not quite correct) view of the data.

The data, given that it is Normalised, will not contain duplicate values; repeating groups. The result set will contain duplicate values; repeating groups. That is pedestrian.

  • Note that the notion of Nested Sets (or Nested Relations), which is in my view not good advice, is based on precisely this confusion.

  • For forty-five years since the advent of the RM, they have been unable to differentiate base relations (for which Normalisation does apply) from derived relations (for which Normalisation does not apply).

  • Two of these proponents are currently questioning the definition of First Normal Form. 1NF is the foundation of the other NFs; if the new definition is accepted, all the NFs will be rendered value-less. The result would be that Normalisation itself (sparsely defined in mathematical terms, but clearly understood as a science by professionals) would be severely damaged, if not destroyed.
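To make the distinction between base data and result set concrete, here is a minimal PHP sketch. It assumes PDO with an in-memory SQLite database as a stand-in for the real server (an assumption for illustration only; the question uses MySQL) and a cut-down copy of the question's tables. Each name is stored once in the base table, while the joined result set repeats it on every row.

```php
<?php
// Sketch only: in-memory SQLite stands in for the real database server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Base relations: Normalised, each fact stored once.
$pdo->exec("CREATE TABLE persons (id INTEGER PRIMARY KEY, fullName TEXT)");
$pdo->exec("CREATE TABLE phoneNumbers (id INTEGER PRIMARY KEY, personId INTEGER, phoneNumber TEXT)");
$pdo->exec("INSERT INTO persons (fullName) VALUES ('Alice'), ('Bob')");
$pdo->exec("INSERT INTO phoneNumbers (personId, phoneNumber)
            VALUES (1, '123-456'), (1, '234-567'), (2, '456-789')");

// Derived relation: the flattened result set of a join.
$rows = $pdo->query(
    "SELECT persons.id, fullName, phoneNumber
       FROM persons
       JOIN phoneNumbers ON phoneNumbers.personId = persons.id
      ORDER BY persons.id, phoneNumbers.id"
)->fetchAll(PDO::FETCH_ASSOC);

// 'Alice' appears on two rows here, although persons holds her name once.
foreach ($rows as $row) {
    echo "{$row['id']}\t{$row['fullName']}\t{$row['phoneNumber']}\n";
}
```

The repetition lives only in the derived rows; nothing in the stored tables was duplicated to produce it.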

Architecture 2

There is a centuries-old scientific or engineering principle, that content (data) must be separated from control (program elements). This is because the analysis, design, and implementation of the two are completely different. This principle is no less important in the software sciences, where it has specific articulation.

In order to keep this brief (ha ha), instead of a discourse, I will assume that you understand:

  • That there is a scientifically demanded boundary between data and program elements. Mixing them up results in complex objects that are error-prone and hard to maintain.

    • The confusion of this principle has reached epidemic proportions in the OO/ORM world, and the consequences reach far and wide.

    • Only professionals avoid this. For the rest, the great majority, they accept the new definition as "normal", and they spend their lives fixing problems that we simply do not have.

  • The architectural superiority, the great value, of data being both stored and presented in Tabular Form per Dr E F Codd's Relational Model. That there are specific rules for Normalisation of data.

  • And importantly, you can determine when the people who write and market books advise non-relational or anti-relational methods.

Architecture 3

If you cache data on the client:

  1. Cache the absolute minimum.

    That means cache only the data that does not change in the online environment. That means Reference and Lookup tables only, the tables that populate the higher level classifiers, the drop-downs, etc.

  2. Currency

    For every table that you do cache, you must have a method of (a) determining that the cached data has become stale, compared to the One Version of the Truth which exists on the server, and (b) refreshing it from the server, (c) on a table-by-table basis.

    Typically, this involves a background process that executes every five minutes, that queries the MAX updated DateTime for each cached table on the client vs the DateTime on the server, and if changed, refreshes the table, and all its child tables, those that are dependent on the changed table.

    That, of course, requires that you have an UpdatedDateTime column on every table. That is not a burden, because you need that for OLTP ACID Transactions anyway (if you have a real database, instead of a bunch of sub-standard files).

Which really means: never replicate; the coding burden is prohibitive.
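The currency check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not a complete implementation: the `phoneTypes` lookup table, the helpers `loadTable` and `refreshIfStale`, and the use of in-memory SQLite via PDO are all hypothetical stand-ins. Timestamps are stored as ISO-formatted text so that MAX compares them correctly. Refreshing dependent child tables and scheduling the check every five minutes are left out, and note that a MAX comparison alone does not detect deleted rows.

```php
<?php
// Sketch only: in-memory SQLite stands in for the server; all names are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec("CREATE TABLE phoneTypes (id INTEGER PRIMARY KEY, name TEXT, UpdatedDateTime TEXT)");
$pdo->exec("INSERT INTO phoneTypes (name, UpdatedDateTime) VALUES
            ('Home', '2024-01-01 09:00:00'), ('Mobile', '2024-01-01 09:00:00')");

// Load one table into the cache, remembering the newest UpdatedDateTime seen.
// $table must come from a fixed, trusted list, since it is interpolated into the SQL.
function loadTable(PDO $pdo, string $table): array
{
    $rows = $pdo->query("SELECT * FROM $table ORDER BY id")->fetchAll(PDO::FETCH_ASSOC);
    $max  = $pdo->query("SELECT MAX(UpdatedDateTime) FROM $table")->fetchColumn();
    return ['rows' => $rows, 'maxUpdated' => $max];
}

// Refresh the cached table only if the server copy has changed.
// Returns true when a refresh took place.
function refreshIfStale(PDO $pdo, string $table, array &$cache): bool
{
    $serverMax = $pdo->query("SELECT MAX(UpdatedDateTime) FROM $table")->fetchColumn();
    if ($serverMax === $cache[$table]['maxUpdated']) {
        return false; // cached copy is current
    }
    $cache[$table] = loadTable($pdo, $table);
    return true;
}

$cache = ['phoneTypes' => loadTable($pdo, 'phoneTypes')];
$first = refreshIfStale($pdo, 'phoneTypes', $cache);   // nothing has changed yet

// Another user updates a row on the server.
$pdo->exec("UPDATE phoneTypes SET name = 'Cell', UpdatedDateTime = '2024-01-01 09:07:00' WHERE id = 2");
$second = refreshIfStale($pdo, 'phoneTypes', $cache);  // server copy is now newer
```

Even this small sketch needs one such check per cached table, which gives a sense of the coding burden involved.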

Architecture 4

In the sub-commercial, non-server world, I understand that some people advise the reverse caching of "everything".

  • That is the only way that programs like PostgreSQL can be used in a multi-user system.

  • You always get what you pay for: you pay peanuts, you get monkeys; you pay zero, you get zero.

The corollary to Architecture 3 is, if you do cache data on the client, do not cache tables that change frequently. These are the transaction and history tables. The notion of caching such tables, or all tables, on the client is completely bankrupt.

In a genuine Client/Server deployment, due to use of applicable standards, for each data window, the app should query only the rows that are required, for that particular need, at that particular time, based on context or filter values, etc. The app should never load the entire table.

If the same user using the same window inspected its contents, 15 minutes after the first inspection, the data would be 15 mins out of date.

  • For freeware/shareware/vapourware platforms, which define themselves by the absence of a server architecture, and thus by the result, that performance is non-existent, sure, you have to cache more than the minimum tables on the client.

  • If you do that, you must take all the above into account, and implement it correctly, otherwise your app will be broken, and the ramifications will drive the users to seek your termination. If there is more than one user, they will have the same cause, and soon form an army.
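Querying only the rows required for a particular need, rather than loading the entire table, might look like the following sketch. It assumes PDO with an in-memory SQLite database standing in for the server, the question's two tables, and a hypothetical helper `phoneNumbersFor`; the filter value is passed as a bound parameter.

```php
<?php
// Sketch only: in-memory SQLite stands in for the server; phoneNumbersFor() is hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec("CREATE TABLE persons (id INTEGER PRIMARY KEY, fullName TEXT)");
$pdo->exec("CREATE TABLE phoneNumbers (id INTEGER PRIMARY KEY, personId INTEGER, phoneNumber TEXT)");
$pdo->exec("INSERT INTO persons (fullName) VALUES ('Alice'), ('Bob'), ('Carl'), ('Dan')");
$pdo->exec("INSERT INTO phoneNumbers (personId, phoneNumber) VALUES
            (1, '123-456'), (1, '234-567'), (1, '345-678'),
            (2, '456-789'), (2, '567-890'), (3, '678-901'), (4, '789-012')");

// Fetch only the rows this window needs, at the time it needs them.
function phoneNumbersFor(PDO $pdo, int $personId): array
{
    $stmt = $pdo->prepare(
        "SELECT phoneNumber FROM phoneNumbers WHERE personId = :personId ORDER BY id"
    );
    $stmt->execute([':personId' => $personId]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN); // plain list of phoneNumber values
}

$bobsNumbers = phoneNumbersFor($pdo, 2);
print_r($bobsNumbers);
```

Because the query runs when the window is opened, the values shown are those current on the server at that moment.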

Architecture 5

Now we get to how you cache those carefully chosen tables on the client.

Note that databases grow, they are extended.

  • If the system is broken, a failure, it will grow in small increments, and require a lot of effort.

  • If the system is even a small success, it will grow exponentially.

  • If the system (each of the database, and the app, separately) is designed and implemented well, the changes will be easy, the bugs will be few.

Therefore, all the components in the app must be designed properly, to comply with applicable standards, and the database must be fully Normalised. This in turn minimises the effect of changes in the database, on the app, and vice versa.

  • The app will consist of simple, not complex, objects, which are easy to maintain and change.

  • For the data that you do cache on the client, you will use arrays of some form: multiple instances of a class in an OO platform; DataWindows (TM, google for it) or similar in a 4GL; simple arrays in PHP.

(Aside. Note that what people in situations such as yours produce in one year, professional providers using a commercial SQL platform, a commercial 4GL, and complying with Architecture and Standards.)

Architecture 6

So let's assume that you understand all the above, and appreciate its value, particularly Architecture 1 & 2.

  • If you don't, please stop here and ask questions, do not proceed to the below.

Now that we have established the full context, we can address the crux of your problem.

  • In those arrays in the app, why on Earth would you store flattened views of data

    • and consequently mess with, and agonise over, the problems

  • instead of storing copies of the Normalised tables?

Answer

  1. Never duplicate anything that can be derived. That is an Architectural Principle, not limited to Normalisation in a database.

  2. Never merge anything.

    If you do, you will be creating:

    • data duplication, and masses of it, on the client. The client will not only be fat and slow, it will be anchored to the floor with the ballast of duplicated data.

    • additional code, which is completely unnecessary

    • complexity in that code

    • code that is fragile, that will constantly have to change.

    That is the precise problem you are suffering, a consequence of the method, which you know intuitively is wrong, and for which there must be a better way. You know it is a generic and common problem.

    Note also that that method, that code, constitutes a mental anchor for you. Look at the way that you have formatted it and presented it so beautifully: it is of importance to you. I am reluctant to inform you of all this.

    • Which reluctance is easily overcome, due to your earnest and forthright attitude, and the knowledge that you did not invent this method
  3. In each code segment, at presentation time, as and when required:

    a. In the commercial Client/Server context
    Execute a query that joins the simple, Normalised, unduplicated tables, and retrieves only the qualifying rows. Thereby obtaining current data values. The user never sees stale data. Here, Views (flattened views of Normalised data) are often used.

    b. In the sub-commercial non-server context
    Create a temporary result-set array, and join the simple, unduplicated, arrays (copies of tables that are cached), and populate it with only the qualifying rows, from the source arrays. The currency of which is maintained by the background process.

    • Use the Keys to form the joins between the arrays, in exactly the same way that Keys are used to form the joins in the Relational tables in the database.

    • Destroy those components when the user closes the window.

    • A clever version would eliminate the result-set array, and join the source arrays via the Keys, and limit the result to the qualifying rows.
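Point 3.b can be sketched in plain PHP as follows. The two source arrays are hypothetical cached copies of the question's tables (cut down for brevity), and `joinPersonPhones` is an illustrative helper, not an established function. The join is formed on the key, and only qualifying rows reach the temporary result-set array.

```php
<?php
// Cached copies of the Normalised tables (hypothetical data based on the question).
// $persons is indexed by its key, id, so a lookup by key is a direct array access.
$persons = [
    1 => ['id' => 1, 'fullName' => 'Alice'],
    2 => ['id' => 2, 'fullName' => 'Bob'],
];
$phoneNumbers = [
    ['id' => 1, 'personId' => 1, 'phoneNumber' => '123-456'],
    ['id' => 2, 'personId' => 1, 'phoneNumber' => '234-567'],
    ['id' => 4, 'personId' => 2, 'phoneNumber' => '456-789'],
];

// Join the two source arrays on the key, keeping only the qualifying rows.
function joinPersonPhones(array $persons, array $phoneNumbers, callable $qualifies): array
{
    $resultSet = [];
    foreach ($phoneNumbers as $phone) {
        $person = $persons[$phone['personId']] ?? null; // key lookup forms the join
        if ($person !== null && $qualifies($person)) {
            $resultSet[] = [
                'id'          => $person['id'],
                'fullName'    => $person['fullName'],
                'phoneNumber' => $phone['phoneNumber'],
            ];
        }
    }
    return $resultSet;
}

// Temporary result-set array: Alice's rows only.
$resultSet = joinPersonPhones($persons, $phoneNumbers, function (array $p): bool {
    return $p['id'] === 1;
});
print_r($resultSet);
// Discard $resultSet (e.g. with unset) once the window that needed it is closed.
```

The source arrays stay unduplicated; the flattened rows exist only in the temporary array and only for as long as they are needed.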

Separate to being architecturally incorrect, Nested Arrays or Nested Sets or JSON or JSON-like structures are simply not required. This is the consequence of confusing the Architecture 1 Principle.

  • If you do choose to use such structures, then use them only for the temporary result-set arrays.

Last, I trust this discourse demonstrates that n tables is a non-issue. More important, that m levels deep in the data hierarchy, the "nesting", is a non-issue.

Answer 2

Now that I have given the full context (and not before), the implications in your question are removed, and it becomes a generic, kernel question.

The question is about ANY server-side/relational-db. [Which is better]:

2 loops, 5 simple "SELECT" queries

1 loop, 1 "JOIN" query

The detailed examples you have given are not accurately described above. The accurate description is:

  • Your Option 1
    2 loops, each loop for loading each array
    1 single-table SELECT query per loop
    (executed n x m times ... the outermost loop, only, is a single execution)

  • Your Option 2
    1 Joined SELECT query executed once
    followed by 2 loops, each loop for loading each array

For the commercial SQL platforms, neither, because it does not apply.

  • The commercial SQL server is a set-processing engine. Use one query with whatever joins are required, that returns a result set. Never step through the rows using a loop, that reduces the set-processing engine to a pre-1970's ISAM system. Use a View, in the server, since it affords the highest performance and the code is in one place.

However, for the non-commercial, non-server platforms, where:

  • your "server" is not a set-processing engine ie. it returns single rows, therefore you have to fetch each row and fill the array, manually or

  • your "server" does not provide Client/Server binding, ie. it does not provide facilities on the client to bind the incoming result set to a receiving array, and therefore you have to step through the returned result set, row by row, and fill the array, manually,

as per your example then, the answer is, by a large margin, your option 2.
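To make the difference in query counts concrete, the following is a minimal sketch in Python using the standard `sqlite3` module, which is an assumption for illustration (the question targets any relational database). The helper `run` and the functions `option_1` and `option_2` are hypothetical names; the table definitions are adapted from the question to SQLite syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (id INTEGER PRIMARY KEY, fullName TEXT);
    CREATE TABLE phoneNumbers (id INTEGER PRIMARY KEY, personId INTEGER, phoneNumber TEXT);
    INSERT INTO persons (fullName) VALUES ('Alice'), ('Bob'), ('Carl'), ('Dan');
    INSERT INTO phoneNumbers (personId, phoneNumber) VALUES
        (1, '123-456'), (1, '234-567'), (1, '345-678'),
        (2, '456-789'), (2, '567-890'), (3, '678-901'), (4, '789-012');
""")

query_count = 0


def run(sql, params=()):
    """Execute one SELECT, count it, and return all rows."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def option_1():
    # One query for persons, then one more query per person row.
    out = []
    for pid, name in run("SELECT id, fullName FROM persons ORDER BY id"):
        phones = run(
            "SELECT phoneNumber FROM phoneNumbers WHERE personId = ? ORDER BY id",
            (pid,),
        )
        out.append((pid, name, [p for (p,) in phones]))
    return out


def option_2():
    # One joined query, then a single pass over the returned rows.
    rows = run("""SELECT p.id, p.fullName, n.phoneNumber
                  FROM persons AS p
                  JOIN phoneNumbers AS n ON n.personId = p.id
                  ORDER BY p.id, n.id""")
    out = []
    for pid, name, phone in rows:
        if not out or out[-1][0] != pid:
            out.append((pid, name, []))  # first row of a new person
        out[-1][2].append(phone)
    return out


query_count = 0
result_1 = option_1()
count_1 = query_count

query_count = 0
result_2 = option_2()
count_2 = query_count

print(count_1, count_2)  # prints "5 1" for the sample data
```

Both functions produce the same data for this sample; Option 1 issues one query per person on top of the first, so its cost grows with the number of rows, while Option 2 always issues one.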

Please consider carefully, and comment or ask questions.

Response to Comment

Say I need to print this json (or other html page) to some STDOUT (example: an http response to: GET /allUsersPhoneNumbers. It's just an example to clarify what I'm expecting to get), which should return this json. I have a php function that got these 2 result sets (1). Now it should print this json - how should I do that? This report could be an employee month salary for a whole year, and so on. One way or another, I need to gather this information and represent it in a "JOIN"ed representation

Perhaps I was not clear enough.

  1. Basically, do not use JSON unless you absolutely have to. That means only when sending to some system that requires it, which in turn means that the receiving system, and its demand, is stupid.

  2. Make sure that your system doesn't make such demands on others.

  3. Keep your data Normalised. Both in the database, and in whatever program elements that you write. That means (in this example) use one SELECT per table or array. That is for loading purposes, so that you can refer to and inspect them at any point in the program.

  4. When you need a join, understand that it is:

    • a result-set; a derived relation; a view
    • therefore temporary, it exists for the duration of the execution of that element, only

    a. For tables, join them in the usual manner, via Keys. One query, joining two (or more) tables.

    b. For arrays, join arrays in the program, the same way you join tables in the database, via Keys.

  5. For the example you have given, which is a response to some request, first understand that it is the category [4], and then fulfil it.

Why even consider JSON? What has JSON got to do with this?

JSON is misunderstood and people are interested in the wow factor. It is a solution looking for a problem. Unless you have that problem it has no value. Check these two links:
Copter - What is JSON
StackOverflow - What is JSON

Now if you understand that, it is mostly for incoming feeds. Never for outgoing. Further, it requires parsing, deconstructing, etc, before it can be used.

Recall:

I need to gather this information and represent it in a "JOIN"ed representation

Yes. That is pedestrian. Joined does not mean JSONed.

In your example, the receiver is expecting a flattened view (eg. spreadsheet), with all the cells filled, and yes, for Users with more than one PhoneNumber, their User details will be repeated on the second and subsequent result-set rows. For any kind of print, eg. for debugging, I want a flattened view. It is just a:

    SELECT ... FROM Person JOIN PhoneNumber

And return that. Or if you fulfil the request from arrays, join the Person and PhoneNumber Arrays, which may require a temporary result-set array, and return that.

please don't tell me you should get only 1 user at a time, etc. etc.

Correct. If someone tells you to regress to procedural processing (ie. row by row, in a WHILE loop), where the engine or your program has set processing (ie. processes an entire set in one command), that marks them as someone who should not be listened to.

I have already stated, your Option 2 is correct, Option 1 is incorrect. That is as far as the GET or SELECT is concerned.

On the other hand, for programming languages that do not have set-processing capability (ie. cannot print/set/inspect an array in a single command), or "servers" that do not provide client-side array binding, you do have to write loops, one loop per depth of the data hierarchy (in your example, two loops, one for Person, and one for PhoneNumber per User).

  • You have to do that to parse an incoming JSON object.
  • You have to do that to load each array from the result set that is returned in your Option 2.
  • You have to do that to print each array from the result set that is returned in your Option 2.
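As an illustration of "one loop per depth of the data hierarchy", here is a minimal Python sketch that steps through an incoming nested JSON document with two loops and flattens it into rows. The `incoming` document and the function name `flatten` are hypothetical, made up for this example from the question's sample data.

```python
import json

# Hypothetical incoming feed, nested two levels deep (Person -> PhoneNumber).
incoming = """[
  {"id": 1, "fullName": "Alice", "phoneNumbers": ["123-456", "234-567", "345-678"]},
  {"id": 2, "fullName": "Bob", "phoneNumbers": ["456-789", "567-890"]}
]"""


def flatten(document):
    """Deconstruct the nested document into flat (id, fullName, phoneNumber) rows."""
    rows = []
    for person in json.loads(document):        # loop 1: Person level
        for phone in person["phoneNumbers"]:   # loop 2: PhoneNumber level
            rows.append((person["id"], person["fullName"], phone))
    return rows


flat_rows = flatten(incoming)
for row in flat_rows:
    print(*row, sep="\t")
```

A hierarchy three levels deep would need a third nested loop, and so on.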

Response to Comment 2

I meant that I have to return a result represented in a nested version (let's say I'm printing the report to the page); json was just an example of such a representation.

I don't think you understand the reasoning and the conclusions I have provided in this answer.

  • For printing and displaying, never nest. Print a flattened view, the rows returned from the SELECT per Option 2. That is what we have been doing, when printing or displaying data Relationally, for 31 years. It is easier to read, debug, search, find, fold, staple, mutilate. You cannot do anything with a nested array, except look at it, and say gee that is interesting.

Code

Caveat

I would prefer to take your code and modify it, but actually, looking at your code, it is not well written or structured, it cannot be reasonably modified. Second, if I use that, it would be a bad teaching tool. So I will have to give you fresh, clean code, otherwise you will not learn the correct methods.

These code examples follow my advice, so I am not going to repeat it. And this is way beyond the original question.

  • Query & Print

    Your request, using your Option 2. One SELECT executed once. Followed by one loop. Which you can "pretty up" if you like.
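    The answer's own code is not reproduced in this copy. As a stand-in, here is a minimal sketch of the same idea, assuming Python and the standard `sqlite3` module rather than the author's platform; the function name `query_and_print` and the column widths are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (id INTEGER PRIMARY KEY, fullName TEXT);
    CREATE TABLE phoneNumbers (id INTEGER PRIMARY KEY, personId INTEGER, phoneNumber TEXT);
    INSERT INTO persons (fullName) VALUES ('Alice'), ('Bob'), ('Carl'), ('Dan');
    INSERT INTO phoneNumbers (personId, phoneNumber) VALUES
        (1, '123-456'), (1, '234-567'), (1, '345-678'),
        (2, '456-789'), (2, '567-890'), (3, '678-901'), (4, '789-012');
""")


def query_and_print(conn):
    """Run the joined SELECT once, then print one flattened line per row."""
    rows = conn.execute(
        """SELECT p.id, p.fullName, n.phoneNumber
           FROM persons AS p
           JOIN phoneNumbers AS n ON n.personId = p.id
           ORDER BY p.id, n.id"""
    ).fetchall()
    lines = []
    for pid, name, phone in rows:  # the single loop
        line = f"{pid:<4}{name:<10}{phone}"
        print(line)
        lines.append(line)
    return lines


printed = query_and_print(conn)
```

    Each person's details repeat on every row for that person, which is the flattened view described above.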
