Discovering the best approach to storing a specific object-oriented data structure


Question

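After some fantastic suggestions, and a sleepless night of excitement at the possibility of finally solving my problem, I realize that I am still not quite at a solution. So I am outlining my problem here in much more detail, in the hope that someone knows the best way to achieve this.

To recap (if you haven't read the previous post):

  • I am constructing a PHP OOP framework from scratch (I have no choice in this matter).
  • The framework is required to handle object-oriented data in the most efficient way possible. It doesn't need to be lightning quick; it just needs to be the best possible solution to the problem.
  • Objects very closely resemble strictly written OOP objects, in that they are an instance of a specific class, which contains a strict set of properties.
  • Object properties can be basic types (strings, numbers, bools), but can also be a single object instance or an array of objects (with the restriction that the array must contain objects of the same type).

Ultimately, I am looking for a storage engine that supports document-oriented storage (similar to XML or JSON) where the objects themselves have a strict structure.

Instead of outlining what I have tried so far (I discussed this briefly in my previous post), I am going to spend the rest of this post explaining, in detail, what it is that I am trying to do. This post is going to be long (sorry!).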

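To get started, I need to discuss a term that I had to introduce to solve one of the most crucial problems that came with this set of requirements. I've named it "persistence". I understand that this term has a different meaning when dealing with object databases, so I am open to suggestions for a different term. But for now, let's move on.

Persistence

Persistence refers to the independence of an object. I found the need to introduce this term when considering data structures generated from XML (which is something that I have to be able to do). In XML, we see objects that are completely dependent on their parent, and we also see objects that can exist independently of a parent object.

The example below is an XML document that conforms to a certain structure (for example, a .wsdl file). Each object resembles a type with a strict structure. Every object has an "id" property.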

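In the above example, we see two users. Both have their own Address object under their "address" property. However, if we look at their "favouriteBook" property, we can see that they both reuse the same Book object instance. Also note that the books use the same author.

So we have the Address object, which is non-persistent because it is only related to its parent object (the User), meaning that its instance only needs to exist while the owning User object exists. Then there is the Book object, which is persistent because it can be used in multiple locations and its instance stays the same everywhere.

At first I felt a bit crazy for coming up with a term like this, but I found it remarkably simple to understand and to use in practice. It ultimately condenses the "many-to-many, one-to-many, one-to-one, many-to-one" formula into a single idea that I felt worked much better with nested data.

I've made an image representation of the above data here: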

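With my definition of persistence comes a set of rules to help in understanding it. These rules are as follows:

  • update/create
    • Persistent child objects of the base object being stored update the properties of the persistent object, ultimately updating its instance.
    • Non-persistent objects always create a new instance of the object, to ensure that they always use a non-persistent instance (no non-persistent instance exists in more than one place at any given time).
  • deleting
    • Persistent child objects of the base object do not get deleted recursively. This is because the persistent object may exist in other places. You would always delete a persistent object directly.
    • Non-persistent child objects of the base object are removed along with the base object. If they were not removed, they would be left stranded, as their design requires that they have a parent.
  • retrieving
    • Since persistence mostly defines how modifications work, retrieval doesn't involve persistence a great deal, aside from how you would expect persistence to affect how a model is stored and therefore how it is retrieved (a persistent object instance stays the same wherever it is located, while non-persistent objects always have their own instance).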

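One final thing to note before we move on: the persistence of a data model is defined by the model itself rather than by the relationship. Initially, persistence was part of the relationship, but this was completely unnecessary when the system expects you to know the structure of your models and how they are used. Ultimately, every instance of a model is either persistent or it is not.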
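As a rough illustration of the delete rules above, here is a minimal sketch in PHP 8. The `SketchObject` class, the `removeWithPersistence` function and the array-based `$store` are all hypothetical stand-ins invented for this example; they are not part of the framework described in this post.

```php
<?php
// Hypothetical in-memory model used only to illustrate the delete rules.
final class SketchObject
{
    /** @var SketchObject[] */
    public array $children = [];

    public function __construct(
        public string $type,
        public int $id,
        public bool $isPersistent
    ) {
    }
}

/**
 * Removes $base from $store and cascades into non-persistent children only.
 * $store maps "Type#id" keys to objects; it stands in for a real storage engine.
 */
function removeWithPersistence(SketchObject $base, array &$store): void
{
    unset($store[$base->type . '#' . $base->id]);
    foreach ($base->children as $child) {
        if (!$child->isPersistent) {
            // Non-persistent children cannot outlive their parent.
            removeWithPersistence($child, $store);
        }
        // Persistent children are left alone: they may be used elsewhere.
    }
}

$book = new SketchObject('Book', 1, true);
$address = new SketchObject('Address', 1, false);
$user = new SketchObject('User', 1, true);
$user->children = [$address, $book];

$store = ['User#1' => $user, 'Address#1' => $address, 'Book#1' => $book];
removeWithPersistence($user, $store);

echo implode(', ', array_keys($store)), PHP_EOL; // Book#1
```

Removing the user takes its address with it, while the shared book survives and would have to be deleted directly.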






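So, taking a look at some code now, you might start to see the method behind the madness. Although it may seem that the reason for this solution is to be able to build a storage system around object data conforming to a set of conditions, its design actually comes from wanting to be able to store class instances, and/or generate class instances from an object data structure.

I have written some pseudo-classes as an example of the functionality that I am trying to produce. I have commented most methods, including type declarations.

So first, this would be the base class that all model classes would extend. The purpose of this class is to create a layer between the model class/object and the database/storage engine: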

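<?php
/**
 * This is the base class that all models would extend. It contains the functionality that is useful across all model
 * objects, such as crud actions, finding, and crud event management.
 *
 * @author Donny Sutherland <donny@pixelpug.co.uk>
 * @package Main
 * @subpackage Sub
 *
 * Class ORMModel
 */
class ORMModel {
    /**
     * In order to generate relationships between objects, every object MUST have an id. This functions as the object's
     * unique identifier. Each object in its model type (collection) has its own id.
     *
     * @var int
     */
    public $id;
    /**
     * Internal property assigned by the application. This is where the persistence of the model is defined.
     *
     * @var bool
     */
    protected $internal_isPersistent = true;
    /**
     * Internal property assigned by the application. This is an array of the model's properties, and their PHP type.
     *
     * For example, a User model might use something like this:
     * array(
     *  "id" => "integer",
     *  "username" => "string",
     *  "password" => "string",
     *  "address" => "object",
     *  "favouriteBook" => "object",
     *  "allBooks" => "array"
     * )
     *
     * @var array
     */
    protected $internal_propertyTypes = array();
    /**
     * Internal property assigned by the application. This is an array of the model's properties which are objects, and
     * the MODEL CLASS type of the object.
     *
     * For example, the User model example for the property types might use this:
     * array(
     *  "address" => "Address",
     *  "favouriteBook" => "Book",
     *  "allBooks" => "Book"
     * )
     *
     * @var array
     */
    protected $internal_objectTypes = array();
    /**
     * I am not 100% sure on the best way to use this yet. I have tried a few different ways and all seem to cause
     * performance problems. But ultimately, before we attempt to update an object, we cache its currently stored
     * instance to this property, allowing us to compare old vs new. I find this really useful for detecting whether a
     * property has changed; I just need to work out the best way to do it.
     *
     * @var $this
     */
    protected $internal_old;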

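    /**
     * The lazy way to construct an empty model object (all NULL values)
     *
     * @return $this
     */
    final public static function constructEmpty() {

    }

    /**
     * This method is used by the other constructFromXXX methods once the data has been converted to a PHP array.
     * This method is what allows us to build a RESTful interface into the ORM system, as it conforms to the following
     * rules:
     *
     * - If the id is set (not null), first pull the object from storage.
     * - For each key => value of the passed array, OVERWRITE the value
     * - For properties that are model objects/arrays, if the property is assigned in the array:
     *  - if the array value is NULL, we are clearing the object relationship
     *  - if the array value is not null, construct recursively at this point
     *
     * Ultimately, if you assign a property in the array that you pass to this method, it will overwrite the value. If
     * you do not, it will use the property value in storage.
     *
     * @param array $array
     *
     * @return $this
     */
    final public static function constructFromArray(array $array) {

    }

    /**
     * This method attempts to decode the value of $json into a PHP array. It then calls constructFromArray if the string
     * could be decoded.
     *
     * @param $json
     *
     * @return $this
     */
    final public static function constructFromJson($json) {

    }

    /**
     * This method attempts to decode the value of $xml into a PHP array. It then calls constructFromArray if the xml
     * could be decoded.
     *
     * @param $xml
     *
     * @return $this
     */
    final public static function constructFromXml($xml) {

    }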

    /**
     * Find one object, based on a set of options.
     *
     * @param ORMCrudOptions $options
     *
     * @return $this
     */
    final public static function findOne(ORMCrudOptions $options) {

    }

    /**
     * Find all objects, (optionally) based on a set of options
     *
     * @param ORMCrudOptions $options
     *
     * @return $this[]
     */
    final public static function findAll(ORMCrudOptions $options = null) {

    }

    /**
     * Find the count of objects, based on a set of options
     *
     * @param ORMCrudOptions $options
     *
     * @return integer
     */
    final public static function findCount(ORMCrudOptions $options) {

    }

    /**
     * Find one object, based on its id, and (optionally) a set of options.
     *
     * @param int $id
     * @param ORMCrudOptions $options
     *
     * @return $this
     */
    final public static function findById($id, ORMCrudOptions $options = null) {

    }

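    /**
     * Push this object to storage. This creates/updates all of the contained objects, based on their ids and
     * persistence.
     *
     * @param ORMCrudOptions $options
     *
     * @return bool
     */
    final public function pushThis(ORMCrudOptions $options) {

    }

    /**
     * Pull this object from storage. This retrieves all of the contained objects again, based on their ids and
     * persistence.
     *
     * @param ORMCrudOptions $options
     *
     * @return bool
     */
    final public function pullThis(ORMCrudOptions $options) {

    }

    /**
     * Remove this object from storage. This conditionally removes the contained objects (based on persistence), based
     * on their ids.
     *
     * @param ORMCrudOptions $options
     */
    final public function removeThis(ORMCrudOptions $options) {

    }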

    /**
     * This is a crud event.
     */
    public function beforeCreate() {

    }

    /**
     * This is a crud event.
     */
    public function afterCreate() {

    }

    /**
     * This is a crud event.
     */
    public function beforeUpdate() {

    }

    /**
     * This is a crud event.
     */
    public function afterUpdate() {

    }

    /**
     * This is a crud event.
     */
    public function beforeRemove() {

    }

    /**
     * This is a crud event.
     */
    public function afterRemove() {

    }

    /**
     * This is a crud event.
     */
    public function beforeRetrieve() {

    }

    /**
     * This is a crud event.
     */
    public function afterRetrieve() {

    }
}

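So ultimately, this class would be designed to provide the functionality to construct, find, save, retrieve and delete model objects. The internal properties are properties that exist only in the classes (not in storage). These properties get populated by the framework itself while you use an interface to create models and add properties/fields to them.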
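The constructFromArray rules listed in the docblock above could be sketched roughly as follows. This is only an illustration under assumptions: it works on plain arrays rather than model objects, the `$schemas` registry and the `$storage` array are hypothetical stand-ins for the internal property/object type arrays and the storage engine, and arrays of objects (such as allBooks) are left out for brevity.

```php
<?php
// Hypothetical schema registry: model name => property types and object-typed properties.
$schemas = [
    'User' => [
        'types' => ['id' => 'integer', 'name' => 'string', 'pass' => 'string', 'address' => 'object'],
        'objects' => ['address' => 'Address'],
    ],
    'Address' => [
        'types' => ['id' => 'integer', 'line1' => 'string', 'zip' => 'string'],
        'objects' => [],
    ],
];

// Stand-in for storage: model name => id => stored values.
$storage = [
    'User' => [
        1 => [
            'id' => 1,
            'name' => 'bob',
            'pass' => 'passbob',
            'address' => ['id' => 7, 'line1' => 'awesome drive', 'zip' => '90051'],
        ],
    ],
];

function constructFromArraySketch(string $model, array $input, array $schemas, array $storage): array
{
    $schema = $schemas[$model];
    // Start from all-null values, or from the stored object when a known id is given.
    $result = array_fill_keys(array_keys($schema['types']), null);
    if (isset($input['id'], $storage[$model][$input['id']])) {
        $result = array_merge($result, $storage[$model][$input['id']]);
    }
    foreach ($input as $key => $value) {
        if (!array_key_exists($key, $schema['types'])) {
            continue; // keys that are not model properties are ignored
        }
        if (isset($schema['objects'][$key]) && is_array($value)) {
            // Non-null object value: construct recursively at this point.
            $value = constructFromArraySketch($schema['objects'][$key], $value, $schemas, $storage);
        }
        $result[$key] = $value; // a NULL value clears the relationship
    }
    return $result;
}

$user = constructFromArraySketch(
    'User',
    ['id' => 1, 'name' => 'bob updated', 'nickname' => 'ignored'],
    $schemas,
    $storage
);
echo $user['name'], ' / ', $user['pass'], PHP_EOL; // bob updated / passbob
```

Only `name` is overwritten; `pass` and `address` come from storage, and the unknown `nickname` key is dropped.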



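The idea is that the framework comes with an interface for managing data models. With this interface you create the models and add properties/fields to them. In doing so, the system automatically creates the class files for you, updating those internal properties as you modify the persistence and property types.

To keep things developer friendly, the system creates two class files for each model: a base class (which extends ORMModel) and another class (which extends the base class). The base class is manipulated by the system, so modifying this file is not recommended. The other class is used by developers to add additional functionality to models and crud events.

So coming back to the example data, here is the User base class: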

<?php
class User_Base extends ORMModel {
    public $name;
    public $pass;
    /**
     * @var Address
     */
    public $address;
    /**
     * @var Book
     */
    public $favouriteBook;

    protected $internal_isPersistent = true;
    protected $internal_propertyTypes = array(
        "id" => "integer",
        "name" => "string",
        "pass" => "string",
        "address" => "object",
        "favouriteBook" => "object"
    );
    protected $internal_objectTypes = array(
        "address" => "Address",
        "favouriteBook" => "Book"
    );
}

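Pretty much self-explanatory. Again, note that the internal properties get generated by the system, so those arrays would be generated based on the properties/fields that you specify when creating/modifying the User model in the model management interface. Also note the docblocks on the address and favouriteBook property definitions. Those are also generated by the system, making the classes very IDE friendly.

This would be the other class generated for the User model: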

<?php
final class User extends User_Base {
    public function beforeCreate() {

    }

    public function afterCreate() {

    }

    public function beforeUpdate() {

    }

    public function afterUpdate() {

    }

    public function beforeRemove() {

    }

    public function afterRemove() {

    }

    public function beforeRetrieve() {

    }

    public function afterRetrieve() {

    }
}

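Again, pretty self-explanatory. We've extended the base class to create another class where developers would add additional methods and add functionality to the crud events.

I won't add the other objects that make up the rest of the example data, since the above should explain how they would look.

You may or may not have noticed that in the ORMModel class, the CRUD methods require an instance of an ORMCrudOptions class. This class is pretty crucial to the whole system, so let's take a quick look at it: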

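<?php

/**
 * Despite this object being somewhat of an aggregate, it is quite possibly the most important part of the ORM, in that
 * it defines how CRUD actions are executed, and outlines how the querying is done.
 *
 * Class ORMCrudOptions
 */
final class ORMCrudOptions {
    /**
     * This ultimately makes up the "where" part of the sql query. However, because we want to be able to make querying
     * possible at any depth within the hierarchy of a model, this gets quite complicated.
     *
     * Previously, I developed a system which allowed the user to do something like this:
     *
     * "this.customer.address.postcode LIKE ('%XXX%') OR this.customer.address.line1 LIKE ('%XXX%')"
     *
     * The "this" and the "." are my extension to basic sql. The "this" refers to the base model that you are finding,
     * and each "." basically drills down into the hierarchy to make a comparison on a property somewhere within a
     * contained model object.
     *
     * I will explain more about how I did this in my post. I am most definitely looking at how I could better achieve
     * this though.
     *
     * @var string
     */
    private $query;
    /**
     * This allows you to build up a list of order by definitions.
     *
     * Using the orderBy method, you can chain up the order by statements like:
     *
     * ->orderBy("this.name","asc")->orderBy("this.customer.address.line1","desc")
     *
     * Which would be similar to doing:
     *
     * ORDER BY this_name ASC, this_customer_address.line1 DESC
     *
     * @var array
     */
    private $orderBy;
    /**
     * This allows you to set the limit start and limit values by doing:
     *
     * ->limit(10,10)
     *
     * Which would be similar to doing:
     *
     * LIMIT 10, 10
     *
     * @var array
     */
    private $limit;
    /**
     * Depth was added in my later endeavours to try and help performance. It allows you to specify the depth to which
     * data is retrieved. Although this helps optimisation a lot, I really dislike having to implement it, as it feels
     * like a workaround. I would rather be able to improve performance elsewhere, so that objects are always retrieved
     * at their full depth.
     *
     * @var integer
     */
    private $depth;
    /**
     * This is another later addition. Whenever a crud action is performed on a model, if this is true the model
     * instance is stored in a local cache, and/or model instances are retrieved from this cache.
     *
     * I did find that this gave a noticeable improvement in performance, although it brought complexity with it, making
     * the system tricky to use at times. You really need to understand how and when to use the cache, otherwise it can
     * be infuriating.
     *
     * @var bool
     */
    private $useCache;
    /**
     * Built into the ORM system, and tied to the application, I set up a webhook system which fires webhooks on crud
     * events. I found the need to be able to disable webhooks at times (when doing lots of crud actions in one go)
     * quite early on. Setting this to false basically disables webhooks on the crud action.
     *
     * @var bool
     */
    private $fireWebhooks;
    /**
     * Also built into the application, and tied into the ORM system, is an access system. This works in a separate
     * layer to the database, allowing me to use the same access system that I use for everything else in the framework
     * to define access to crud actions. However, in some cases I found it useful to disable access checks.
     *
     * Access checks are always on by default. In the api system that I built for accessing the data models, you cannot
     * modify this property, so access checks are always made there.
     *
     * @var bool
     */
    private $ignoreAccessChecks;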

    /**
     * The lazy way to create a new options instance.
     *
     * @return ORMCrudOptions
     */
    public static function n() {
        return new ORMCrudOptions();
    }

    /**
     * Set the query value
     *
     * @param string $query
     *
     * @return $this
     */
    public function query($query) {
        $this->query = $query;

        return $this;
    }

    /**
     * Add an order by field and direction
     *
     * @param string $field
     * @param string $direction
     *
     * @return $this
     */
    public function orderBy($field, $direction = "asc") {
        $this->orderBy[] = array($field, $direction);

        return $this;
    }

    /**
     * Set the limit start and limit.
     *
     * @param $limitResults
     * @param null $limitStart
     *
     * @return $this
     */
    public function limit($limitResults, $limitStart = null) {
        $this->limit = array($limitResults, $limitStart);

        return $this;
    }

    /**
     * Set the depth of retrieval
     *
     * @param $depth
     *
     * @return $this
     */
    public function depth($depth) {
        $this->depth = $depth;

        return $this;
    }

    /**
     * Set whether to use the model cache
     *
     * @param $useCache
     *
     * @return $this
     */
    public function useCache($useCache) {
        $this->useCache = $useCache;

        return $this;
    }

    /**
     * Set whether to fire webhooks on crud actions
     *
     * @param $fireWebhooks
     *
     * @return $this
     */
    public function fireWebhooks($fireWebhooks) {
        $this->fireWebhooks = $fireWebhooks;

        return $this;
    }

    /**
     * Set whether to ignore access checks
     *
     * @param $ignoreAccessChecks
     *
     * @return $this
     */
    public function ignoreAccessChecks($ignoreAccessChecks) {
        $this->ignoreAccessChecks = $ignoreAccessChecks;

        return $this;
    }
}

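The idea behind this class is to remove the need for a huge number of parameters on the crud methods, since most of these parameters can be reused across all of the crud methods. Take note of the comments on the query property, as that one is important.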
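To make the "this" and "." notation from the query property a little more concrete, here is a minimal sketch that resolves a dotted path against nested arrays in memory. It is not how the real system builds SQL; the `resolvePath` helper and the sample `$users` data are hypothetical and exist only for this illustration (PHP 7.4 or later is assumed for the arrow function).

```php
<?php
/**
 * Resolves a dotted path such as "this.address.zip" against a nested array.
 * Returns null when any segment along the way is missing.
 */
function resolvePath(array $base, string $path)
{
    $segments = explode('.', $path);
    if (array_shift($segments) !== 'this') {
        throw new InvalidArgumentException('Path must start with "this"');
    }
    $current = $base;
    foreach ($segments as $segment) {
        // Each "." drills one level deeper into the hierarchy.
        if (!is_array($current) || !array_key_exists($segment, $current)) {
            return null;
        }
        $current = $current[$segment];
    }
    return $current;
}

$users = [
    ['id' => 1, 'name' => 'bob', 'address' => ['line1' => 'awesome drive', 'zip' => '90051']],
    ['id' => 2, 'name' => 'donny', 'address' => ['line1' => 'other street', 'zip' => '10001']],
];

// In-memory equivalent of the query: this.address.zip = ('90051')
$matches = array_values(array_filter(
    $users,
    fn (array $user) => resolvePath($user, 'this.address.zip') === '90051'
));

echo $matches[0]['name'], PHP_EOL; // bob
```

A storage-backed version would have to translate each path segment into a join or a sub-document lookup instead of walking arrays, which is where the complexity mentioned in the docblock comes from.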






That covers the basic pseudo-code and the ideas behind what I am trying to do. Finally, I'll show some usage scenarios:

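<?php
// the simplest way to store a user
$user = User::constructEmpty();
// we use auto-increment for the id values on the database side. So by not specifying an id, we are not updating, and
// the id will be generated automatically. After pushing, the system will assign the id for me
$user->name = "bob";
$user->pass = "bobpass";
// the system automatically constructs child objects for you if they have not been constructed already, because
// it knows which type should be constructed. So I don't need to construct the address object manually!
$user->address->line1 = "awesome drive";
$user->address->zip = "90051";
// save to storage, but don't fire webhooks and ignore access checks. Note that when the recursion happens, the
// ORMCrudOptions object is also passed down to the child objects, meaning the same options are inherited by them
$user->pushThis(ORMCrudOptions::n()->fireWebhooks(false)->ignoreAccessChecks(true));
echo $user->id; // this will show the auto-generated id
echo $user->address->id; // this will be the auto-generated id of the address object

// next, update something in the object
$user->name = "bob updated";
// because we now know that the object has an id value, it will update the existing object. Remember, the user
// object is persistent!
$user->pushThis(ORMCrudOptions::n()->fireWebhooks(false)->ignoreAccessChecks(true));
echo $user->id; // this will be exactly the same id as before
echo $user->address->id; // this will be a NEW id! Remember, the address object is NOT persistent, meaning a new
// instance was created to ensure that it is in fact non-persistent. The system handles cleaning up the loose
// objects, although this is one of the main problems

// find the above object by user->name
$user = User::findOne(ORMCrudOptions::n()->query("this.name = ('bob')"));
if ($user) {
    echo $user->name; // outputs bob if a user with the name bob exists
}

// find the above user by address->zip
$user = User::findOne(ORMCrudOptions::n()->query("this.address.zip = ('90051')"));
if ($user) {
    echo $user->address->zip; // outputs 90051 if a user with address->zip 90051 exists
}

// remove the above user
$user = User::findById(1); // assuming the user's id is 1
// add a favourite book to the user
$user->favouriteBook->name = "awesome book!";
// update
$user->pushThis(ORMCrudOptions::n()->ignoreAccessChecks(true));
// remove
$user->removeThis(ORMCrudOptions::n()->ignoreAccessChecks(true));
// with how persistence works, this will delete the user and the user's address (because the address is non-persistent)
// but will leave the created book undeleted, because books are persistent and may exist as children of other objects

// finally, constructing from a document-oriented direction
$user = User::constructFromArray(array(
    "name" => "bob",
    "pass" => "passbob",
    "address" => array(
        "line1" => "awesome drive",
        "zip" => "90051"
    )
));
// this will construct the object based only on the property types and object types defined in the internal
// properties. Properties that exist in the array but not in the model definition are ignored, so having more
// properties in the array than should be there does not matter
$user->pushThis(ORMCrudOptions::n()->ignoreAccessChecks(true));

// using an array to update just one property of the user object (this is ultimately how the ORM's api system is built)
$user = User::constructFromArray(array(
    "id" => 1,
    "name" => "bob updated"
));
echo $user->pass; // this will output passbob; since pass was not specified in the array, it was pulled from storage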
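The id behaviour in the scenarios above (the user keeps its id across pushes, while the address gets a new one each time) can be sketched in isolation like this. It is only an illustration under assumptions: `SketchStore` and `pushUser` are hypothetical names, the store is a plain in-memory array, and the "drop the old address, insert a new one" step is one possible reading of the non-persistent update rule rather than the actual implementation. PHP 7.4 or later is assumed for typed properties.

```php
<?php
// Hypothetical store: model name => id => values, with an auto-increment counter per model.
final class SketchStore
{
    public array $rows = [];
    private array $nextId = [];

    public function insert(string $model, array $values): int
    {
        $id = $this->nextId[$model] = ($this->nextId[$model] ?? 0) + 1;
        $this->rows[$model][$id] = $values;
        return $id;
    }
}

/** Pushes a user (persistent) together with its address (non-persistent). */
function pushUser(SketchStore $store, array &$user): void
{
    // Non-persistent child: drop the old instance (if any) and always create a new one.
    if (isset($user['address']['id'])) {
        unset($store->rows['Address'][$user['address']['id']]);
    }
    $user['address']['id'] = $store->insert('Address', ['line1' => $user['address']['line1']]);

    // Persistent base object: create once, then update in place under the same id.
    if (!isset($user['id'])) {
        $user['id'] = $store->insert('User', []);
    }
    $store->rows['User'][$user['id']] = [
        'name' => $user['name'],
        'address_id' => $user['address']['id'],
    ];
}

$store = new SketchStore();
$user = ['name' => 'bob', 'address' => ['line1' => 'awesome drive']];

pushUser($store, $user);
$firstAddressId = $user['address']['id'];

$user['name'] = 'bob updated';
pushUser($store, $user);

echo $user['id'], ' ', $firstAddressId, ' ', $user['address']['id'], PHP_EOL; // 1 1 2
```

After the second push there is still exactly one address row, which is the clean-up of loose objects that the comments above describe as one of the main problems.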

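It can't really be shown here, but one of the things that makes this system a pleasure to use is how the generated class files make the models very IDE friendly (auto-completion in particular). Yes, some old-school developers will be against this new-fangled technology, but at the end of the day, when you are dealing with insanely complex object-oriented data structures, it helps to have the IDE spell your property names for you.

If you are still with me, thanks for reading. You are probably wondering: what is it that you want?

In short, I don't have a great deal of experience with document/object storage, and over the past few days I have been shown that there are technologies out there that could help me achieve what I am trying to do. I am just not yet 100% sure that I have found the right one. Do I create a new ORM, can I get this functionality efficiently out of an existing ORM, or do I use a dedicated object/graph database?

I very much welcome any and all suggestions!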

Solution

I still feel that this calls for a nested set algorithm, since your data will always fit into a hierarchy. Simple types (strings, integers, etc) have a hierarchy of depth 1, and an object expression like customer.address.postcode (from your related post) will have a hierarchy level for each component (3 in this case, with the corresponding string value stored in the outermost node).



It seems that this hierarchy can store different types, so you’d need to make a small change to the nested set algorithm. Rather than each node carrying class-specific (Address, User, etc) columns, each node holds a string reference to the type and an integer primary key to reference the row. This means that you can’t use foreign key constraints for this part of your database, but that’s a small price to pay. (The reason is that a single column cannot obey just one of several constraints; it would have to obey them all. That said, you could probably do something clever with a pre-insert/pre-update trigger.)



So, if you were to use a Doctrine or Propel NestedSet behaviour, you would define tables thus:




  • Node

    • [nested set columns, done for you in an ORM]

    • name (varchar, records the element name e.g. customer)

    • is_persistent (bool)

    • table_name (varchar)

    • primary_key (integer)


  • Address

    • (Your usual columns, ditto any other table)




Now, we have an interesting property emerging here: when creating a hierarchy, you’ll see that the trivial values in the leaf nodes can be shared by virtue of our reference system. In fact, I am not entirely sure the is_persistent boolean is required: it is persistent (if I have understood your term correctly) by virtue of sharing external table rows, and non-persistent if it does not.



So, if customer1.address.postcode has a particular string value, you can get customer2.address.postcode to point to the same thing. When updating the version pointed to by the first expression, the second one will update "automatically" (because it resolves to the same table row).
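The sharing described above can be sketched with plain arrays. This is only an illustration: `$tables`, `$nodes` and `resolveNode` are hypothetical names, and the arrays stand in for real database tables and for Node rows carrying the table_name and primary_key columns.

```php
<?php
// Hypothetical in-memory "tables": table name => primary key => row.
$tables = [
    'Address' => [5 => ['postcode' => 'AB1 2CD']],
];

// Two nodes in different hierarchies that reference the same Address row.
$nodes = [
    ['name' => 'address', 'table_name' => 'Address', 'primary_key' => 5], // under customer1
    ['name' => 'address', 'table_name' => 'Address', 'primary_key' => 5], // under customer2
];

/** Looks up the row a node points at, or null if there is no such row. */
function resolveNode(array $node, array $tables): ?array
{
    return $tables[$node['table_name']][$node['primary_key']] ?? null;
}

// Update the row reached through the first node...
$tables[$nodes[0]['table_name']][$nodes[0]['primary_key']]['postcode'] = 'ZZ9 9ZZ';

// ...and the second node sees the change, because it resolves to the same row.
echo resolveNode($nodes[1], $tables)['postcode'], PHP_EOL; // ZZ9 9ZZ
```

Because the reference is a (table name, key) pair rather than a foreign key, nothing stops a node from pointing at a row that does not exist, which is why `resolveNode` returns null in that case.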



The advantage here is that this will bolt onto Propel and Doctrine without much work, and without any core hacking at all. You’d need to do some work to convert an object/array to a hierarchy, but that’s probably not much code.
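As a rough idea of what converting an array to a hierarchy might involve, here is a minimal sketch that numbers nested arrays with left/right values in a depth-first walk. It does not use Propel or Doctrine; the `toNestedSet` function and the `lft`, `rgt` and `level` keys are assumptions made for this example, and scalar values are skipped rather than stored.

```php
<?php
/**
 * Flattens a nested array into nested-set rows (name, lft, rgt, level).
 * Child arrays become child nodes; scalar values are skipped here for brevity.
 */
function toNestedSet(string $name, array $data, int &$counter, int $level, array &$rows): void
{
    $index = count($rows);
    // The left value is assigned on the way down...
    $rows[] = ['name' => $name, 'lft' => $counter++, 'rgt' => 0, 'level' => $level];
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            toNestedSet((string) $key, $value, $counter, $level + 1, $rows);
        }
    }
    // ...and the right value on the way back up, after all descendants.
    $rows[$index]['rgt'] = $counter++;
}

$user = [
    'name' => 'bob',
    'favouriteBook' => [
        'name' => 'awesome book',
        'author' => ['name' => 'peter'],
    ],
];

$rows = [];
$counter = 1;
toNestedSet('user', $user, $counter, 1, $rows);

foreach ($rows as $row) {
    printf("%s lft=%d rgt=%d level=%d\n", $row['name'], $row['lft'], $row['rgt'], $row['level']);
}
// user lft=1 rgt=6 level=1
// favouriteBook lft=2 rgt=5 level=2
// author lft=3 rgt=4 level=3
```

In a real table each row would also carry the table_name and primary_key columns, so that a node can point at the record holding the actual values.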






Addendum: let me explain my thinking a bit more in relation to the storage of nested elements. You say that you believe that you need to share a hierarchy at different levels in different places, but I am not so sure (and presently I think you need some encouragement not to build an excessively complicated system!). Let us look at an example, of a user having a favourite book.



To store it, we create these hierarchies:

user
    node level 1
    points to user record containing id=1, name=bob, pass=bobpass
    favouriteBook
        node level 2
        points to book record containing id=1, name=awesome book
        author
            node level 3
            points to author record containing id=3, name=peter, pass=peterpass

Now, let’s say we have another user and want to share a different favourite book by the same author (i.e. we are sharing user.favouriteBook.author).

user
    node level 1
    points to different user record containing id=100, name=halfer, pass=halferpass
    favouriteBook
        node level 2
        points to different book record containing id=101, name=textbook
        author
            node level 3
            points to same author record (id = 3)

How about two users who share the same favourite book? No problem (we additionally share user.favouriteBook):

user
    node level 1
    points to different user record containing id=101, name=donny, pass=donnypass
    favouriteBook
        node level 2
        points to previous book record (id=1)
        author
            node level 3
            points to previous author record (id = 3)

One critique that could be made of this method is that if you make user.favouriteBook "persistent" (i.e. shared) then it should share user.favouriteBook.author automatically. This is because if two or more people like the same book, it will be by the same author(s) for all of them.



However, I noted in the comments why I think my explicit approach is better: the alternative might be a nested set of a nested set, which might get too complicated, and as yet I don’t think you’ve demonstrated you need that. The trade-off is that my approach needs a bit more storage, but I think that’s fine. You also have some more setting-up of objects, but if you have a single factory for this, and solidly unit test it, I don’t think you need to worry.



(I think my approach could be faster too, but it is harder to say without developing a prototype for both and measuring performance on real datasets).






Addendum 2, to clean up some of the comments discussions and preserve it as an answer in the context of the question.



To determine whether the suggestion I outline here is feasible, you’ll need to create a prototype. I would recommend using an existing nested set solution, such as Propel with the NestedSetBehaviour, though GitHub will have many other libraries you can try. Do not try to integrate this prototype into your own ORM at this stage, as the integration work will just be a distraction. At the moment you want to test the idea for feasibility, that’s all.


After some fantastic suggestions, and a sleepless night of excitement due to the possibility of finally having a solution to my problem, I realize that I'm still not quite at a solution. So, I am here to outline my problem in much more detail in hope that someone knows of the best way to achieve this.

To recap (if you haven't read the previous post):

  • I am constructing a PHP OOP framework from scratch (I have no choice in this matter)
  • The framework is required to handle object-oriented data in the most efficient way possible. It doesn't need to be lightning quick, it just needs to be the best possible solution to the problem
  • Objects very closely resemble strictly written oop objects, in that they are an instance of a specific class, which contains a strict set of properties.
  • Object properties can be basic types (strings, numbers, bools) but can also be one object instance, or an array of objects (with the restriction that the array must be objects of the same type)

Ultimately, a storage engine that supports document-oriented storage (similar to XML or JSON) where the objects themselves have a strict structure.

Instead of outlining what I have tried so far (I discussed this briefly in my previous post) I am going to spend the rest of this post explaining, in detail, what it is that I am trying to do. This post is going to be long (sorry!).


To get started, I need to discuss a terminology that I had to introduce to solve one of the most crucial problems that came with the set of requirements. I've named this terminology "persistence". I understand that this term does have a different meaning when dealing with object databases, and for this reason I am open to suggestions on a different term. But for now, let's move on.

Persistence

Persistence refers to the independence of an object. I found the need to introduce this terminology when considering the data structure being generated from XML (which is something that I had to be able to do). In XML, we see objects that are completely dependent on their parent, while we also see objects that can be independent of a parent object.

The below example is an example of an XML document, that conforms to a certain structure (for example, a .wsdl file). Each object resembles a type with a strict structure. Every object has an "id" property

In the above example, we see two users. Both have their own Address objects under their "address" property. However if we look at their "favouriteBook" property, we can see that they both re-use the same Book object instance. Also note that the books use the same author.

So we have the Address object which is non-persistent because it is only related to its parent object (the User) meaning that its instance only needs to exist while the owning User object exists. Then the Book object which is persistent because it can be used in multiple locations and its instance remains persistent.

At first, I felt a bit crazy for coming up with a terminology like this, however, I found it remarkably simple to understand and use practically. It ultimately condenses the the "many-to-many, one-to-many, one-to-one, many-to-one" formula into a simple idea that I felt worked much better with nested data.

I've made an image representation of the above data here:

With my definition of persistence, comes a set of rules to help in understanding it. These rules are as follows:

  • update/create
  • Persistent child objects of the base object being stored update the properties of the persistent object, ultimately updating its instance.
  • Non-persistent objects always create a new instance of the object to ensure that they always use a non-persistent instance (no two non-persistent instances exist in more than one place at any given time)
  • deleting
  • Persistent child objects of the base object do not get deleted recursively. This is because the persistent object may exist in other places. You would always delete a persistent object directly.
  • Non-persistent child objects of the base object are removed along with the base object. If they were not removed, they would be left stranded as their design requires that the have a parent.
  • retrieving
  • Since persistent mostly defines how modifications work, retrieval doesn't involve persistence a great deal, aside from how you would expect persistence to effect how a model is stored and therefore how it would be retrieved (persistent object instances remaining persistent wherever it is located, non-persistent objects always having their own instance)

One final thing to note before we move on - the persistence of data models is defined by the model itself rather than the relationship. Initially, the persistence was part of the relationship but this was completely unnecessary when the system expects that you know the structure of your models, and how they are used. Ultimately, every model instance of a model is either persistent, or it is not.


So taking a look at some code now, you might start to see the method behind the madness. Although it may seem that the reason for this solution is to be able to build a storage system around objective data conforming to set of conditions, it's design actually comes from wanting to be able to store class instances, and/or generate class instances from an objective data structure.

I have written some pseudo-classes as an example of the functionality that I am trying to produce. I have commented most methods, including type declarations.

So first, this would be the base class that all model classes would extend. The purpose of this class is to create a layer between the model class/object, and the database/storage engine:

<?php
/**
 * This is the base class that all models would extend. It contains the functionalities that are useful among all model
 * objects, such as crud actions, finding, and crud event management.
 *
 * @author Donny Sutherland <donny@pixelpug.co.uk>
 * @package Main
 * @subpackage Sub
 *
 * Class ORMModel
 */
class ORMModel {
    /**
     * In order to generate relationships between objects, every object MUST have an id. This functions as the object's
     * unique identifier. Each object in it's model type (collection) has it's own id.
     *
     * @var int
     */
    public $id;
    /**
     * Internal property assigned by the application. This is where the persistence of the model is defined.
     *
     * @var bool
     */
    protected $internal_isPersistent = true;
    /**
     * Internal property assigned by the application. This is an array of the model's properties, and their PHP type.
     *
     * For example, a User model might use something like this:
     * array(
        "id" => "integer",
     *  "username" => "string",
     *  "password" => "string",
     *  "address" => "object",
     *  "favouriteBook" => "object",
     *  "allBooks" => "array"
     * )
     *
     * @var array
     */
    protected $internal_propertyTypes = array();
    /**
     * Internal property assigned by the application. This is an array of the model's properties which are objects, and
     * the MODEL CLASS type of the object.
     *
     * For example, the User model example for the property types might use this:
     * array(
     *  "address" => "Address",
        "favouriteBook" => "Book",
     *  "allBooks" => "Book"
     * )
     *
     * @var array
     */
    protected $internal_objectTypes = array();
    /**
     * I am not 100% sure on the best way to use this yet, I have tried a few different ways and all seem to cause
     * performance problems. But ultimately, before we attempt to update an object, we cache it's currently stored
     * instance to this property, allowing us to compare old vs new. I find this really useful for detecting whether a
     * property has changed, I just need to work out the best way to do it.
     *
     * @var $this
     */
    protected $internal_old;

    /**
     * The lazy way to construct an empty model object (all NULL values)
     *
     * @return $this
     */
    final public static function constructEmpty() {

    }

    /**
     * This method is used by the other constructFromXXX methods once the data has been converted to a PHP array.
     * This method is what allows us to build a RESTful interface into the ORM system as it conforms to the following
     * rules:
     *
     * - if the id is set (not null), first pull the object from storage.
     * - For each key => value of the passed array, OVERWRITE the value
     * - For properties that are model objects/arrays, if the property is assiged to the array:
     *  - if the array value is NULL, we are clearing the object relationship
     *  - if the array valus is not null, construct recursively at this point
     *
     * Ultimately, if you assign a property in the array that you pass to this method, it will overwrite the value. If
     * you do not, it will use the property value in storage.
     *
     * @param array $array
     *
     * @return $this
     */
    final public static function constructFromArray(array $array) {

    }

    /**
     * This method attempts to decode the value of $json into a PHP array. It then calls constructFromArray if the string
     * could be decoded.
     *
     * @param $json
     *
     * @return $this
     */
    final public static function constructFromJson($json) {

    }

    /**
     * This method attempts to decode the value of $xml into a PHP array. It then calls constructFromArray if the xml
     * could be decoded.
     *
     * @param $xml
     *
     * @return $this
     */
    final public static function constructFromXml($xml) {

    }

    /**
     * Find one object, based on a set of options.
     *
     * @param ORMCrudOptions $options
     *
     * @return $this
     */
    final public static function findOne(ORMCrudOptions $options) {

    }

    /**
     * Find all objects, (optionally) based on a set of options
     *
     * @param ORMCrudOptions $options
     *
     * @return $this[]
     */
    final public static function findAll(ORMCrudOptions $options=null) {

    }

    /**
     * Find the count of objects, based on a set of optoins
     *
     * @param ORMCrudOptions $options
     *
     * @return integer
     */
    final public static function findCount(ORMCrudOptions $options) {

    }

    /**
     * Find one object, based on it's id, and (optionally) a set of options.
     *
     * @param ORMCrudOptions $options
     *
     * @return $this
     */
    final public static function findById($id,ORMCrudOptions $options=null) {

    }

    /**
     * Push this object to storage. This creates/updates all of the contained objects, based on their id's and
     * persistence.
     *
     * @param ORMCrudOptions $options
     *
     * @return bool
     */
    final public function pushThis(ORMCrudOptions $options) {

    }

    /**
     * Pull this object form storage. This retrieves all of the contained objects again, based on their id's and
     * persistence.
     *
     * @param ORMCrudOptions $options
     *
     * @return bool
     */
    final public function pullThis(ORMCrudOptions $options) {

    }

    /**
     * Remove this object from storage. This conditionally removes the contained objects (based on persistence) based
     * on their id's.
     *
     * @param ORMCrudOptions $options
     */
    final public function removeThis(ORMCrudOptions $options) {

    }

    /**
     * This is a crud event.
     */
    public function beforeCreate() {

    }

    /**
     * This is a crud event.
     */
    public function afterCreate() {

    }

    /**
     * This is a crud event.
     */
    public function beforeUpdate() {

    }

    /**
     * This is a crud event.
     */
    public function afterUpdate() {

    }

    /**
     * This is a crud event.
     */
    public function beforeRemove() {

    }

    /**
     * This is a crud event.
     */
    public function afterRemove() {

    }

    /**
     * This is a crud event.
     */
    public function beforeRetrieve() {

    }

    /**
     * This is a crud event.
     */
    public function afterRetrieve() {

    }
}

So ultimately, this class would be designed to provide the functionality to construct, find, save, retrieve and delete model objects. The internal properties are properties that exist only in the classes (not in storage). These properties get populated by the framework itself while you use an interface to create models, and add property/fields to the models.

The idea is, the framework comes with an interface for managing data models. With this interface you create the models, and add property/fields to the models. In doing so, the system automatically creates the class files for you, updating those internal properties as you modify the persistence and property types.

To keep things developer friendly, the system creates two class files for each model. A base class (which extends ORMModel) and another class (which extends the base class). The base class is manipulated by the system and therefore modifying this file is not recommended. The other class is used by developers to add additional functionality to models and crud events.

So coming back to the example data, here is the User base class:

    <?php
class User_Base extends ORMModel {
    public $name;
    public $pass;
    /**
     * @var Address
     */
    public $address;
    /**
     * @var Book
     */
    public $favouriteBook;

    protected $internal_isPersistent = true;
    protected $internal_propertyTypes = array(
        "id" => "integer",
        "name" => "string",
        "pass" => "string",
        "address" => "object",
        "favouriteBook" => "object"
    );
    protected $internal_objectTypes = array(
        "address" => "Address",
        "favouriteBook" => "Book"
    );
}

Pretty much self explanatory. Again note that the internal properties get generated by the system, so those arrays would be generated based on the property/fields that you specify when creating/modifying the User model in the model management interface. Also note the docblock on the address and favouriteBook property definitions. Those are also generated by the system making the classes very IDE friendly.

This would be the other class generated for the User model:

    <?php
final class User extends User_Base {
    public function beforeCreate() {

    }

    public function afterCreate() {

    }

    public function beforeUpdate() {

    }

    public function afterUpdate() {

    }

    public function beforeRemove() {

    }

    public function afterRemove() {

    }

    public function beforeRetrieve() {

    }

    public function afterRetrieve() {

    }
}

Again, pretty self explanatory. We've extended the base class to create another class where developers would add additional methods, and add functionality to the crud events.

I'll not add in the other objects that make up the rest of the example data. Since the above should explain how they would look.

You may or may not have noticed that in the ORMModel class, the CRUD methods require an instance of an ORMCrudOptions class. This class is pretty crucial to the whole system, so let's take a quick look at it:

    <?php

/**
 * Despite this object being somewhat of an aggregate, it is quite possibly the most important part of the ORM, in that it
 * defines how CRUD actions are executed and outlines how the querying is done.
 *
 * Class ORMCrudOptions
 */
final class ORMCrudOptions {
    /**
     * This ultimately makes up the "where" part of the sql query. However, because we want to be able to make querying
     * possible at any depth within the hierarchy of a model, this gets quite complicated.
     *
     * Previously, I developed a system which allowed the user to do something like this:
     *
     * "this.customer.address.postcode LIKE ('%XXX%') OR this.customer.address.line1 LIKE ('%XXX%')"
     *
     * The "this" and the "." are my extension to basic SQL. The "this" refers to the base model that you are finding,
     * and each "." basically drills down into the hierarchy to make a comparison on a property somewhere within a
     * contained model object.
     *
     * I will explain more about how I did this in my post. I am most definitely looking at how I could better achieve
     * this, though.
     *
     * @var string
     */
    private $query;
    /**
     * This allows you to build up a list of order by definitions.
     *
     * Using the orderBy method, you can chain up the order by statements like:
     *
     * ->orderBy("this.name","asc")->orderBy("this.customer.address.line1","desc")
     *
     * Which would be similar to doing:
     *
     * ORDER BY this_name ASC, this_customer_address.line1 DESC
     *
     * @var array
     */
    private $orderBy;
    /**
     * This allows you to set the limit start and limit values by doing:
     *
     * ->limit(10,10)
     *
     * Which would be similar to doing:
     *
     * LIMIT 10, 10
     *
     * @var array
     */
    private $limit;
    /**
     * Depth was added in my later endeavours to try and help with performance. It allows you to specify the depth at
     * which to retrieve data. Although this helped with optimisation a lot, I really disliked having to
     * implement this because it seems like a work-around. I would rather be able to increase performance elsewhere so
     * that objects are always retrieved at their full depth.
     *
     * @var integer
     */
    private $depth;
    /**
     * This was another newly added feature. Whenever you execute a crud action on a model, the model instance is stored
     * in a local cache, and/or retrieved from that cache, if this value is true.
     *
     * I did find this to make a significant improvement to performance, although it did bring in complications that
     * make the system tricky to use at times. You really need to understand how and when to use the cache, otherwise
     * it can be infuriatingly obtuse.
     *
     * @var bool
     */
    private $useCache;
    /**
     * Built into the ORM system, and tied in with the application, is a webhook system which fires out webhooks on
     * crud events. I discovered the need to be able to disable webhooks at times (when doing large amounts of crud
     * actions in one go) pretty early on. Setting this to false disables webhooks on the crud action.
     *
     * @var bool
     */
    private $fireWebhooks;
    /**
     * Also built into the application, and tied into the ORM system, is an access system. This works on a separate
     * layer to the database, allowing me to use the same access system for defining crud action access as I use for
     * everything else in the framework. However, in some instances I found it useful to disable access checks.
     *
     * Access checks are always on by default. In the api system that I built to access the data models, you were not
     * able to modify this property and therefore were always subject to access checks.
     *
     * @var bool
     */
    private $ignoreAccessChecks;

    /**
     * The lazy way to create a new instance of options.
     *
     * @return ORMCrudOptions
     */
    public static function n() {
        return new ORMCrudOptions();
    }

    /**
     * Set the query value
     *
     * @param $query
     *
     * @return $this
     */
    public function query($query) {
        $this->query = $query;

        return $this;
    }

    /**
     * Add an orderby field and direction
     *
     * @param $field
     * @param string $direction
     *
     * @return $this
     * @internal param array $orderBy
     *
     */
    public function orderBy($field,$direction="asc") {
        $this->orderBy[] = array($field,$direction);

        return $this;
    }

    /**
     * Set the limit start and limit.
     *
     * @param $limitResults
     * @param null $limitStart
     *
     * @return $this
     */
    public function limit($limitResults,$limitStart=null) {
        $this->limit = array($limitResults,$limitStart);

        return $this;
    }

    /**
     * Set the depth for retrieval
     *
     * @param $depth
     *
     * @return $this
     */
    public function depth($depth) {
        $this->depth = $depth;

        return $this;
    }

    /**
     * Set whether to use the model cache
     *
     * @param $useCache
     *
     * @return $this
     */
    public function useCache($useCache) {
        $this->useCache = $useCache;

        return $this;
    }

    /**
     * Set whether to fire webhooks on crud actions
     *
     * @param $fireWebhooks
     *
     * @return $this
     */
    public function fireWebhooks($fireWebhooks) {
        $this->fireWebhooks = $fireWebhooks;

        return $this;
    }

    /**
     * Set whether to ignore access checks
     *
     * @param $ignoreAccessChecks
     *
     * @return $this
     */
    public function ignoreAccessChecks($ignoreAccessChecks) {
        $this->ignoreAccessChecks = $ignoreAccessChecks;

        return $this;
    }
}

The idea behind this class is to remove the need for a large number of arguments in the crud methods, since the majority of those arguments can be re-used across all of them. Take note of the comments on the query property, as that one is important.
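To make the "this" and "." notation from the query property a little more concrete, here is a minimal sketch that resolves such a path against a plain nested array. resolvePath is a hypothetical helper, not part of the ORM; a real implementation would more likely translate the path into joins or a storage-specific query rather than walking data in memory.

```php
<?php
// Hypothetical helper, not part of the ORM described above: it only shows
// what each segment of a path like "this.address.zip" refers to.
function resolvePath(array $base, $path) {
    $segments = explode(".", $path);
    if (array_shift($segments) !== "this") {
        throw new InvalidArgumentException("Path must start with 'this'");
    }
    $current = $base; // "this" is the base model being searched
    foreach ($segments as $segment) {
        if (!is_array($current) || !array_key_exists($segment, $current)) {
            return null; // the path does not exist on this object
        }
        $current = $current[$segment]; // each "." drills one level down
    }
    return $current;
}

$user = array(
    "name" => "bob",
    "address" => array("line1" => "awesome drive", "zip" => "90051")
);

$zip = resolvePath($user, "this.address.zip");
$missing = resolvePath($user, "this.address.country");
```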


So, that pretty much covers the base pseudo-code and the ideas behind what I am trying to do. Finally, I'll show some usage scenarios:

<?php
//the most simple way to store a user
$user = User::constructEmpty();
//we use auto incrementing on the id value at the database end. So by not specifying the id, we are not updating, and
//the id will be auto generated. After the push has been made, the system will assign the id for me
$user->name = "bob";
$user->pass = "bobpass";
//the system automatically constructs child objects for you if they are not yet constructed, because
//it knows what type should be constructed. So I don't need to construct the address object manually!
$user->address->line1 = "awesome drive";
$user->address->zip = "90051";
//save to storage, but don't fire webhooks and ignore access checks. Note that the ORMCrudOptions object
//is passed to child objects too when recursion happens, meaning that the same options are inherited by child objects
$user->pushThis(ORMCrudOptions::n()->fireWebhooks(false)->ignoreAccessChecks(true));
echo $user->id; //this will display the auto generated id
echo $user->address->id; //this will be the auto generated id of the address object.

//next lets update something within the object
$user->name = "bob updated";
//because we now know that the object has an id value, it will update the existing object. Remember that the User
//object is persistent!
$user->pushThis(ORMCrudOptions::n()->fireWebhooks(false)->ignoreAccessChecks(true));
echo $user->id; //this will be the exact same id as before
echo $user->address->id; //this will be a NEW ID! Remember, the address object is NOT persistent, meaning that a new
//instance was created in order to ensure that it is in fact non-persistent. The system does handle cleaning up of loose
//objects, although this is one of the main performance problems

//finding the above object by user->name
$user = User::findOne(ORMCrudOptions::n()->query("this.name = ('bob')"));
if($user) {
    echo $user->name; //provided that a user with name "bob" exists, this would output "bob"
}

//finding the above user by address->zip
$user = User::findOne(ORMCrudOptions::n()->query("this.address.zip = ('90051')"));
if($user) {
    echo $user->address->zip; //provided that the user with address->zip "90051" exists, this would output "90051"
}

//removing the above user
$user = User::findById(1); //assuming that the id of the user is 1
//add a favourite book to the user
$user->favouriteBook->name = "awesome book!";
//update
$user->pushThis(ORMCrudOptions::n()->ignoreAccessChecks(true));
//remove
$user->removeThis(ORMCrudOptions::n()->ignoreAccessChecks(true));
//with how persistence works, this will delete the user, and the user's address (because the address is non-persistent)
//but will leave the created book un-deleted, because books are persistent and may exist as child objects of other objects

//finally, constructing from document-oriented
$user = User::constructFromArray(array(
    "name" => "bob", //the key must match the model's "name" property, otherwise it would be ignored
    "pass" => "passbob",
    "address" => array(
        "line1" => "awesome drive",
        "zip" => "90051"
    )
));
//this will only CONSTRUCT the object, based on the property types and object types defined in the internal properties.
//properties that exist in the array but not in the model's defined properties will be ignored, so having more
//properties in the array than should be there doesn't matter
$user->pushThis(ORMCrudOptions::n()->ignoreAccessChecks(true));

//update only one property of a user object using arrays (this is ultimately how the api system of the ORM was built)
$user = User::constructFromArray(array(
    "id" => 1,
    "name" => "bob updated"
));
echo $user->pass; //this would output passbob: because the pass was not specified in the array, it was pulled from storage

It's not really possible to show here, but one of the things that makes this system a delight to use is how the generation of the class files makes them incredibly IDE friendly (in particular, for auto-completion). Yeah, some of the old-school developers will be against this new-fangled technology, but at the end of the day, when you are dealing with crazily complex object-oriented data structures, having the IDE help you spell your property names correctly and get the structure right can be a life-saver!

If you are still with me, thank you for reading. You are probably wondering, though: what is it you want again?

In short, I don't have a huge amount of experience in document/object storage, and in the past few days I've already been shown that there are technologies out there that could help me achieve what I am trying to do. I'm just not 100% certain yet that I have found the right one. Do I create a new ORM, can I efficiently get this functionality out of an existing ORM, or do I use a dedicated object/graph database?

I very much welcome any and all suggestions!

Solution

It still feels like this is a job for a nested set algorithm, because your data will always fit into a hierarchy. Simple types (strings, integers, etc) have a hierarchy of depth 1, and an object expression like customer.address.postcode (from your related post) will have a hierarchy level for each component (3 in this case, with the corresponding string value stored in the deepest node, i.e. the leaf).

It seems that this hierarchy can store different types, so you'd need to make a small change to the nested set algorithm. Rather than each node carrying class-specific (Address, User, etc) columns, you have a string reference to the type and an integer primary key to reference it. This means that you can't use foreign key constraints for this part of your database, but that's a small price to pay. (The reason is that a single column cannot obey just one of several constraints; it would have to obey them all. That said, you could probably do something clever with a pre-insert/pre-update trigger.)

So, if you were to use a Doctrine or Propel NestedSet behaviour, you would define tables thus:

  • Node
    • [nested set columns, done for you in an ORM]
    • name (varchar, records the element name e.g. customer)
    • is_persistent (bool)
    • table_name (varchar)
    • primary_key (integer)
  • Address
    • (Your usual columns, ditto any other table)

Now, we have an interesting property emerging here: when creating a hierarchy, you'll see that the trivial values in the leaf nodes can be shared by virtue of our reference system. In fact, I am not entirely sure the is_persistent boolean is required: a node is persistent (if I have understood your term correctly) if it shares external table rows with other nodes, and non-persistent if it does not.

So, if customer1.address.postcode has a particular string value, you can get customer2.address.postcode to point to the same thing. When updating the version pointed to by the first expression, the second one will update "automatically" (because it resolves to the same table row).
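A minimal sketch of that sharing, using plain arrays in place of real tables. The "Postcode" table, its "value" column and the resolveNode helper are assumptions made purely for illustration; only the table_name and primary_key columns come from the node definition above.

```php
<?php
// In-memory stand-ins for database tables, keyed by primary key. The
// "Postcode" table and its "value" column are assumptions for illustration.
$tables = array(
    "Postcode" => array(
        1 => array("value" => "90051")
    )
);

// Two nodes from different hierarchies referencing the same external row.
$customer1Postcode = array("name" => "postcode", "table_name" => "Postcode", "primary_key" => 1);
$customer2Postcode = array("name" => "postcode", "table_name" => "Postcode", "primary_key" => 1);

// Follow a node's (table_name, primary_key) reference to the row it points at.
function resolveNode(array $tables, array $node) {
    return $tables[$node["table_name"]][$node["primary_key"]];
}

// Update the row once, via the reference held by the first node.
$tables[$customer1Postcode["table_name"]][$customer1Postcode["primary_key"]]["value"] = "10001";

// The second node sees the change, because it resolves to the same row.
echo resolveNode($tables, $customer2Postcode)["value"], PHP_EOL; // prints 10001
```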

The advantage here is that this will bolt onto Propel and Doctrine without much work, and without any core hacking at all. You'd need to do some work to convert an object/array to a hierarchy, but that's probably not much code.
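As a rough idea of what the array-to-hierarchy conversion could look like, here is a minimal sketch that numbers nodes with the usual nested set left/right values. toNestedSet is a hypothetical helper, and it assumes that only nested arrays (i.e. child objects) become nodes while scalar values stay in the referenced record; an ORM behaviour would normally maintain these columns for you.

```php
<?php
// Hypothetical helper: walks a nested array depth-first and assigns nested
// set "lft"/"rgt" values. Only array values (child objects) become nodes.
function toNestedSet($name, array $data, array &$rows, &$counter) {
    $index = count($rows);
    $rows[] = array("name" => $name, "lft" => $counter++, "rgt" => null);
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            toNestedSet($key, $value, $rows, $counter);
        }
    }
    // the right value is assigned once all descendants have been numbered
    $rows[$index]["rgt"] = $counter++;
}

$rows = array();
$counter = 1;
toNestedSet("user", array(
    "name" => "bob",
    "favouriteBook" => array(
        "name" => "awesome book",
        "author" => array("name" => "peter")
    )
), $rows, $counter);

foreach ($rows as $row) {
    echo $row["name"], " lft=", $row["lft"], " rgt=", $row["rgt"], PHP_EOL;
}
```

A node's descendants are then exactly the rows whose lft and rgt values fall between its own, which is what makes retrieving a whole sub-hierarchy a single range query.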


Addendum: let me explain my thinking a bit more in relation to the storage of nested elements. You say that you believe you need to share a hierarchy at different levels in different places, but I am not so sure (and presently I think you need some encouragement not to build an excessively complicated system!). Let us look at an example of a user having a favourite book.

To store it, we create these hierarchies:

user
node level 1
points to user record containing id=1, name=bob, pass=bobpass
    favouriteBook
    node level 2
    points to book record containing id=1, name=awesome book
        author
        node level 3
        points to author record containing id=3, name=peter, pass=peterpass

Now, let's say we have another user and want to share a different favourite book by the same author (i.e. we are sharing user.favouriteBook.author).

user
node level 1
points to different user record containing id=100, name=halfer, pass=halferpass
    favouriteBook
    node level 2
    points to different book record containing id=101, name=textbook
        author
        node level 3
        points to same author record (id = 3)

How about two users who share the same favourite book? No problem (we additionally share user.favouriteBook):

user
node level 1
points to different user record containing id=101, name=donny, pass=donnypass
    favouriteBook
    node level 2
    points to previous book record (id=1)
        author
        node level 3
        points to previous author record (id = 3)

One critique that could be made of this method is that if you make user.favouriteBook "persistent" (i.e. shared) then it should share user.favouriteBook.author automatically. This is because if two or more people like the same book, it will be by the same author(s) for all of them.

However, I noted in the comments why I think my explicit approach is better: the alternative might be a nested set of a nested set, which might get too complicated, and as yet I don't think you've demonstrated you need that. The trade-off is that my approach needs a bit more storage, but I think that's fine. You also have some more setting-up of objects, but if you have a single factory for this, and solidly unit test it, I don't think you need to worry.

(I think my approach could be faster too, but it is harder to say without developing a prototype for both and measuring performance on real datasets).
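To see what is actually shared across the three example hierarchies, they can be written out as node rows and the references counted. This is only a sketch: the "tree" key, the table names and both helper functions are assumptions made for illustration, and the ids are the ones from the examples above.

```php
<?php
// Node rows for the three example hierarchies. "tree" is a hypothetical key
// identifying the hierarchy; table names are assumed for illustration.
$nodes = array(
    array("tree" => 1, "name" => "user",          "table_name" => "User",   "primary_key" => 1),
    array("tree" => 1, "name" => "favouriteBook", "table_name" => "Book",   "primary_key" => 1),
    array("tree" => 1, "name" => "author",        "table_name" => "Author", "primary_key" => 3),
    array("tree" => 2, "name" => "user",          "table_name" => "User",   "primary_key" => 100),
    array("tree" => 2, "name" => "favouriteBook", "table_name" => "Book",   "primary_key" => 101),
    array("tree" => 2, "name" => "author",        "table_name" => "Author", "primary_key" => 3),
    array("tree" => 3, "name" => "user",          "table_name" => "User",   "primary_key" => 101),
    array("tree" => 3, "name" => "favouriteBook", "table_name" => "Book",   "primary_key" => 1),
    array("tree" => 3, "name" => "author",        "table_name" => "Author", "primary_key" => 3)
);

// Count how many nodes point at a given record.
function countReferences(array $nodes, $tableName, $primaryKey) {
    $count = 0;
    foreach ($nodes as $node) {
        if ($node["table_name"] === $tableName && $node["primary_key"] === $primaryKey) {
            $count++;
        }
    }
    return $count;
}

// A record referenced by more than one node is shared ("persistent" in the
// question's terms), so removing one hierarchy should leave it in place.
function isShared(array $nodes, $tableName, $primaryKey) {
    return countReferences($nodes, $tableName, $primaryKey) > 1;
}

$bookOneRefs = countReferences($nodes, "Book", 1);
$authorRefs = countReferences($nodes, "Author", 3);
$textbookShared = isShared($nodes, "Book", 101);
```

Counting references this way is one possible route to deciding what may be deleted without a stored is_persistent flag, at the cost of an extra lookup on removal.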


Addendum 2, to tidy up some of the comment discussions and preserve them as part of the answer, in the context of the question.

To determine whether the suggestion I outline here is feasible, you'll need to create a prototype. I would recommend using an existing nested set solution, such as Propel with the NestedSetBehaviour, though GitHub will have many other libraries you can try. Do not try to integrate this prototype into your own ORM at this stage, as the integration work will just be a distraction. At the moment you want to test the idea for feasibility, that's all.
