Which Hierarchical model should I use? Adjacency, Nested, or Enumerated?

Problem description

I have a table that contains all geographical locations in the world and their relationships.

Here is an example that shows the hierarchy. You will see that the data is actually stored in all three forms:

  • Enumerated path
  • Adjacency list
  • Nested sets

The data obviously never changes either. Below is an example of the direct ancestors of the location Brighton in England, which has a woeid of 13911.

Table: geoplanet_places (has 5.6 million rows). Large image: http://tinyurl.com/68q4ndx

I then have another table called entities. This table stores the items that I would like to map to a geographical location. I store some basic information, but most importantly I store the woeid, which is a foreign key referencing geoplanet_places.

Eventually the entities table will contain several thousand entities, and I would like a way to return a full tree of all the nodes that contain entities.

I plan on creating something to facilitate filtering and searching of entities based on their geographical location, and to discover how many entities can be found under a particular node.

So if I only have one entity in my entities table, I might have something like this:

Earth (1)
United Kingdom (1)
England (1)
East Sussex (1)
Brighton and Hove (1)
Brighton (1)

Let's then say that I have another entity located in Devon; then it would show something like:

Earth (2)
United Kingdom (2)
England (2)
Devon (1)
East Sussex (1) ... etc

The counts, which say how many entities are "inside" each geographical location, do not need to be live. I can live with generating my object every hour and caching it.

The aim is to create an interface that might start out showing only the countries that have entities.

Such as:

Argentina (1021)
Chile (291)
...
United States (32,103)
United Kingdom (12,338)

Then the user will click on a location, such as United Kingdom, and will be given all of the immediate child nodes that are descendants of United Kingdom AND have an entity in them.

If there are 32 counties in the United Kingdom, but only 23 of them turn out to have entities stored in them when you drill down, then I don't want to display the other 9. It is only locations.

This site aptly demonstrates the functionality that I wish to achieve: http://www.homeaway.com/vacation-rentals/europe/r5

How do you recommend that I manage such a data structure?

Things I am using:

  • PHP
  • MySQL
  • Solr

I plan on making the drill-downs as rapid as possible. I want to create an AJAX interface that will be seamless for searching.

I would also be interested to know which columns you would recommend indexing.

Recommended answer

Typically, there are three kinds of queries on hierarchies that cause trouble:

  1. Return all ancestors
  2. Return all descendants
  3. Return all children (immediate descendants).

Here's a little table that shows the performance of different methods in MySQL:

                        Ancestors  Descendants  Children        Maintainability InnoDB
Adjacency list          Good       Decent       Excellent       Easy            Yes
Nested sets (classic)   Poor       Excellent    Poor/Excellent  Very hard       Yes
Nested sets (spatial)   Excellent  Very good    Poor/Excellent  Very hard       No
Materialized path       Excellent  Very good    Poor/Excellent  Hard            Yes

In the Children column, poor/excellent means that the answer depends on whether you combine the method with an adjacency list, i.e. store the parentID in each record.

For your task, you need all three queries:

  1. All ancestors, to show the Earth / UK / Devon thing
  2. All children, to show "Destinations in Europe" (the items)
  3. All descendants, to show "Destinations in Europe" (the counts)

I would go for materialized paths, since this kind of hierarchy rarely changes (only in case of war, revolt etc.).

Create a varchar column called path, index it, and fill it with a value like this:

1:234:6345:45454:

where the numbers are the primary keys of the appropriate parents, in the correct order (1 for Europe, 234 for UK etc.)
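As a rough illustration of how such a path value could be derived from the parentId links, here is a minimal Python sketch. The sample ids and names are hypothetical, and it assumes each path ends with the row's own id followed by a colon, which is what the queries in this answer appear to rely on.

```python
# Hypothetical sample rows: id -> (name, parent_id). The ids are made up.
places = {
    1: ("Europe", None),
    234: ("United Kingdom", 1),
    6345: ("England", 234),
    45454: ("Devon", 6345),
}


def build_path(place_id, places):
    """Walk the parent links up to the root and return a path like '1:234:'."""
    ids = []
    current = place_id
    while current is not None:
        ids.append(current)
        current = places[current][1]  # move to the parent id
    # Root first, and every id (including the last) is followed by ':'
    return "".join(f"{i}:" for i in reversed(ids))


# One path per place, as it would be stored in the path column.
paths = {pid: build_path(pid, places) for pid in places}
```

Since the hierarchy rarely changes, a one-off pass like this would only need to be rerun when places are added or moved.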

You will also need a table called levels that holds the numbers from 1 to 20 (or whatever maximum nesting level you want).

To select all ancestors:

-- Each level that is shorter than the full path yields one ancestor prefix
SELECT   pa.*
FROM     places p
JOIN     levels l
ON       SUBSTRING_INDEX(p.path, ':', l.level) <> p.path
JOIN     places pa
ON       pa.path = CONCAT(SUBSTRING_INDEX(p.path, ':', l.level), ':')
WHERE    p.id = @id_of_place_in_devon;
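To see which prefixes the join against levels produces, the following Python sketch mimics the two ON conditions for a single path. The helper substring_index is a hypothetical stand-in for MySQL's SUBSTRING_INDEX with a positive count, and ancestor_paths is an illustrative name, not part of the answer.

```python
def substring_index(s, delim, count):
    """Mimic MySQL SUBSTRING_INDEX(s, delim, count) for a positive count."""
    parts = s.split(delim)
    if count >= len(parts):
        # Fewer than `count` delimiters: MySQL returns the whole string
        return s
    return delim.join(parts[:count])


def ancestor_paths(path, max_level=20):
    """Return the path values that the second join would look up."""
    result = []
    for level in range(1, max_level + 1):  # one iteration per row in levels
        prefix = substring_index(path, ":", level)
        if prefix != path:                 # first ON condition
            result.append(prefix + ":")    # value compared with pa.path
    return result


prefixes = ancestor_paths("1:234:6345:45454:")
```

Under the assumption that a path ends with the row's own id, the last prefix equals the starting path, so the result includes the place itself along with its ancestors.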

To select all children and counts of places within them:

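-- pc: immediate children of the selected place; pp: leaf places under each child
SELECT  pc.*, COUNT(pp.id)
FROM    places p
JOIN    places pc
ON      pc.parentId = p.id
JOIN    places pp
ON      pp.path BETWEEN pc.path AND CONCAT(pc.path, ':')
        AND pp.id NOT IN
        (
        SELECT  parentId
        FROM    places
        WHERE   parentId IS NOT NULL  -- a NULL in this list would make NOT IN match no rows
        )
WHERE   p.id = @id_of_europe
GROUP BY
        pc.id;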
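The BETWEEN condition works as a prefix range: every id is followed by ':', and digits sort below ':', so all paths that start with a given prefix fall between that prefix and the prefix with one more ':' appended. A small Python sketch of the same idea, using a sorted list in place of the index on path (the sample paths and the subtree helper are hypothetical):

```python
from bisect import bisect_left, bisect_right

# Hypothetical paths, sorted by code point much as an index on path would be.
paths = sorted([
    "1:",
    "1:234:",
    "1:234:6345:",
    "1:234:6345:45454:",
    "1:234:6346:",
    "1:2345:",
    "1:235:",
])


def subtree(prefix, sorted_paths):
    """Return every path p with prefix <= p <= prefix + ':'."""
    lo = bisect_left(sorted_paths, prefix)
    hi = bisect_right(sorted_paths, prefix + ":")
    return sorted_paths[lo:hi]


uk_subtree = subtree("1:234:", paths)
```

Note that "1:2345:" and "1:235:" fall outside the range even though they share leading characters with "1:234:", which is why the trailing colon on every path matters.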
