Which hierarchical model should I use? Adjacency, Nested, or Enumerated?
Problem description
I have a table that contains every geographical location in the world and the relationships between them.
Here is an example that shows the hierarchy. You will see that the data is actually stored in all three forms:
- Enumerated path
- Adjacency list
- Nested sets
The data obviously never changes either. Below is an example of the direct ancestors of the location Brighton in England, which has a woeid of 13911.
Table: geoplanet_places
(has 5.6 million rows)
Large image: http://tinyurl.com/68q4ndx
I then have another table called entities. This table stores the items that I would like to map to a geographical location. I store some basic information, but most importantly I store the woeid, which is a foreign key referencing geoplanet_places.
Eventually the entities table will contain several thousand entities, and I would like a way to return a full tree of all the nodes that contain entities.
I plan on creating something to facilitate filtering and searching entities by their geographical location, and to show how many entities can be found under any particular node.
So if I only have one entity in my entities table, I might have something like this:
Earth (1)
  United Kingdom (1)
    England (1)
      East Sussex (1)
        Brighton and Hove (1)
          Brighton (1)
Let's then say that I have another entity which is located in Devon; then it would show something like:
Earth (2)
  United Kingdom (2)
    England (2)
      Devon (1)
      East Sussex (1) ... etc
The counts, which say how many entities are "inside" each geographical location, do not need to be live. I can live with generating my object every hour and caching it.
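The hourly count generation described above could be sketched roughly as follows. This is only an illustration: it assumes the adjacency list has been loaded into a hypothetical `parent_of` dictionary (child id to parent id, with `None` for the root), and every id except 13911 is made up.

```python
from collections import Counter

def rollup_counts(parent_of, entity_woeids):
    """Count entities at or below every place by walking each entity's ancestor chain."""
    counts = Counter()
    for woeid in entity_woeids:
        node = woeid
        while node is not None:
            counts[node] += 1          # the entity counts once for each ancestor
            node = parent_of.get(node)
    return counts

# Hypothetical ids, except 13911 (Brighton, from the question); None marks the root.
parent_of = {1: None, 2: 1, 3: 2, 4: 3, 5: 4, 13911: 5, 6: 3}
names = {1: "Earth", 2: "United Kingdom", 3: "England", 4: "East Sussex",
         5: "Brighton and Hove", 13911: "Brighton", 6: "Devon"}

# One entity in Brighton and one in Devon.
counts = rollup_counts(parent_of, [13911, 6])
for place_id, n in counts.items():
    print(f"{names[place_id]} ({n})")
```

Places that never appear in `counts` contain no entities, so the cached result already leaves out the empty branches.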
The aim is to be able to create an interface which might start out showing only the countries which have entities.
For example: Argentina (1021), Chile (291), ..., United States (32,103), United Kingdom (12,338)
The user will then click on a location, such as United Kingdom, and be given all of the immediate child nodes which are descendants of United Kingdom AND have an entity in them.
If there are 32 counties in the United Kingdom, but only 23 of them turn out to have entities stored in them when you drill down, then I don't want to display the other 9. Only locations containing entities should appear.
This site aptly demonstrates the functionality that I wish to achieve: http://www.homeaway.com/vacation-rentals/europe/r5
How do you recommend that I manage such a data structure?
What I am using:
- PHP
- MySQL
- Solr
I want the drill-downs to be as fast as possible. I want to create an AJAX interface that will be seamless for searching.
I would also be interested to know which columns you would recommend indexing.
Recommended answer
Typically, three kinds of queries on hierarchies cause trouble:
- Return all ancestors
- Return all descendants
- Return all children (immediate descendants)
Here's a little table which shows the performance of the different methods in MySQL:
Method                  Ancestors   Descendants  Children        Maintainability  InnoDB
Adjacency list          Good        Decent       Excellent       Easy             Yes
Nested sets (classic)   Poor        Excellent    Poor/Excellent  Very hard        Yes
Nested sets (spatial)   Excellent   Very good    Poor/Excellent  Very hard        No
Materialized path       Excellent   Very good    Poor/Excellent  Hard             Yes
In the Children column, poor/excellent means that the answer depends on whether you combine the method with an adjacency list, i.e. store the parentID in each record.
For your task, you need all three queries:
- All ancestors, to show the Earth / UK / Devon thing
- All children, to show "Destinations in Europe" (the items)
- All descendants, to show "Destinations in Europe" (the counts)
I would go for materialized paths, since this kind of hierarchy rarely changes (only in case of war, revolt etc).
Create a varchar column called path, index it and fill it with a value like this:
1:234:6345:45454:
where the numbers are the primary keys of the appropriate parents, in the correct order (1 for Europe, 234 for UK, etc.)
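As a rough illustration of how such a path could be filled in from the existing adjacency data, here is a small sketch. The `parent_of` dictionary and the `build_path` helper are hypothetical, and the sketch assumes the path ends with the place's own id followed by a colon, which is what the ancestor query in this answer appears to rely on (Europe's own path would be `1:`).

```python
def build_path(parent_of, node_id):
    """Return the materialized path for node_id, e.g. '1:234:6345:'."""
    chain = []
    node = node_id
    while node is not None:
        chain.append(str(node))        # collect ids from the node up to the root
        node = parent_of.get(node)
    # Reverse so the root comes first, and keep the trailing colon.
    return ":".join(reversed(chain)) + ":"

# Hypothetical adjacency data using the ids from the example path above.
parent_of = {1: None, 234: 1, 6345: 234, 45454: 6345}

path = build_path(parent_of, 45454)
print(path)
```

Because the hierarchy rarely changes, computing these strings once in a batch job and storing them should be enough.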
You will also need a table called levels that holds the numbers from 1 to 20 (or whatever maximum nesting level you want).
To select all ancestors:
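-- Each level yields one prefix of the place's path; each prefix is joined to the row that owns it.
-- The full-length prefix matches the place's own row, so the place itself is returned too.
SELECT  pa.*
FROM    places p
JOIN    levels l
ON      SUBSTRING_INDEX(p.path, ':', l.level) <> p.path
JOIN    places pa
ON      pa.path = CONCAT(SUBSTRING_INDEX(p.path, ':', l.level), ':')
WHERE   p.id = @id_of_place_in_devon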
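To make the role of the levels table easier to see, the same prefix expansion can be sketched outside the database. The `ancestor_paths` helper below is hypothetical; it mirrors what `SUBSTRING_INDEX(p.path, ':', l.level)` followed by `CONCAT(..., ':')` produces for each level, and like the query it includes the place's own path as the last item.

```python
def ancestor_paths(path):
    """Expand a materialized path into the paths of all its ancestors, root first."""
    ids = path.rstrip(":").split(":")
    # One prefix per level: the first id, then the first two ids, and so on.
    return [":".join(ids[:level]) + ":" for level in range(1, len(ids) + 1)]

prefixes = ancestor_paths("1:234:6345:45454:")
for p in prefixes:
    print(p)
```

Each of these strings is an exact-match lookup against the indexed path column, which is why the ancestor query can stay cheap.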
To select all children and counts of places within them:
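-- pp ranges over every place at or below each child pc (a range scan on path);
-- the NOT IN filter keeps only leaf places, i.e. those that are nobody's parent.
SELECT  pc.*, COUNT(pp.id)
FROM    places p
JOIN    places pc
ON      pc.parentId = p.id
JOIN    places pp
ON      pp.path BETWEEN pc.path AND CONCAT(pc.path, ':')
        AND pp.id NOT IN
        (
        SELECT  parentId
        FROM    places
        WHERE   parentId IS NOT NULL  -- a NULL parentId (e.g. on the root) would make NOT IN match nothing
        )
WHERE   p.id = @id_of_europe
GROUP BY
        pc.id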
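The same range idea can be adapted to count entities instead of leaf places, which is closer to the drill-down in the question. The sketch below is an assumption-laden illustration, not the answer's query: it runs on SQLite through Python's `sqlite3` module so that it is self-contained, the `entities.woeid` column comes from the question, and the sample rows, ids and index names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE places (id INTEGER PRIMARY KEY, parentId INTEGER, name TEXT, path TEXT);
CREATE INDEX ix_places_path ON places (path);
CREATE INDEX ix_places_parent ON places (parentId);
CREATE TABLE entities (id INTEGER PRIMARY KEY, woeid INTEGER REFERENCES places (id));
INSERT INTO places VALUES
  (1, NULL, 'Earth', '1:'),
  (2, 1, 'United Kingdom', '1:2:'),
  (3, 2, 'England', '1:2:3:'),
  (4, 3, 'East Sussex', '1:2:3:4:'),
  (5, 3, 'Devon', '1:2:3:5:'),
  (6, 3, 'Kent', '1:2:3:6:'),
  (7, 4, 'Brighton', '1:2:3:4:7:');
INSERT INTO entities (woeid) VALUES (7), (5);
""")

def children_with_counts(conn, parent_id):
    """Immediate children of parent_id that contain at least one entity, with counts."""
    sql = """
        SELECT pc.name, COUNT(e.id) AS entity_count
        FROM places pc
        JOIN places pp ON pp.path BETWEEN pc.path AND pc.path || ':'
        JOIN entities e ON e.woeid = pp.id
        WHERE pc.parentId = ?
        GROUP BY pc.id
        ORDER BY pc.name
    """
    # The inner joins drop children (Kent here) that have no entities beneath them.
    return conn.execute(sql, (parent_id,)).fetchall()

rows = children_with_counts(conn, 3)   # drill down into England
print(rows)
```

The `BETWEEN` range relies on `:` sorting after the digits, which holds under a binary string comparison such as SQLite's default; in MySQL it would be worth checking the collation of the path column before depending on it.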