Storing a distance matrix in DB

Question

I need to display a distance matrix on my web page for all the nearby locations of a city.

I would like to fetch all this data from a web service and save it in my DB in advance. I am trying to figure out the best relational DB design for storing such data.

I want to avoid redundant data, and I also want a design that gives optimal performance.

I know a relational DB is not the best option for this, but that is something I cannot change at this point.

Question: what is the best DB schema design for storing such information? I would need to query the DB providing just one city, and I would have to display a matrix of its 5 or 10 closest cities.

Travel time is not that important; I am mainly concerned with distance.

Recommended answer

For the sake of performance, and assuming you are using InnoDB, I'd probably denormalize the data a bit, like this:

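CREATE TABLE CITY (
    CITY_ID INT PRIMARY KEY
);

-- One row per direction, so each pair of cities is stored twice.
CREATE TABLE CITY_DISTANCE (
    CITY1_ID INT,
    CITY2_ID INT,
    -- Bare NUMERIC means DECIMAL(10,0) in MySQL, so distances are whole units.
    DISTANCE NUMERIC NOT NULL,
    -- DISTANCE comes second so that each city's rows are stored sorted by distance.
    PRIMARY KEY (CITY1_ID, DISTANCE, CITY2_ID),
    FOREIGN KEY (CITY1_ID) REFERENCES CITY (CITY_ID),
    FOREIGN KEY (CITY2_ID) REFERENCES CITY (CITY_ID)
);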

Each pair of cities has 2 rows in CITY_DISTANCE containing the same DISTANCE (one for each direction). This obviously makes the table large and can lead to data inconsistencies (the database will not defend itself against non-matching DISTANCE values between the same two cities), and DISTANCE doesn't logically belong in the PK, but bear with me...
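Since the database does not enforce that the two directions agree, the loading code has to. A minimal sketch, assuming hypothetical city IDs 1 and 2 and a distance of 120, is to write both rows in one transaction and to run an audit query from time to time. Neither statement comes from the original answer.

```sql
-- Hypothetical sample data: cities 1 and 2, 120 units apart.
INSERT INTO CITY (CITY_ID) VALUES (1), (2);

-- Write both directions in one transaction so they are committed together.
START TRANSACTION;
INSERT INTO CITY_DISTANCE (CITY1_ID, CITY2_ID, DISTANCE)
VALUES (1, 2, 120),
       (2, 1, 120);
COMMIT;

-- Audit: list rows whose mirror row is missing or carries a different DISTANCE.
SELECT a.CITY1_ID, a.CITY2_ID, a.DISTANCE, b.DISTANCE AS REVERSE_DISTANCE
FROM CITY_DISTANCE a
LEFT JOIN CITY_DISTANCE b
       ON b.CITY1_ID = a.CITY2_ID
      AND b.CITY2_ID = a.CITY1_ID
WHERE b.CITY1_ID IS NULL
   OR b.DISTANCE <> a.DISTANCE;
```

An empty result from the audit query means every stored pair is symmetric.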

InnoDB tables are clustered, which means that by declaring the PK in this particular way we put the whole table in a B-Tree that is particularly suited to a query like this:

-- The 5 cities closest to city 1, nearest first.
SELECT CITY2_ID, DISTANCE
FROM CITY_DISTANCE
WHERE CITY1_ID = 1
ORDER BY DISTANCE
LIMIT 5;

This query returns the 5 cities closest to the city identified by 1, and can be satisfied by a simple range scan on the B-Tree mentioned above, as its EXPLAIN output shows:

id  select_type table           type    possible_keys   key     key_len ref     rows    Extra
1   SIMPLE      CITY_DISTANCE   ref     PRIMARY         PRIMARY 4       const   6       "Using where; Using index"
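The question asks for a matrix, which also needs the distances among the neighbours themselves. One possible way to get them, as a sketch rather than part of the original answer, is to reuse the nearest-neighbour query as a derived table on both sides of a join. The aliases n1, n2 and d are made up for illustration, and CITY2_ID is added to the ORDER BY as a tie-breaker so that both derived tables pick the same 5 cities.

```sql
-- Pairwise distances among the 5 nearest neighbours of city 1.
SELECT d.CITY1_ID, d.CITY2_ID, d.DISTANCE
FROM CITY_DISTANCE d
JOIN (SELECT CITY2_ID
      FROM CITY_DISTANCE
      WHERE CITY1_ID = 1
      ORDER BY DISTANCE, CITY2_ID
      LIMIT 5) n1 ON d.CITY1_ID = n1.CITY2_ID
JOIN (SELECT CITY2_ID
      FROM CITY_DISTANCE
      WHERE CITY1_ID = 1
      ORDER BY DISTANCE, CITY2_ID
      LIMIT 5) n2 ON d.CITY2_ID = n2.CITY2_ID
ORDER BY d.CITY1_ID, d.DISTANCE;
```

The row for city 1 itself still comes from the simpler single-city query. Prefixing either statement with EXPLAIN shows the plan the optimizer actually chooses for your data.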

BTW, InnoDB will automatically create one more index (on CITY2_ID) because of the second FK, and that index will also include CITY1_ID and DISTANCE, because secondary indexes in clustered tables must cover the PK. You might be able to exploit that to avoid duplicated DISTANCEs (explicitly create an index on {CITY2_ID, DISTANCE, CITY1_ID}, let the FK reuse it, and add CHECK (CITY1_ID < CITY2_ID)), but the MySQL query optimizer is probably not smart enough to deal with the query that would be required on such a structure.
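To make that alternative concrete, here is a rough sketch of what it might look like. The table name CITY_DISTANCE_HALF, the index name IX_CITY2_DISTANCE and the UNION ALL query are assumptions for illustration; the original answer does not spell out the query, and this form has not been benchmarked.

```sql
-- Assumed alternative: one row per unordered pair of cities.
CREATE TABLE CITY_DISTANCE_HALF (
    CITY1_ID INT,
    CITY2_ID INT,
    DISTANCE NUMERIC NOT NULL,
    PRIMARY KEY (CITY1_ID, DISTANCE, CITY2_ID),
    -- Explicit index for the second FK to reuse; mirrors the PK column order.
    KEY IX_CITY2_DISTANCE (CITY2_ID, DISTANCE, CITY1_ID),
    FOREIGN KEY (CITY1_ID) REFERENCES CITY (CITY_ID),
    FOREIGN KEY (CITY2_ID) REFERENCES CITY (CITY_ID),
    -- Enforced only from MySQL 8.0.16; earlier versions parse and ignore CHECK.
    CHECK (CITY1_ID < CITY2_ID)
);

-- 5 nearest cities to city 1: city 1 may sit on either side of a pair,
-- so take the nearest 5 from each side and merge the two short lists.
(SELECT CITY2_ID AS OTHER_CITY_ID, DISTANCE
 FROM CITY_DISTANCE_HALF
 WHERE CITY1_ID = 1
 ORDER BY DISTANCE
 LIMIT 5)
UNION ALL
(SELECT CITY1_ID AS OTHER_CITY_ID, DISTANCE
 FROM CITY_DISTANCE_HALF
 WHERE CITY2_ID = 1
 ORDER BY DISTANCE
 LIMIT 5)
ORDER BY DISTANCE
LIMIT 5;
```

This halves the row count and removes the risk of mismatched directions, at the cost of a more complex query. Whether each branch really runs as an ordered index scan is worth confirming with EXPLAIN on your MySQL version.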
