Storing a distance matrix in DB
Question
I need to display a distance matrix on my web page for all the locations near a city.
I would like to fetch all of this data from a web service and save it in my DB in advance. I am trying to figure out the best relational DB design for storing such data.
I want to avoid redundant data, and I also want a design that gives optimal performance.
I know a relational DB is not the best option for this, but that is something I cannot change at this point.
Question: what is the best DB schema design for storing such information? I would need to query the DB by providing just one city, and then display a matrix of its 5 or 10 closest cities.
Travel time is not that important; I am mainly concerned with distance.
Recommended answer
For the sake of performance, and assuming you are using InnoDB, I'd probably denormalize the data a bit, like this:
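CREATE TABLE CITY (
    CITY_ID INT PRIMARY KEY
);

-- One row per direction: (A, B) and (B, A) both appear with the same DISTANCE.
CREATE TABLE CITY_DISTANCE (
    CITY1_ID INT,
    CITY2_ID INT,
    DISTANCE NUMERIC NOT NULL,
    -- DISTANCE sits between the two IDs so that each city's rows are stored in distance order.
    PRIMARY KEY (CITY1_ID, DISTANCE, CITY2_ID),
    FOREIGN KEY (CITY1_ID) REFERENCES CITY (CITY_ID),
    FOREIGN KEY (CITY2_ID) REFERENCES CITY (CITY_ID)
);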
Each pair of cities has two rows in CITY_DISTANCE containing the same DISTANCE (one for each direction). This could obviously make the table very big and could lead to data inconsistencies (the database will not protect itself against non-matching DISTANCE values for the same pair of cities), and DISTANCE doesn't logically belong in the PK, but bear with me...
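Since the schema itself does not enforce the symmetry, the application has to. A minimal sketch, assuming hypothetical cities 1 and 2 that already exist in CITY and a made-up distance of 12: write both directions in a single transaction, and periodically run a check for rows whose reverse row is missing or disagrees.

```sql
-- Hypothetical data: cities 1 and 2, distance 12 (illustrative values only).
-- A whole number is used because NUMERIC without a precision is
-- DECIMAL(10,0) in MySQL, so fractional values would be rounded.
START TRANSACTION;
INSERT INTO CITY_DISTANCE (CITY1_ID, CITY2_ID, DISTANCE) VALUES
    (1, 2, 12),   -- direction 1 -> 2
    (2, 1, 12);   -- direction 2 -> 1, same distance
COMMIT;

-- Consistency check: list every row whose reverse-direction row
-- is missing or carries a different DISTANCE.
SELECT a.CITY1_ID,
       a.CITY2_ID,
       a.DISTANCE,
       b.DISTANCE AS REVERSE_DISTANCE
FROM CITY_DISTANCE a
LEFT JOIN CITY_DISTANCE b
       ON b.CITY1_ID = a.CITY2_ID
      AND b.CITY2_ID = a.CITY1_ID
WHERE b.CITY1_ID IS NULL
   OR b.DISTANCE <> a.DISTANCE;
```

An empty result from the second statement means every stored pair is mirrored with a matching distance. Note that because DISTANCE is part of the PK, the key also does not stop the same ordered pair from being inserted twice with different distances; the check above would report such rows as mismatches.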
InnoDB tables are clustered, which means that by declaring the PK in this particular way we put the whole table in a B-Tree that is particularly well suited to a query like this:
SELECT CITY2_ID, DISTANCE
FROM CITY_DISTANCE
WHERE CITY1_ID = 1
ORDER BY DISTANCE
LIMIT 5
This query returns the 5 cities closest to the city identified by 1, and can be satisfied by a simple range scan on the B-Tree mentioned above:
id  select_type  table          type  possible_keys  key      key_len  ref    rows  Extra
1   SIMPLE       CITY_DISTANCE  ref   PRIMARY        PRIMARY  4        const  6     Using where; Using index
BTW, InnoDB will automatically create one more index (on CITY2_ID) because of the second FK. That index will also include CITY1_ID and DISTANCE, because secondary indexes in clustered tables must cover the PK. You might be able to exploit that to avoid the duplicated DISTANCEs (explicitly create an index on {CITY2_ID, DISTANCE, CITY1_ID}, let the FK reuse it, and add CHECK (CITY1_ID < CITY2_ID)), but the MySQL query optimizer is probably not smart enough to deal with the query that would be required on such a structure.
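For illustration, the single-row-per-pair variant hinted at above might look roughly like the following. This is only a sketch: the index name CITY2_DISTANCE_IDX is made up, the table definition would replace the earlier CITY_DISTANCE rather than coexist with it, and MySQL enforces CHECK constraints only from version 8.0.16 onward (earlier versions parse and ignore them). Whether the optimizer actually serves each branch of the query with a range scan has not been verified here.

```sql
-- Alternative to the two-rows-per-pair table: each pair is stored once,
-- always with the smaller ID in CITY1_ID.
CREATE TABLE CITY_DISTANCE (
    CITY1_ID INT,
    CITY2_ID INT,
    DISTANCE NUMERIC NOT NULL,
    PRIMARY KEY (CITY1_ID, DISTANCE, CITY2_ID),
    -- Hypothetical index name; CITY2_ID comes first so the second FK can reuse it.
    INDEX CITY2_DISTANCE_IDX (CITY2_ID, DISTANCE, CITY1_ID),
    FOREIGN KEY (CITY1_ID) REFERENCES CITY (CITY_ID),
    FOREIGN KEY (CITY2_ID) REFERENCES CITY (CITY_ID),
    CHECK (CITY1_ID < CITY2_ID)
);

-- The 5 closest cities to city 1 now have to be collected from both columns:
-- city 1 may appear as CITY1_ID (first branch) or as CITY2_ID (second branch).
(SELECT CITY2_ID AS OTHER_CITY_ID, DISTANCE
   FROM CITY_DISTANCE
  WHERE CITY1_ID = 1
  ORDER BY DISTANCE
  LIMIT 5)
UNION ALL
(SELECT CITY1_ID AS OTHER_CITY_ID, DISTANCE
   FROM CITY_DISTANCE
  WHERE CITY2_ID = 1
  ORDER BY DISTANCE
  LIMIT 5)
ORDER BY DISTANCE
LIMIT 5;
```

Each branch takes at most 5 candidates in distance order, and the outer ORDER BY ... LIMIT 5 merges them. The trade-off is half as many rows against a more complicated query than the simple single-table scan.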