规范化使跨多个表的联接变得困难 [英] Normalization makes joins accross multiple tables difficult

查看:103
本文介绍了规范化使跨多个表的联接变得困难的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个商店表,其中包含商店名称和地址.经过一番讨论,我们现在将表格标准化,将地址放在单独的表格中.这样做有两个原因:

I had a table for stores containing store name and address. After some discussion, we are now normalizing the the table, putting address in separate tables. This is done for two reasons:

  1. 通过位置/地址提高商店的搜索速度
  2. 在导入商店时,使用 Levenshtein算法,增加了检查拼写错误的街道名称的执行时间. li>
  1. Increase search speed for stores by location / address
  2. Increase execution time for checking misspelled street names using the Levenshtein algorithm when importing stores.

新结构如下(忽略拼写错误):

The new structure looks like this (ignore typos):

country;
+--------------------+--------------+------+-----+---------+-------+  
| Field              | Type         | Null | Key | Default | Extra |  
+--------------------+--------------+------+-----+---------+-------+  
| id                 | varchar(2)   | NO   | PRI | NULL    |       |  
| name               | varchar(45)  | NO   |     | NULL    |       |  
| prefix             | varchar(5)   | NO   |     | NULL    |       |  
+--------------------+--------------+------+-----+---------+-------+  

city;
+--------------------+--------------+------+-----+---------+-------+  
| Field              | Type         | Null | Key | Default | Extra |  
+--------------------+--------------+------+-----+---------+-------+  
| id                 | int(11)      | NO   | PRI | NULL    |       |  
| city               | varchar(50)  | NO   |     | NULL    |       |  
+--------------------+--------------+------+-----+---------+-------+  

street;
+--------------------+--------------+------+-----+---------+-------+  
| Field              | Type         | Null | Key | Default | Extra |  
+--------------------+--------------+------+-----+---------+-------+  
| id                 | int(11)      | NO   | PRI | NULL    |       |  
| street             | varchar(50)  | YES  |     | NULL    |       |  
| fk_cityID          | int(11)      | NO   |     | NULL    |       |  
+--------------------+--------------+------+-----+---------+-------+  

address;
+--------------------+--------------+------+-----+---------+-------+  
| Field              | Type         | Null | Key | Default | Extra |  
+--------------------+--------------+------+-----+---------+-------+  
| id                 | int(11)      | NO   | PRI | NULL    |       |  
| streetNum          | varchar(10)  | NO   |     | NULL    |       |  
| street2            | varchar(50)  | NO   |     | NULL    |       |  
| zipcode            | varchar(10)  | NO   |     | NULL    |       |  
| fk_streetID        | int(11)      | NO   |     | NULL    |       |  
| fk_countryID       | int(11)      | NO   |     | NULL    |       |  
+--------------------+--------------+------+-----+---------+-------+  
*street2 is for secondary reference or secondary address in e.g. the US.

store;
+--------------------+--------------+------+-----+---------+-------+  
| Field              | Type         | Null | Key | Default | Extra |  
+--------------------+--------------+------+-----+---------+-------+  
| id                 | int(11)      | NO   | PRI | NULL    |       |  
| name               | varchar(50)  | YES  |     | NULL    |       |
| street             | varchar(50)  | YES  |     | NULL    |       |    
| fk_addressID       | int(11)      | NO   |     | NULL    |       |  
+--------------------+--------------+------+-----+---------+-------+  
*I've left out address columns in this table to shorten code

新表已填充了正确的数据,剩下的唯一事情就是在store表中添加外键address.id.

The new tables have been populated with correct data and the only thing remaining is to add foreign key address.id in store table.

以下代码正确列出了所有街道名称:

The following code lists all street names correctly:

select a.id, b.street, a.street2, a.zipcode, c.city, a.fk_countryID
from address a
left join street b on a.fk_streetID = b.id
left join city c on b.fk_cityID = c.id

  1. 如何更新store表中的fk_addressID?
  2. 如何列出所有具有正确地址的商店?
  3. 这种不好的归一化考虑了上面给出的原因吗?
  1. How can I update fk_addressID in store table?
  2. How can I list all stores with correct address?
  3. Is this bad normalization considering the reasons given above?

更新

以下代码似乎列出了所有具有正确地址的商店-但是速度有点慢(我大约有2000家商店):

It seems like the following code lists all stores with correct address - however it is a bit slow (I have about 2000 stores):

select a.id, a.name, b.id, c.street
from sl_store a, sl_address b, sl_street c
where b.fk_streetID = c.id
and a.street1 = c.street
group by a.name
order by a.id

推荐答案

我不会讲拼写错误.由于要导入数据,因此可以在临时表中更好地处理拼写错误.

I'm not going to speak to misspellings. Since you're importing the data, misspellings are better handled in a staging table.

让我们看看这个稍微简化的版本.

Let's look at this slightly simplified version.

create table stores
(
  store_name varchar(50) primary key,
  street_num varchar(10) not null,
  street_name varchar(50) not null,
  city varchar(50) not null,
  state_code char(2) not null,
  zip_code char(5) not null,
  iso_country_code char(2) not null,
  -- Depending on what kind of store you're talking about, you *could* have
  -- two of them at the same address. If so, drop this constraint.
  unique (street_num, street_name, city, state_code, zip_code, iso_country_code)
);  

insert into stores values 
('Dairy Queen #212',  '232', 'N 1st St SE',   'Castroville',  'CA', '95012', 'US'),
('Dairy Queen #213',  '177', 'Broadway Ave',  'Hartsdale',    'NY', '10530', 'US'),
('Dairy Queen #214', '7640', 'Vermillion St', 'Seneca Falls', 'NY', '13148', 'US'),
('Dairy Queen #215', '1014', 'Handy Rd',      'Olive Hill',   'KY', '41164', 'US'),
('Dairy Mart #101',   '145', 'N 1st St SE',   'Castroville',  'CA', '95012', 'US'),
('Dairy Mart #121',  '1042', 'Handy Rd',      'Olive Hill',   'KY', '41164', 'US');

尽管很多人坚信邮政编码是美国的城市和州,但事实并非如此.邮政编码与承运人驾驶路线的方式有关,而与地理位置无关.一些城市跨越州际边界.单个邮政编码路线可以跨越状态线.即使维基百科也知道这一点,尽管他们的示例可能已过时. (送货路线不断变化.)

Although a lot of people firmly believe that ZIP code determines city and state in the US, that's not the case. ZIP codes have to do with how carriers drive their routes, not with geography. Some cities straddle the borders between states; single ZIP code routes can cross state lines. Even Wikipedia knows this, although their examples might be out of date. (Delivery routes change constantly.)

所以我们有一个包含两个候选键的表

So we have a table that has two candidate keys,

  • {store_name}和
  • {街道编号,街道名称,城市,州代码,邮政编码,iso_country_code}

它没有非关键属性.我认为这张桌子是5NF.你觉得呢?

It has no non-key attributes. I think this table is in 5NF. What do you think?

如果我想提高街道名称的数据完整性,则可以从这样的内容开始.

If I wanted to increase the data integrity for street names, I might start with something like this.

create table street_names
(
  street_name varchar(50) not null,
  city varchar(50) not null,
  state_code char(2) not null,
  iso_country_code char(2) not null,
  primary key (street_name, city, state_code, iso_country_code)
);  

insert into street_names
select distinct street_name, city, state_code, iso_country_code
from stores;

alter table stores
add constraint streets_from_street_names
foreign key             (street_name, city, state_code, iso_country_code)
references street_names (street_name, city, state_code, iso_country_code);
-- I don't cascade updates or deletes, because in my experience
-- with addresses, that's almost never the right thing to do when a 
-- street name changes.

您可以(可能应该)对城市名称,州名称(州代码)和国家/地区名称重复此过程.

You could (and probably should) repeat this process for city names, state names (state codes), and country names.

您的方法存在一些问题

您显然可以输入在美国的街道的街道编号,以及克罗地亚的国家/地区编号. (可以说,城市的全名"是您可能要存储的事实,以提高数据完整性.街道的全名"也可能是这样.)

You can apparently enter a street id number for a street that's in the US, along with the country id for Croatia. (The "full name" of a city, so to speak, is the kind of fact you probably want to store in order to increase data integrity. That's probably also true of the "full name" of a street.)

为每个数据位使用ID号会大大增加所需的联接数.使用ID号与规范化无关.使用ID号时,对自然键没有相应的唯一约束(这是一个非常普遍的错误),从而允许重复数据.

Using id numbers for every bit of data greatly increases the number of joins required. Using id numbers doesn't have anything to do with normalization. Using id numbers without corresponding unique constraints on the natural keys--an utterly commonplace mistake--allows duplicate data.

这篇关于规范化使跨多个表的联接变得困难的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆