Joining two tables in a complex query (non-uniform data)
Problem description
I need to join two tables in a query whose result I will insert into a third table (which will later be used to link the two). I will mention only the relevant columns of these tables.
PostgreSQL version 9.0.5
Table 1: data_table
Migrated data, ca. 10k rows. Relevant columns:
id (primary key)
address (the beginning of an address; a string that I need to match against the second table. Its length varies.)
Table 2: dictionary
Dictionary, ca. 9 million rows. Relevant columns:
id (primary key)
address (the full address; a string that I need to match against the first table. Its length varies as well.)
What exactly I need
I need to join these tables correctly in a select statement and then insert the result into a third table. All I need is a way to join them successfully.
The way I want to do it is to take each address from data_table and join it with the first address (edit: ordered by address ascending) from dictionary that begins with data_table.address. This must not multiply records, since many addresses in dictionary begin with any given data_table.address.
Also, addresses in both tables contain a lot of irregular spaces, so we probably need to apply
replace(address, ' ', '')
to both of them (any alternative ideas are welcome). There might also be some performance issues, since dictionary has 9 million rows and the server is rather slow.
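One way the space stripping and the performance concern could be addressed together is an expression index on the normalized address. The sketch below is an assumption, not something from the original post: the index name and the sample prefix are made up, and whether the planner actually uses the index depends on the database locale and on the LIKE pattern being a constant when the query is planned.

```sql
-- Hypothetical index on the space-stripped address.
-- text_pattern_ops lets a b-tree index serve left-anchored LIKE patterns
-- when the database does not use the C locale.
CREATE INDEX dictionary_address_nospace_idx
    ON dictionary (replace(address, ' ', '') text_pattern_ops);

-- A lookup with a constant prefix can then use the index.
-- 'MainStreet1' is an invented sample value.
SELECT id
FROM dictionary
WHERE replace(address, ' ', '') LIKE 'MainStreet1%'
LIMIT 1;
```

The WHERE clause has to repeat the indexed expression exactly, otherwise the index is not considered. Building the index on 9 million rows takes a while, but it is a one-time cost.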
I see the result as some variation of the following query:
select
data_table.id, dictionary.id
from
data_table, dictionary
where
-conditions-
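One possible shape for those conditions, offered only as a sketch: a correlated subquery that picks the first matching dictionary row for each data_table row, so records are never multiplied. It uses a subquery because PostgreSQL 9.0 has no LATERAL joins. The aliases d and dict and the output column names are assumptions.

```sql
-- For every row in data_table, fetch the id of the first dictionary address
-- (in ascending address order) that starts with the space-stripped data_table address.
SELECT d.id AS data_id,
       (SELECT dict.id
        FROM dictionary dict
        WHERE replace(dict.address, ' ', '')
              LIKE replace(d.address, ' ', '') || '%'
        ORDER BY dict.address
        LIMIT 1) AS dictionary_id
FROM data_table d;
```

Rows with no match come back with a NULL dictionary_id. Because the LIKE pattern here is not a constant, the planner will probably scan dictionary once per data_table row, which may be too slow on a 9 million row table.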
Recommended answer
The solution that our architect came up with was to write a function that finds the first match.
Function:
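CREATE OR REPLACE FUNCTION pick_one_address(text)
RETURNS text AS
$BODY$
DECLARE
    address_query text;
    toFind text;
    found_id text;  -- not named "found", so PL/pgSQL's special FOUND variable is not shadowed
BEGIN
    -- Strip spaces from the input and turn it into a prefix pattern.
    toFind := replace($1, ' ', '') || '%';
    -- Build the query as a string; quote_literal() keeps an apostrophe
    -- in an address from breaking the generated SQL.
    -- Note: the column is spelled "adres" here and "address" in the question.
    address_query := 'select al.id from dictionary al where replace(al.adres, '' '', '''') like '
                     || quote_literal(toFind) || ' limit 1';
    EXECUTE address_query INTO found_id;
    RETURN found_id;
END $BODY$
LANGUAGE plpgsql VOLATILE
COST 100;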
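A minimal sketch of how the function might be used to fill the third table. The table address_link and its columns are hypothetical, since the post does not name the third table, and the cast assumes that dictionary.id is an integer.

```sql
-- Hypothetical link table; the name and column types are assumptions.
CREATE TABLE address_link (
    data_id       integer NOT NULL,
    dictionary_id integer            -- stays NULL when no dictionary address matches
);

-- One row per data_table record, so nothing is multiplied.
-- pick_one_address() returns text, hence the cast back to integer.
INSERT INTO address_link (data_id, dictionary_id)
SELECT dt.id,
       pick_one_address(dt.address)::integer
FROM data_table dt;
```

The function is called once per data_table row, so this runs about 10k single-row lookups against dictionary.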
The code might look strange because I changed the table names to protect my company's privacy, and because I left the third table I used out of the question to keep it simple, but I think it is enough to understand the mechanism.
Thanks for your input, @ErwinBrandstetter and @CraigRinger.