Joining two tables in a complex query (not uniform data)


Problem description

I need to connect two tables in a query whose result I will insert into a third table (which will be used in the future to join the two). I will mention only the relevant columns of these tables.

PostgreSQL version 9.0.5

Table 1: data_table

Migrated data, ca. 10k rows, relevant columns:

id (primary key)

address (the beginning of an address, a string that I need to match against the second table. This address has varying length.)

Table 2: dictionary

Dictionary, ca. 9 million rows, relevant columns:

id (primary key)

address (the full address, a string that I need to match against the first table; varying length as well.)

What exactly I need

I need to correctly connect these tables in a select statement and then insert the result into a third table. All I need is a way to successfully connect these tables.

The way I want to do it is to take each address from data_table and join it with the first address (edit: ordered by address asc) from dictionary that begins with data_table.address. This must not multiply records, because many addresses in dictionary begin with each data_table.address.
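The "first matching address per row" idea above can be sketched as a correlated subquery. This is only an illustration, not the approach that was eventually used: it assumes the columns are named id and address exactly as described in the question, and it does not yet handle whitespace. PostgreSQL 9.0 has no LATERAL joins, so a scalar subquery in the select list is used instead.

```sql
-- Sketch: one output row per data_table row, paired with the id of the
-- first dictionary address (ascending) that starts with data_table.address.
-- Column names follow the question's description; they are assumptions.
SELECT dt.id AS data_table_id,
       (SELECT d.id
          FROM dictionary d
         -- Note: '%' or '_' inside dt.address would act as LIKE wildcards.
         WHERE d.address LIKE dt.address || '%'
         ORDER BY d.address ASC
         LIMIT 1) AS dictionary_id
  FROM data_table dt;
```

Rows in data_table without any match get NULL in dictionary_id, and because the subquery returns at most one row, records are never multiplied.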

Also, addresses in both tables contain a lot of irregular spaces, so we probably need to apply

replace(address, ' ', '') 

to both of them (any alternative ideas are welcome). There might also be some performance issues, since dictionary has 9 million rows and the server is rather slow.
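One possible way to ease the performance concern is an expression index on the space-stripped address. The sketch below is an assumption-laden illustration: the index name and the sample prefix are made up, and whether the planner actually uses the index depends on the data and on the pattern being a constant at plan time (a pattern taken from another table's column generally cannot use it).

```sql
-- Hypothetical index on the normalized address. text_pattern_ops lets
-- left-anchored LIKE patterns use a b-tree index regardless of locale.
CREATE INDEX dictionary_address_nospace_idx
    ON dictionary ((replace(address, ' ', '')) text_pattern_ops);

-- Prefix lookup with a constant pattern; 'MainStreet12' is a made-up
-- example value already stripped of spaces.
SELECT d.id
  FROM dictionary d
 WHERE replace(d.address, ' ', '') LIKE 'MainStreet12%'
 ORDER BY d.address ASC
 LIMIT 1;
```

The WHERE clause has to repeat the indexed expression exactly, otherwise the index cannot be considered for the query.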

I see the result as some variation of the following query:

select 
data_table.id, dictionary.id
from
data_table, dictionary
where
-conditions-

Recommended answer

The solution that our architect came up with was to write a function that finds the first match.

Function:

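CREATE OR REPLACE FUNCTION pick_one_address(text)
  RETURNS text AS
$BODY$
DECLARE
  address_query text;
  toFind text;
  found_id text;  -- not named "found", to avoid shadowing PL/pgSQL's FOUND
BEGIN
  -- Strip spaces from the input and turn it into a prefix pattern.
  toFind := (replace($1, ' ', '') || '%');
  -- Build the query with the pattern as a properly quoted literal.
  -- There is no ORDER BY, so which matching row is returned is not defined.
  address_query := 'select al.id from dictionary al where replace(al.adres, '' '', '''') like ' || quote_literal(toFind) || ' limit 1';
  EXECUTE address_query INTO found_id;
  -- NULL is returned when no dictionary address matches.
  RETURN found_id;
END $BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;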

The code might seem strange, since I changed the table names to protect my company's privacy and, to simplify the question, did not mention a third table that I used, but I guess it should be enough to understand the mechanism.
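To show how the function could feed the third table mentioned in the question, here is a minimal sketch. The table name address_match and its column names are hypothetical (the question never names the third table), and the cast assumes that both id columns are integers.

```sql
-- Hypothetical link table; name and column types are assumptions.
CREATE TABLE address_match (
    data_table_id integer NOT NULL,
    dictionary_id integer            -- NULL when no match was found
);

-- One call to pick_one_address per data_table row (about 10k calls),
-- each returning at most one dictionary id as text.
INSERT INTO address_match (data_table_id, dictionary_id)
SELECT dt.id,
       pick_one_address(dt.address)::integer
  FROM data_table dt;
```

Because the function returns a single value per call, each data_table row produces exactly one row in the link table.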

Thanks for your input @ErwinBrandstetter, @CraigRinger
