Search for cross-field duplicates in PostgreSQL and bring back matched pairs


Problem description

I have a table of contacts. The table contains a mobile_phone column as well as a home_phone column. I'd like to fetch all pairs of duplicate contacts, where a pair is two contacts sharing a phone number.

Note that if contact A's mobile_phone matches contact B's home_phone, this is also a duplicate. Here is an example of three contacts that should match:

contact_id|mobile_phone|home_phone|other columns such as email.......|...
-------------------------------------------------------------------------
111       |9748777777  |1112312312|..................................|...
112       |1112312312  |null      |..................................|...
113       |9748777777  |0001112222|..................................|...

Specifically, I would like to bring back a table where each row contains the contact_ids of the two matching contacts. For example:

||contact_id_a|contact_id_b||
||-------------------------||
||   145155   |   145999   ||
||   145158   |   145141   ||

With the help of @Erwin I was able to write a query close to what I am trying to achieve. It brings back the contact_ids of all contacts in the list that share a phone number with other contacts in the list.

SELECT c.contact_id
FROM   contacts c
WHERE  EXISTS (
   SELECT FROM contacts x
   WHERE  x.contact_id <> c.contact_id  -- except self
   -- Parentheses are required: AND binds tighter than OR, so without them
   -- the self-exclusion would only apply to the home_phone branch.
   AND   ((x.data->>'mobile_phone' IS NOT NULL AND x.data->>'mobile_phone' IN (c.data->>'mobile_phone', c.data->>'home_phone'))
       OR (x.data->>'home_phone'   IS NOT NULL AND x.data->>'home_phone'   IN (c.data->>'mobile_phone', c.data->>'home_phone')))
   );

The output only contains contact_ids, like this:

||contact_id||
--------------
||  2341514 ||
||  345141  ||

I'd like to bring back the contact_ids of each pair of matching contacts in a single row, as shown above.

Recommended answer

A simple query would use the ARRAY overlap operator &&:

SELECT c1.contact_id AS a, c2.contact_id AS b
FROM   contacts c1
JOIN   contacts c2 ON c1.contact_id < c2.contact_id
WHERE  ARRAY[c1.mobile_phone, c1.home_phone] && ARRAY[c2.mobile_phone, c2.home_phone];

The condition c1.contact_id < c2.contact_id excludes self-joins and switched duplicates (the same pair reported a second time with a and b swapped).
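To see the query in action, here is a minimal sketch that runs it against the three sample contacts from the question. The CTE named contacts is only a stand-in for the real table, and it assumes both phone columns are plain text columns as in the answer (not keys inside a JSON data column as in the question's attempt).

```sql
-- Stand-in for the real table: the three sample contacts from the question.
WITH contacts (contact_id, mobile_phone, home_phone) AS (
   VALUES
     (111, '9748777777', '1112312312')
   , (112, '1112312312', NULL)
   , (113, '9748777777', '0001112222')
   )
SELECT c1.contact_id AS contact_id_a, c2.contact_id AS contact_id_b
FROM   contacts c1
JOIN   contacts c2 ON c1.contact_id < c2.contact_id   -- no self-pairs, each pair once
WHERE  ARRAY[c1.mobile_phone, c1.home_phone] && ARRAY[c2.mobile_phone, c2.home_phone]
ORDER  BY 1, 2;
```

This should return the pairs (111, 112) and (111, 113). Contacts 112 and 113 are not paired because they share no number, and the NULL home_phone of 112 matches nothing, since && does not treat NULL elements as equal.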

But this representation gets out of hand quickly if many contacts share the same number in some way: n contacts sharing one number produce n(n-1)/2 rows.
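If the pairwise output grows too large, one possible alternative (not part of the original answer) is to return one row per shared phone number with all contacts that use it. The following is only a sketch under the same assumption that mobile_phone and home_phone are plain columns of contacts; the output column names phone and contact_ids are made up for illustration.

```sql
-- One row per phone number used by more than one contact.
SELECT phone
     , array_agg(DISTINCT contact_id ORDER BY contact_id) AS contact_ids
FROM  (
   -- Unpivot both phone columns into a single "phone" column.
   SELECT contact_id, mobile_phone AS phone FROM contacts
   UNION ALL
   SELECT contact_id, home_phone            FROM contacts
   ) sub
WHERE  phone IS NOT NULL                    -- ignore missing numbers
GROUP  BY phone
HAVING count(DISTINCT contact_id) > 1;      -- keep only shared numbers
```

With this shape, n contacts sharing a number produce a single row instead of n(n-1)/2 pairs, at the cost of no longer matching the two-column format requested in the question.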

Aside: conditions in an [INNER] JOIN clause and conditions in the WHERE clause have exactly the same effect as long as no more than join_collapse_limit joins are involved. See:

  • Count on join of big tables with conditions is slow
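As an illustration of that aside, the same query can be written with both conditions in the join clause. This is a sketch of an equivalent formulation, not a recommendation from the original answer; for a single inner join like this one it should return the same rows.

```sql
-- Same result as the query above: for an inner join, it does not matter
-- whether a condition is placed in ON or in WHERE.
SELECT c1.contact_id AS a, c2.contact_id AS b
FROM   contacts c1
JOIN   contacts c2 ON  c1.contact_id < c2.contact_id
                   AND ARRAY[c1.mobile_phone, c1.home_phone] && ARRAY[c2.mobile_phone, c2.home_phone];
```

Which form to use is mostly a matter of readability here. The equivalence does not carry over to outer joins, where moving a condition between ON and WHERE changes the result.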
