How to find almost similar records in SQL?
Question
This is the search record:
A = {
field1: value1,
field2: value2,
...
fieldN: valueN
}
I have many such records in the database.
Another record (B) almost matches record A if at least N-M of the fields in the two records are equal. For example, with M = 2:
B = {
field1: OTHER_value1,
field2: OTHER_value2,
field3: value3,
...
fieldN: valueN
}
The differing fields can be any of the fields, not only the first ones.
I could write a very large combinatorial SQL query, but maybe there is a more elegant solution.
P.S.: My database is PostgreSQL.
Recommended answer
I would use a NULL-safe comparison such as is not distinct from to handle NULL values.
You can also use Postgres short-hand to simplify the logic. One way is:
where ( (a.field1 is not distinct from b.field1)::int +
        (a.field2 is not distinct from b.field2)::int +
        . . .
        (a.fieldn is not distinct from b.fieldn)::int
      ) >= N - M
I think this is easier to express only in terms of M. So, only count the fields that are different:
where ( (a.field1 is distinct from b.field1)::int +
        (a.field2 is distinct from b.field2)::int +
        . . .
        (a.fieldn is distinct from b.fieldn)::int
      ) <= M
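To make the condition concrete, here is a minimal sketch of a complete query. The table name records, its id column, the four text fields, and the sample rows are all hypothetical and only serve as an illustration; the lateral subquery is just one way to avoid writing the sum twice.

```sql
-- Hypothetical sample data: record 1 plays the role of the search record A.
with records (id, field1, field2, field3, field4) as (
    values (1, 'a', 'b',  'c', 'd'),
           (2, 'x', 'y',  'c', 'd'),   -- differs from A in 2 fields
           (3, 'x', 'y',  'z', 'd'),   -- differs from A in 3 fields
           (4, 'a', null, 'c', 'd')    -- differs from A in 1 field (NULL vs 'b')
)
select b.id, d.differing_fields
from records a
join records b on b.id <> a.id
cross join lateral (
    -- "is distinct from" treats NULL as an ordinary comparable value,
    -- so each term is always true or false, never NULL.
    select (a.field1 is distinct from b.field1)::int
         + (a.field2 is distinct from b.field2)::int
         + (a.field3 is distinct from b.field3)::int
         + (a.field4 is distinct from b.field4)::int as differing_fields
) d
where a.id = 1                  -- the search record A
  and d.differing_fields <= 2   -- M = 2
order by b.id;
```

With this sample data the query returns records 2 and 4, with 2 and 1 differing fields respectively. Because the search record is fixed by a.id = 1, each other row is compared against it only once.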
Doing this with your data requires a cross join, which is quite expensive: every record is compared with every other record, so the work grows quadratically with the number of rows.
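If all almost-matching pairs are needed rather than the matches for one search record, a small saving is possible by comparing each pair only once. The sketch below assumes a hypothetical records table with a unique id column and three text fields; the join is still quadratic, it merely skips self-pairs and mirrored pairs.

```sql
-- Hypothetical sample data, for illustration only.
with records (id, field1, field2, field3) as (
    values (1, 'a', 'b', 'c'),
           (2, 'a', 'b', 'x'),
           (3, 'x', 'y', 'z')
)
select a.id as id_a, b.id as id_b
from records a
join records b on a.id < b.id   -- each unordered pair exactly once
where ( (a.field1 is distinct from b.field1)::int +
        (a.field2 is distinct from b.field2)::int +
        (a.field3 is distinct from b.field3)::int
      ) <= 1;                    -- M = 1
```

With this sample data only the pair (1, 2) is returned, since those two records differ in a single field.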