How to call a Levenshtein function using the values from two different tables in T-SQL


Problem description

I am trying to find the Levenshtein distance between the columns of two different tables, TableA and TableB. Basically, I need to match ColumnA of TableA against all the elements of ColumnB in TableB and find the Levenshtein distance for each pair.

I have created a Levenshtein function as follows:

CREATE FUNCTION [Levenshtein]
     (@value1 [NVARCHAR](MAX), 
      @value2 [NVARCHAR](MAX))
RETURNS [INT]
AS 
EXTERNAL NAME [FastenshteinAssembly].[Fastenshtein.Levenshtein].[Distance]
GO

This basically calls a Levenshtein DLL I have on my machine. I tried creating a stored procedure for this operation, but I am unsure whether that is an optimized approach.

TableB contains millions of CompanyNames and TableA contains thousands of CompanyNames, so this is essentially an n*m operation.

What is the optimized approach to achieving this?

Thanks

Recommended answer

There is no optimized method for this.

There may be some hacks that you can do to simplify the processing. For instance, you could create lookup tables on each side using n-grams and only compare names whose n-grams are close. Or you could use soundex() for the same purpose, or the first three characters.
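As a rough sketch of the "first three characters" idea, the join below only calls the function for pairs that share a three-character prefix. It assumes the CLR function from the question was created in the dbo schema, and it uses the table and column names from the question. It is an illustration of the blocking idea, not a tuned solution.

```sql
-- Assumption: the function from the question lives in the dbo schema
-- (scalar functions must be called with a schema-qualified name).
-- Assumption: TableA.ColumnA and TableB.ColumnB hold the names to compare.
SELECT a.ColumnA,
       b.ColumnB,
       dbo.Levenshtein(a.ColumnA, b.ColumnB) AS Distance
FROM TableA AS a
JOIN TableB AS b
  ON LEFT(b.ColumnB, 3) = LEFT(a.ColumnA, 3);
  -- Alternative blocking key: SOUNDEX(b.ColumnB) = SOUNDEX(a.ColumnA)
```

The trade-off is that a pair whose names differ within the first three characters is never compared, so a typo near the start of a name would be missed.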

However, if you need to match against all possibilities, then this is an expensive n*m operation in SQL Server.
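For reference, the exhaustive comparison can be written as a plain CROSS JOIN. This is a minimal sketch that assumes the function from the question was created in the dbo schema and that the tables and columns are named as in the question.

```sql
-- Assumption: dbo is the schema of the CLR function from the question.
-- Every row of TableA is paired with every row of TableB,
-- so the function is called n * m times.
SELECT a.ColumnA,
       b.ColumnB,
       dbo.Levenshtein(a.ColumnA, b.ColumnB) AS Distance
FROM TableA AS a
CROSS JOIN TableB AS b;
```

With thousands of rows on one side and millions on the other, this produces billions of function calls and result rows, so in practice you would probably want to filter or aggregate the output rather than return every pair.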
