How to call a Levenshtein function using the values from two different tables in T-SQL
Question
I am trying to find the Levenshtein distance between columns of two different tables, TableA and TableB. Basically, I need to compare each value of ColumnA in TableA against every value of ColumnB in TableB and compute the Levenshtein distance for each pair.
I have created a Levenshtein function as follows:
CREATE FUNCTION [Levenshtein]
(@value1 [NVARCHAR](MAX),
@value2 [NVARCHAR](MAX))
RETURNS [INT]
AS
EXTERNAL NAME [FastenshteinAssembly].[Fastenshtein.Levenshtein].[Distance]
GO
This basically calls a Levenshtein DLL I have on my machine. I tried creating a stored procedure for this operation, but I am unsure whether that is an optimized approach.
TableB contains millions of company names and TableA contains thousands, so this is essentially an n*m operation.
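For reference, the straightforward n*m call can be sketched as a cross join. This is a minimal sketch, not a tuned solution. It assumes the function was created in the `dbo` schema (the question does not say which schema), and it uses the table and column names as given in the question.

```sql
-- Assumption: the CLR function from the question lives in the dbo schema.
-- Scalar user-defined functions must be called with at least a two-part name.
SELECT
    a.ColumnA,
    b.ColumnB,
    dbo.Levenshtein(a.ColumnA, b.ColumnB) AS Distance
FROM TableA AS a
CROSS JOIN TableB AS b;   -- pairs every row of TableA with every row of TableB
```

With thousands of rows on one side and millions on the other, this produces billions of function calls, which is why the cost is a concern.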
What is the optimized approach for achieving this?
Thanks
Recommended answer
There is currently no truly optimized approach.
There may be some hacks that simplify the processing. For instance, you could create lookup tables on each side using n-grams and only compare names whose n-grams are close. Or you could use SOUNDEX() for the same purpose, or the first three characters.
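The "first three characters" idea can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: the `Prefix3` columns and the index name are hypothetical, the function is assumed to live in `dbo`, and the `CAST` is there only to give the computed column a small, indexable type whatever the declared type of the name columns is.

```sql
-- Hypothetical helper columns: the first three characters of each name.
ALTER TABLE TableA ADD Prefix3 AS CAST(LEFT(ColumnA, 3) AS NVARCHAR(3)) PERSISTED;
ALTER TABLE TableB ADD Prefix3 AS CAST(LEFT(ColumnB, 3) AS NVARCHAR(3)) PERSISTED;
GO

-- Hypothetical index so the join can seek on the large table.
CREATE INDEX IX_TableB_Prefix3 ON TableB (Prefix3);
GO

-- Only names that share a prefix are passed to the CLR function.
SELECT
    a.ColumnA,
    b.ColumnB,
    dbo.Levenshtein(a.ColumnA, b.ColumnB) AS Distance
FROM TableA AS a
JOIN TableB AS b
    ON b.Prefix3 = a.Prefix3;
```

The trade-off is that pairs whose first three characters differ are never compared, so a typo early in a name causes a missed match. Replacing `LEFT(..., 3)` with `SOUNDEX(...)` gives the SOUNDEX variant of the same blocking idea, with its own false negatives.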
However, if you need to match against all possibilities, then this is an expensive n*m operation in SQL Server.