T-SQL Get percentage of character match of 2 strings
Problem description
Let's say I have a pair of words:
Alexander and Alecsander, or Alexander and Alegzander, or Alexander and Aleaxnder, or any other combination. In general, we are talking about human error in typing a word or a set of words.
What I want to achieve is to get the percentage of characters that match between the 2 strings.
Here is what I have so far:
DECLARE @table1 TABLE
(
    nr INT
    ,ch CHAR
)
DECLARE @table2 TABLE
(
    nr INT
    ,ch CHAR
)
INSERT INTO @table1
SELECT nr, ch FROM [dbo].[SplitStringIntoCharacters]('WORD w') -- returns a table of characters (spaces included)
INSERT INTO @table2
SELECT nr, ch FROM [dbo].[SplitStringIntoCharacters]('WORD 5')
DECLARE @resultsTable TABLE
(
    ch1 CHAR
    ,ch2 CHAR
)
INSERT INTO @resultsTable
SELECT DISTINCT t1.ch ch1, t2.ch ch2
FROM @table1 t1
FULL JOIN @table2 t2 ON t1.ch = t2.ch -- returns both matches and mismatches
SELECT * FROM @resultsTable
DECLARE @nrOfMathches INT, @nrOfMismatches INT, @nrOfRowsInResultsTable INT
SELECT @nrOfMathches = COUNT(1) FROM @resultsTable WHERE ch1 IS NOT NULL AND ch2 IS NOT NULL
SELECT @nrOfMismatches = COUNT(1) FROM @resultsTable WHERE ch1 IS NULL OR ch2 IS NULL
SELECT @nrOfRowsInResultsTable = COUNT(1) FROM @resultsTable
SELECT @nrOfMathches * 100 / @nrOfRowsInResultsTable
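The question does not show [dbo].[SplitStringIntoCharacters]. To try the snippet above, something along the following lines might serve as a stand-in. Only the nr and ch column names come from the question; the body, the NVARCHAR(100) parameter and the NCHAR(1) column type are assumptions made for illustration.

```sql
-- Hypothetical stand-in: the original helper is not shown in the question.
-- Run this in its own batch, because CREATE FUNCTION must be the first statement.
CREATE FUNCTION [dbo].[SplitStringIntoCharacters] (@input NVARCHAR(100))
RETURNS @chars TABLE (nr INT, ch NCHAR(1))
AS
BEGIN
    DECLARE @i INT = 1;
    -- DATALENGTH / 2 also counts trailing spaces, which LEN would ignore
    DECLARE @len INT = DATALENGTH(@input) / 2;

    WHILE @i <= @len
    BEGIN
        -- nr is the 1-based position, ch the character at that position
        INSERT INTO @chars (nr, ch) VALUES (@i, SUBSTRING(@input, @i, 1));
        SET @i = @i + 1;
    END

    RETURN;
END
```

A loop is used here only because it is easy to read; a numbers table or a recursive CTE would be a reasonable alternative for longer inputs.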
SELECT * FROM @resultsTable will return the following:
ch1      ch2
NULL     5
[blank]  [blank]
D        D
O        O
R        R
W        W
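As a worked check of the final SELECT in the attempt, using only the six rows listed above: five rows have both ch1 and ch2 non-NULL, and T-SQL performs integer division on INT operands, so the result would be:

```latex
\[
\left\lfloor \frac{5 \times 100}{6} \right\rfloor
  = \left\lfloor 83.33\ldots \right\rfloor
  = 83
\]
```

Because the join is on the character value alone and the rows are made DISTINCT, this figure does not depend on where a character appears in the word or on how often it is repeated.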
Solution
OK, here is my solution so far:
SELECT [dbo].[GetPercentageOfTwoStringMatching]('valentin123456', 'valnetin123456')
returns 86, i.e. an 86% match.
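The 86 can be reproduced by hand. Tracing the LEVENSHTEIN function given below for this pair, the loop counts one difference where 'e' meets 'n' and a second where 'n' meets 't', so the distance is 2. Both strings are 14 characters long. The symbols d and l_max are notation introduced here for illustration, and the floor reflects integer division on INT values:

```latex
\[
d = 2, \qquad \ell_{\max} = \max(14, 14) = 14
\]
\[
\text{bad} = \left\lfloor \frac{2 \times 100}{14} \right\rfloor
           = \left\lfloor 14.28\ldots \right\rfloor = 14,
\qquad
\text{good} = 100 - 14 = 86
\]
```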
-- Note: create [dbo].[LEVENSHTEIN] (further down) first, since this function calls it.
CREATE FUNCTION [dbo].[GetPercentageOfTwoStringMatching]
(
    @string1 NVARCHAR(100)
    ,@string2 NVARCHAR(100)
)
RETURNS INT
AS
BEGIN
    DECLARE @levenShteinNumber INT
    DECLARE @string1Length INT = LEN(@string1)
        ,@string2Length INT = LEN(@string2)
    DECLARE @maxLengthNumber INT = CASE WHEN @string1Length > @string2Length THEN @string1Length ELSE @string2Length END
    SELECT @levenShteinNumber = [dbo].[LEVENSHTEIN](@string1, @string2)
    DECLARE @percentageOfBadCharacters INT = @levenShteinNumber * 100 / @maxLengthNumber
    DECLARE @percentageOfGoodCharacters INT = 100 - @percentageOfBadCharacters
    -- Return the result of the function
    RETURN @percentageOfGoodCharacters
END
GO
-- =============================================
-- Create date: 2011.12.14
-- Description: http://blog.sendreallybigfiles.com/2009/06/improved-t-sql-levenshtein-distance.html
-- =============================================
CREATE FUNCTION [dbo].[LEVENSHTEIN](@left VARCHAR(100), @right VARCHAR(100))
RETURNS INT
AS
BEGIN
    DECLARE @difference INT,
        @lenRight INT,
        @lenLeft INT,
        @leftIndex INT,
        @rightIndex INT,
        @left_char CHAR(1),
        @right_char CHAR(1),
        @compareLength INT
    SET @lenLeft = LEN(@left)
    SET @lenRight = LEN(@right)
    SET @difference = 0
    -- If one string is empty, the distance is the length of the other
    IF @lenLeft = 0
    BEGIN
        SET @difference = @lenRight
        GOTO done
    END
    IF @lenRight = 0
    BEGIN
        SET @difference = @lenLeft
        GOTO done
    END
    GOTO comparison
    comparison:
    -- Walk both strings up to the length of the longer one
    IF (@lenLeft >= @lenRight)
        SET @compareLength = @lenLeft
    ELSE
        SET @compareLength = @lenRight
    SET @rightIndex = 1
    SET @leftIndex = 1
    WHILE @leftIndex <= @compareLength
    BEGIN
        SET @left_char = SUBSTRING(@left, @leftIndex, 1)
        SET @right_char = SUBSTRING(@right, @rightIndex, 1)
        IF @left_char <> @right_char
        BEGIN
            -- Would an insertion make them re-align?
            IF (@left_char = SUBSTRING(@right, @rightIndex + 1, 1))
                SET @rightIndex = @rightIndex + 1
            -- Would a deletion make them re-align?
            ELSE IF (SUBSTRING(@left, @leftIndex + 1, 1) = @right_char)
                SET @leftIndex = @leftIndex + 1
            -- Every mismatch counts as one difference, re-aligned or not
            SET @difference = @difference + 1
        END
        SET @leftIndex = @leftIndex + 1
        SET @rightIndex = @rightIndex + 1
    END
    GOTO done
    done:
    RETURN @difference
END
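To see how the two functions behave on the misspellings from the question, a query along these lines could be used. It is only an illustrative sketch: it assumes both functions above have been created in the dbo schema, and the column aliases (typed, distance, match_percent) are made up here.

```sql
-- Illustrative usage only; assumes [dbo].[LEVENSHTEIN] and
-- [dbo].[GetPercentageOfTwoStringMatching] already exist.
SELECT w.typed,
       [dbo].[LEVENSHTEIN]('Alexander', w.typed) AS distance,
       [dbo].[GetPercentageOfTwoStringMatching]('Alexander', w.typed) AS match_percent
FROM (VALUES ('Alecsander'), ('Alegzander'), ('Aleaxnder')) AS w(typed);
```

Since the distance routine walks both strings once with a one-character look-ahead, its result may not always equal the textbook (dynamic programming) Levenshtein distance, so the percentages are best read as an approximation.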