The number of differences in a column

Problem description

I would like to retrieve a column showing how many letters differ between rows. For instance:

If you have a value "test" and another row has the value "testing ", then the difference between "test" and "testing " is 4 letters, so that row's value in the column would be 4.

I have thought about it, but I don't know where to begin.

id  || value     || category || differences
--------------------------------------------
1   || test      || 1        || 4
2   || testing   || 1        || null
11  || candy     || 2        || -3
12  || ca        || 2        || null

In this scenario and context, there is no difference between "Test" and "rest".
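Reading the example table, the expected `differences` value appears to be the signed change in string length from each row to the next row in the same category (ordered by `id`), with null on the last row of each category. "testing " (8 characters, counting the trailing space) minus "test" (4) gives 4, and "ca" (2) minus "candy" (5) gives -3. That reading would also make "Test" and "rest" count as no difference, since both have 4 characters. The following is a minimal Python sketch of that interpretation only. The `(id, value, category)` tuple shape and the function name `length_differences` are hypothetical. In SQL, the analogous building block would be a window function such as `LEAD(...) OVER (PARTITION BY category ORDER BY id)`.

```python
from itertools import groupby
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, str, int]  # (id, value, category): hypothetical row shape


def length_differences(rows: List[Row]) -> Dict[int, Optional[int]]:
    """Signed length change from each row to the next row in the same category.

    Assumes rows within a category are ordered by id; the last row of each
    category gets None (the question's "null").
    """
    result: Dict[int, Optional[int]] = {}
    ordered = sorted(rows, key=lambda r: (r[2], r[0]))
    for _, group_iter in groupby(ordered, key=lambda r: r[2]):
        group = list(group_iter)
        # Pair each row with its successor; the final row is paired with None.
        successors: List[Optional[Row]] = [*group[1:], None]
        for current, nxt in zip(group, successors):
            if nxt is None:
                result[current[0]] = None
            else:
                result[current[0]] = len(nxt[1]) - len(current[1])
    return result


rows = [
    (1, "test", 1),
    (2, "testing ", 1),  # trailing space kept, as in the question
    (11, "candy", 2),
    (12, "ca", 2),
]
print(length_differences(rows))  # {1: 4, 2: None, 11: -3, 12: None}
```

This only compares lengths, so it ignores which letters actually changed. That is why "Test" and "rest" come out equal.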

Recommended answer

I think what you are looking for is a measure of edit distance, rather than just a count of prefix similarity, and there are a few common algorithms for that. Levenshtein's method is one I've used before, and I've seen it implemented as TSQL functions. The answers to this SO question suggest a couple of TSQL implementations that you might be able to take and use as-is.
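For reference, the Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. Below is a minimal Python sketch of the standard dynamic-programming formulation, keeping only two rows of the table. It is not one of the TSQL implementations from the linked answers, and the name `levenshtein` is just an illustrative choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions (each costing 1) needed to turn a into b."""
    # prev[j] is the distance between a[:i-1] and b[:j] (previous DP row);
    # initially a[:0] is empty, so turning it into b[:j] takes j insertions.
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(
                prev[j] + 1,         # delete ch_a
                curr[j - 1] + 1,     # insert ch_b
                prev[j - 1] + cost,  # substitute (or keep a match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("test", "testing"))    # 3
print(levenshtein("kitten", "sitting"))  # 3
```

Under this measure "test" and "rest" are 1 apart (one substitution), so it is worth checking it against the expected results in the question before adopting it.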

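(Do take the time to test the code and understand the method, rather than just copying it and using it, so that you can make sense of the output if something goes wrong. Otherwise you may be taking on technical debt that you will have to pay back later.)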

Exactly which distance calculation you want depends on how you want to count certain things. For instance, does a substitution count as one change, or as a deletion plus an insertion? And if your strings are long enough for it to matter, do you want to account for substring moves, and so forth?
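To make the substitution choice concrete: if a substitution is counted as a deletion plus an insertion, the distance becomes an insertion/deletion-only (indel) distance. It can be computed from the longest common subsequence (LCS) as len(a) + len(b) - 2 * LCS(a, b). The Python sketch below assumes that definition, and `lcs_length` and `indel_distance` are illustrative names. Substring moves are not handled here.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (two-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)       # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip a char from a or b
        prev = curr
    return prev[-1]


def indel_distance(a: str, b: str) -> int:
    """Edit distance allowing only insertions and deletions, so a
    substitution effectively costs 2 (one delete plus one insert)."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


print(indel_distance("kitten", "sitting"))  # 5
print(indel_distance("test", "rest"))       # 2
```

Compared with Levenshtein, "kitten" and "sitting" come out 5 apart instead of 3, because each of the two substitutions now costs 2.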
