字符串比较时间复杂度 [英] String comparison time complexity

查看:140
本文介绍了字符串比较时间复杂度的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

哪个比较会花费更长的时间?

Which comparison would take longer time?

a = helloworldhelloworldhelloworld
b = https://www.somerandomurls.com/directory/anotherdirectory/helloworld.html
如果a!= b:doThis()

a =一个,b =两个
如果a!= b:doThis()

我经常需要对照我的具有数千行的数据库进行检查.我没有在寻找任何特定的编程语言.我只是想知道哪个比较需要更快的时间.如您所见,在第一个示例中, b 的值是较长的字符串,而在第二个示例中,则是较短的字符串.因此,我想知道这是否会对比较产生任何影响.

I often need to check this against my database which has thousands of rows. I am not looking for any specific programming language. I just want to know which comparison takes faster. As you see, the value of b is longer string on the first example and shorter on the second example. So I wonder if that might make any difference on comparison.

推荐答案

字符串比较的时间为O(n),n为字符串的长度.

Time for string comparison is O(n), n being the length of the string.

但是,根据测试数据,您可以手动优化匹配算法.我已经提到了一些.

However depending on the test data, you can manually optimize the matching algorithm. I have mentioned a few.

优化1:

检查两个字符串的大小,如果不相等,则返回false.因为这将停止进一步的O(n)比较,并节省时间.通常,字符串数据结构将大小存储在内存中,而不是每次都进行计算.这样一来,O(1)即可访问字符串大小.

Check the size of both the strings, if unequal, return false. As this will stop the further O(n) comparison, and save time. Generally string data structure stores the size in memory, rather than calculating it each time. This allows O(1) time access to the string size.

实际上,这是一个巨大的优化.我将通过计算摊销的时间复杂度来说明如何实现.

Practically this is a huge optimization. I will explain how, by calculating the amortized time complexity.

如果您的字符串数据结构可以包含最大大小为x的字符串,那么总共可能存在(x + 1)个可能的字符串大小(0、1、2,...,x).

If your string data structure can have a string of max size x, then there can be a total of (x + 1) possible string sizes (0, 1, 2, ... , x).

(x + 1)选择2 种选择两个字符串的方式= x *(x + 1)/2

There are (x + 1) choose 2 ways of selecting two strings = x * (x + 1) / 2

如果使用优化1,则仅当两个字符串长度相等时才需要比较整个长度.只有 x + 1 这种情况.完成的操作数将为 0 +1 + 2 + .... + x = x *(x +1)/2 .

If you use optimization 1, then you would need to compare the whole length only when two strings are of equal length. There will be only x + 1 such cases. Number of operations done will be 0 + 1 + 2 + .... + x = x * (x + 1) / 2 .

剩余的(x + 1)*(x-2)/2 个案例将以O(1)时间计算.

Remaining (x + 1) * (x - 2) / 2 cases will be calculated in O(1) time.

因此总计算= x *(x + 1)/2 +(x + 1)*(x-2)/2 =(x + 1)*(x-1)其中是O(n ^ 2).由于我们正在进行x *(x + 1)/2个字符串比较,因此每个比较的摊销时间复杂度为O(1).

Hence total computations = x * (x + 1) / 2 + (x + 1) * (x - 2) / 2 = (x + 1) * (x - 1) which is O(n^2). Since we are doing x * (x + 1) / 2 string comparisons, hence amortized time complexity per comparison is O(1).

如果没有任何优化,就会有

Where as without any optimization, there will be

0 + 1 *(x)* 1 + 2 *(x-1)* 2 + 3 *(x-3)* 3 + .... + x/2 * x/2 * x/2 计算.毫无疑问,这将超过O(n ^ 3).并且摊销时间复杂度将超过O(n).

0 + 1 * (x) * 1 + 2 * (x - 1) * 2 + 3 * (x - 3) * 3 + .... + x/2 * x/2 * x/2 calculations. Which will be without any doubt more than O(n^3). And amortized time complexity will be more than O(n).

优化2:

由于您的数据库包含Web链接,因此它们可能属于同一网站,因此其前几个字符将始终相同.这将导致多余的CPU时间使用.因此,最好从这种情况下从末尾检查,因为相对链接只会与末尾不同.

Since you database contains web links, it is possible that they belong to the same website, hence their first few characters will always be same. This will lead to redundant CPU time usage. Hence better to check from the end for this case, as relative links will differ only from the end.

注意从理论上讲,我们没有开发一种可以改变最坏情况下时间复杂度的算法,它仍然是O(n).我们只是在优化算法.

Note Theoretically speaking, we are not developing an algorithm that will change the worst case time complexity, it is still O(n). We are just optimizing the algorithm.

这篇关于字符串比较时间复杂度的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆