中的strstr优化版本(搜索有固定长度) [英] Optimized version of strstr (search has constant length)

查看:369
本文介绍了中的strstr优化版本(搜索有固定长度)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的C程序有很多的功能的strstr调用。标准库是的strstr已经很快,但我的情况下,搜索字符串始终长度的5个字符。我有一个特殊的版本替换它来获得一些速度:

My C program had a lot of strstr function calls. The standard library strstr is already fast but in my case the search string has always length of 5 characters. I replaced it with a special version to gain some speed:


int strstr5(const char *cs, const char *ct)
{
    while (cs[4]) {

        if (cs[0] == ct[0] && cs[1] == ct[1] && cs[2] == ct[2] && cs[3] == ct[3] && cs[4] == ct[4])
            return 1;

        cs++;
    }

    return 0;
}

,因为它是不够的,如果在CS发生克拉知道该函数返回一个整数。我的功能简单,在这种特殊的情况比标准速度快的strstr,但我很感兴趣地听到如果任何人有可能被应用于一些性能改进。即使是很小的改进是值得欢迎的。

The function returns an integer because it’s enough to know if ct occurs in cs. My function is simple and faster than standard strstr in this special case but I’m interested to hear if anybody has some performance improvements that could be applied. Even small improvements are welcome.

摘要:


  • CS有> = 10的长度,但除此之外,它可能会有所不同。长度为之前已知的(在我的函数不使用)。 CS的长度通常为100〜200。

  • CT有5
  • 的长度
  • 字符串的内容可以是任何东西

编辑:感谢您对所有的答案和评论。我要学习和测试的想法,看看有什么效果最好。我将开始与麦氏约后缀特里的想法。

Thank you for all answers and comments. I have to study and test ideas to see what works best. I will start with MAK's idea about suffix trie.

推荐答案

有几个快字符串搜索算法。试着寻找博耶 - 穆尔(如已经由Greg Hewgill建议)的Rabin-Karp 并的 KMP算法

There are several fast string search algorithms. Try looking at Boyer-Moore (as already suggested by Greg Hewgill), Rabin-Karp and KMP algorithms.

如果您需要在同一个大段文字来搜索​​很多小的模式,你也可以尝试推行后缀树后缀阵列。但这些恕我直言有些难于理解和正确执行。

If you need to search for many small patterns in the same large body of text, you can also try implementing a suffix tree or a suffix array. But these are IMHO somewhat harder to understand and implement correctly.

但要注意,这些技术都非常快,但只给你一个AP preciable加速如果所涉及的字符串是非常大的。你可能不会看到一个AP preciable加速字符串小于长说1000个字符。

But beware, these techniques are very fast, but only give you an appreciable speedup if the strings involved are very large. You might not see an appreciable speedup for strings less than say a 1000 characters long.

编辑:

如果您是在相同的文字一遍又一遍地搜索试(即 CS 始终/经常同在调用的值),你会得到一个很大的速度提升通过使用后缀特里(基本上是一个特里后缀)。由于您的文字是小如100或200个字符,你可以使用更简单的为O(n ^ 2)方法来构建线索,然后做就可以了多种快速搜索。每个搜索将只需要5比较而不是通常的5 * 200。

If you are searching on the same text over and over again (i.e. the value of cs is always/often the same across calls), you will get a big speedup by using a suffix trie (Basically a trie of suffixes). Since your text is as small as 100 or 200 characters, you can use the simpler O(n^2) method to build the trie and then do multiple fast searches on it. Each search would require only 5 comparisons instead of the usual 5*200.

编辑2:

正如CAF的评论中提到,C的的strstr 算法的实现依赖。 glibc的使用应该是或多或少快速在实践中任何我所提到的方法的线性时间算法。而在OP的方法是渐近较慢(O(N * M),而不是为O(n)),它是更快可能是由于这样的事实,既n和m(图案和文本的长度)是非常小的,它没有做任何长期preprocessing的glibc的版本。

As mentioned by caf's comment, C's strstr algorithm is implementations dependent. glibc uses a linear time algorithm which should be more or less as fast in practice as any of the methods I've mentioned. While the OP's method is asymptotically slower (O(N*m) instead of O(n) ), it is faster probably due to the fact that both n and m (the lengths of the pattern and the text) are very small and it does not have to do any of the long preprocessing in the glibc version.

这篇关于中的strstr优化版本(搜索有固定长度)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆