C ++奇怪的结果-蛮力比Rabin-Karp快...? [英] C++ strange results - brute force is quicker than Rabin-Karp...?

查看:67
本文介绍了C ++奇怪的结果-蛮力比Rabin-Karp快...?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

当前正在为uni模块开发一个字符串搜索程序,并且我已经成功地成功实现了算法,至少在他们能够始终如一地找到字符串的时候.我实现了博耶·摩尔(Boyer Moore)和拉宾·卡普(Rabin Karp).当我的一个同学遇到这个确切的问题时,我也投入了蛮力,并且意识到我也遇到了同样的问题-蛮力比单词表上的Rabin-Karp快.

Currently working on a string search program for a uni module, and I've managed to implement the algorithms successfully, at least to the point where they're finding the string consistently. I implemented Boyer Moore and Rabin Karp. I also threw in a Brute Force when one of my coursemates experienced this exact problem, and realised I had the same issue - that brute force is faster than Rabin-Karp on a wordlist.

Rabin-Karp似乎花费了最多的时间执行滚动哈希,起初我很好奇是否有很多碰撞,但是我设法将碰撞减至3个且具有很大的质数.由于质数的大小,这增加了一些时间,但是很明显滚动哈希是造成此问题的原因.

Rabin-Karp seems to be taking the most amount of time performing the rolling hash, I was curious at first if I was just having a lot of collisions, but I managed to get collisions down to 3 with a huge prime number. That added on a little bit of time I'm assuming due to the size of the prime number, but it seemed pretty clear that rolling hash was causing the problem.

这是滚动哈希部分:

//hashes don't match, rehash using rolling hash to move on to next string section
  if (counter < (stringLength - patternLength)) { 

            stringHash = (MAXCHAR *(stringHash - stringFile[counter] * hash) + stringFile[counter + patternLength]) % prime;


            if (stringHash < 0) {

                stringHash += prime;    //when hash value is negative, make it positive
            }

        }

        if (!found) {

            counter++; 
        }

我想尝试搜索一个巨大的文本文件,所以我使用了Rockyou单词表,Boyer Moore非常满意,而Rabin-Karp花费了不到一秒钟的时间.蛮力花了不到拉宾卡普一半的时间,尽管对我而言这没有意义?

I wanted to try searching through a huge text file so I used the rockyou wordlist, which Boyer Moore is very happy with, and Rabin-Karp is taking under a second. Brute Force is taking less than half the time of Rabin-Karp though which to me just doesn't make sense?

我是否误解了应如何应用这些算法,还是我使用的滚动哈希过程存在问题?

Am I misunderstanding how these algorithms should be applied, or is there a problem with the rolling hash process that I'm using?

推荐答案

蛮力字符串搜索是Rabin-Karp的特例,它具有恒定的哈希函数(因此每个滚动哈希都匹配).

Brute force string search is the special case of Rabin-Karp with a constant hash function (so every rolling hash matches).

两种算法的最坏情况复杂度相同,大多数平均情况"定义的平均情况复杂度也是如此.

The worst case complexity is the same for both algorithms, as is the average case complexity for most definitions of "average case".

在这些情况下,由于计算和检查良好哈希的开销,Rabin-Karp将花费更长的时间.

In these situations, Rabin-Karp will take longer due to the overhead of computing and checking a good hash.

与Rabin-Karp相比,暴力破解的问题在于现实生活中有时会发生恶劣的情况.例如,如果您要搜索路径名,那么您的模式可能会与文件中的许多或大多数路径名和部分路径名共同使用长前缀,这会使蛮力花费很长时间.时间.

The problem with brute force, compared to Rabin-Karp, is that bad cases sometimes occur in real life. If you're searching for pathnames, for example, then it can happen that your pattern has a long prefix in common with many or most of the path names and parts of path names in the file, and that will make brute force take a long time.

使用Rabin-Karp,糟糕的情况在现实生活中极不可能发生.它们实际上仅在对抗"条件下发生,在这种情况下,文件和模式是有目的地构造的,要花很长时间,并且要特别了解您使用的哈希函数.

With Rabin-Karp, the bad cases are very unlikely to occur in real life. They really only happen under "adversarial" conditions, where the file and pattern are constructed purposefully to take a long time, with specific knowledge of the hash function you use.

即使如此... Rabin-Karp对于单模式搜索也不是一个很好的算法.当您同时搜索许多字符串时,它会变得非常有用,并且您可以在潜在匹配项的字典中查找滚动哈希.

Even so... Rabin-Karp is not a great algorithm for single-pattern searching. It becomes much more useful when you're searching for many strings simultaneously and you can look up the rolling hash in a dictionary of potential matches.

这篇关于C ++奇怪的结果-蛮力比Rabin-Karp快...?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆