Speed up string search in PHP

Question

I have a 1.2 GB file that contains a single-line string. I need to search the entire file to find the position of another string (currently I have a list of strings to search for). The way I'm doing it now is to open the big file and move a pointer through it in 4 KB blocks, each time moving the pointer X positions back in the file before reading the next 4 KB.
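A quick check of how far back X has to go. Assume each read returns B bytes, the string being searched for is m bytes long, and B ≥ m; the symbols s_k, B, m and X are introduced here only for illustration:

```latex
\begin{aligned}
&\text{chunk } k \text{ covers bytes } [s_k,\ s_k + B - 1], \qquad s_{k+1} = s_k + B - X \\
&\text{a match starting at byte } p \text{ fits in chunk } k \iff s_k \le p \le s_k + B - m \\
&\text{no start position is skipped} \iff s_{k+1} \le s_k + B - m + 1 \iff X \ge m - 1
\end{aligned}
```

So stepping back X = m − 1 bytes is the minimum; stepping back a full m bytes also works, it just re-reads one extra byte per chunk.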

My problem is that the bigger the string to search for, the longer it takes to find it.

Can you give me some ideas to optimize the script to get better search times?

Here is my implementation:

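function busca($inici){
        // Note: $inici is not used by this function.
        $limit = 4096;

        $big_one    = fopen('big_one.txt','rb');
        $options    = fopen('options.txt','rb');

        // One string to search for per line of options.txt.
        while(($line = fgets($options)) !== false){
            $search = trim($line);
            $retro  = strlen($search);
            if($retro === 0){
                continue;                    // skip blank lines
            }
            // The chunk must be longer than the needle, or the step below is <= 0.
            $size = max($limit, 2 * $retro);

            $punter = 0;                     // absolute offset of the current chunk
            fseek($big_one, 0);
            while(true){
                $ara = fread($big_one, $size);
                if($ara === false || $ara === ''){
                    break;                   // past the end: not found
                }

                $pos = strpos($ara, $search);
                if($pos !== false){
                    $ok_pos = $pos + $punter;  // absolute byte offset of the match
                    echo "$pos - $punter - $search : $ok_pos <br>";
                    break;
                }
                if(strlen($ara) < $size){
                    break;                   // last chunk searched: not found
                }

                // Step back strlen($search) - 1 bytes so that a match
                // straddling two chunks is still found.
                $punter += $size - ($retro - 1);
                fseek($big_one, $punter);
            }
        }

        fclose($options);
        fclose($big_one);
    }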
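The same overlapping-chunk idea can be packaged as a reusable function. This is a minimal sketch, not the asker's code: the name `findOffset` is hypothetical and the 1 MB default chunk size is an arbitrary assumption (it means roughly 1,200 reads for a 1.2 GB file instead of about 300,000 with 4 KB blocks). Instead of seeking backwards, it carries the last strlen($needle) − 1 bytes of each chunk forward into the next one.

```php
<?php
// Sketch: return the 0-based byte offset of the first occurrence of
// $needle in the file at $path, or false if it does not occur.
// The file is read in chunks; the last strlen($needle) - 1 bytes of each
// buffer are kept, so a match straddling two chunks is still found.
// findOffset and the 1 MB default chunk size are illustrative assumptions.
function findOffset(string $path, string $needle, int $chunkSize = 1048576)
{
    $len = strlen($needle);
    if ($len === 0) {
        return false;
    }
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        return false;
    }

    $base = 0;      // absolute offset of the first byte of $buf
    $buf  = '';
    while (!feof($fh)) {
        $data = fread($fh, $chunkSize);
        if ($data === false || $data === '') {
            break;
        }
        $buf .= $data;
        $pos = strpos($buf, $needle);
        if ($pos !== false) {
            fclose($fh);
            return $base + $pos;
        }
        // Only the last $len - 1 bytes can still be the start of a match
        // that finishes in the next chunk; drop everything before them.
        $keep = min(strlen($buf), $len - 1);
        $base += strlen($buf) - $keep;
        $buf = $keep > 0 ? substr($buf, -$keep) : '';
    }
    fclose($fh);
    return false;
}
```

For a list of strings, call it once per string; each call is a single forward pass over the file with no fseek() calls.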

Thanks in advance!

Answer

Why not use exec + grep -b?

exec('grep "new" ext-all-debug.js -b', $result);
// here we have looked for "new" substring entries in the extjs debug src file
var_dump($result);
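With a 1.2 GB single-line file, plain `grep -b` reports offset 0 (the start of the only line) and returns the whole line. Adding `-o` makes GNU grep print only the matched text, and `-b` then gives the byte offset of the match itself; `-F` treats the search string as a fixed string rather than a regular expression. A sketch, assuming GNU grep is on the PATH and `exec()` is enabled; `grepOffsets` is a hypothetical helper name:

```php
<?php
// Sketch: return the 0-based byte offsets of every occurrence of $needle
// in $path, using `grep -boF`. With -o, GNU grep's -b prints the offset
// of the matched text (not of the line) and prints only the match.
// Assumes GNU grep is installed and exec() is allowed.
function grepOffsets(string $path, string $needle): array
{
    // -e protects needles that start with "-"; escapeshellarg quotes them.
    $cmd = 'grep -boF -e ' . escapeshellarg($needle) . ' ' . escapeshellarg($path);
    $lines = [];
    exec($cmd, $lines);   // grep exits with status 1 when there is no match
    $offsets = [];
    foreach ($lines as $line) {
        // Each output line looks like "<offset>:<matched text>".
        $offsets[] = (int) substr($line, 0, strpos($line, ':'));
    }
    return $offsets;
}
```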

Sample result:

array(1142) {
    [0]=>  string(97) "3398: * insert new elements. Revisiting the example above, we could utilize templating this time:"
    [1]=>  string(54) "3910:var tpl = new Ext.DomHelper.createTemplate(html);"
    ...
}

Each item consists of the byte offset of the line from the start of the file and the line itself, separated by a colon.
So you then have to look inside that particular line and add the position of the match within it to the line offset. For example:

[0]=>  string(97) "3398: * insert new elements. Revisiting the example above, we could utilize templating this time:"

This means that the occurrence of "new" was found at byte offset 3408 (3398 is the offset of the line and 10 is the position of "new" within that line).
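The arithmetic above can be sketched as a small helper (the name `absoluteOffset` is hypothetical) that parses one line of `grep -b` output and adds the position of the search string within the line to the line's offset:

```php
<?php
// Sketch: convert one "<line offset>:<line text>" line of `grep -b` output
// into the absolute byte offset of the first occurrence of $needle on it.
// Returns false if the line is malformed or $needle is not on it.
function absoluteOffset(string $grepLine, string $needle)
{
    $colon = strpos($grepLine, ':');
    if ($colon === false) {
        return false;
    }
    $lineOffset = (int) substr($grepLine, 0, $colon);
    $text = substr($grepLine, $colon + 1);       // the line itself
    $inLine = strpos($text, $needle);            // position within the line
    return $inLine === false ? false : $lineOffset + $inLine;
}

echo absoluteOffset('3398: * insert new elements. Revisiting the example above, we could utilize templating this time:', 'new');
// prints 3408
```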
