写一个高效的字符串替换功能? [英] Write an efficient string replacement function?

查看:68
本文介绍了写一个高效的字符串替换功能?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

出于学习的目的,我尝试提供一个字符串替换功能,这是我现在所拥有的:

For the purpose of learning I've tried to come up with a string replacement function, here's what I have now:

let string_replace where what replacement =
  let wlength = (String.length what - 1) in
  let length = (String.length where - 1) in
  let rec loop i shift =
    let [shift, i] =
      if wlength == shift then
        (String.blit replacement 0 where (i - wlength) (wlength + 1); [0, i])
      else if (String.get where i == String.get what shift) then
        [shift + 1, i]
      else [0, i - shift]
    in if i < length then loop (i + 1) shift
  in loop 0 0; where;;

现在,这里有一个测试: http://raid6.com.au/~ onlyjob/posts/arena/,所以我想看看我的状况如何.而且,在这里,我编写了测试:

Now, there's a test here: http://raid6.com.au/~onlyjob/posts/arena/ and so I wanted to see how am I doing relatively. And, here, I wrote the test:

let test =
  let initial = "abcdefgh" ^ "efghefgh" in
  let initial_time = Sys.time() in
  let initial_length = String.length initial in
  let imax = (1024 / (String.length initial)) * 1024 * 4 in
  let rec loop s i =
    if i < imax then
      let ns = string_replace (s ^ initial) "efgh" "____" in
      let sl = initial_length * i in
      if (sl mod 1024) * 256 == 0 then
        Printf.printf "%f s | %d Kb |\n" (Sys.time() -. initial_time) (sl / 1024);
      loop ns (i + 1)
  in loop initial 0;;
test;;

我试图尽可能地遵循JavaScript测试,以便能够比较结果(我在此处进行了一些格式化,也为您节省了搜索时间):

I have tried to follow as closely as possible the JavaScript test in order to be able to compare the results (I did some formatting here and also to spare you the searching):

function test() {
    var str = "abcdefgh" + "efghefgh",
    imax = 1024 / str.length * 1024 * 4,
    time = new Date(), gstr = "", i = 0, lngth;

    console.log("exec.tm.sec\tstr.length");

    while (i++ < imax + 1000) {
        gstr += str;
        gstr = gstr.replace(/efgh/g, "____");
        lngth = str.length * i;
        if (lngth % 1024 * 256 === 0) {
            var curdate = new Date();
            console.log((curdate.getTime() - time.getTime()) / 1000 +
                        "sec\t\t" + lngth / 1024 + "kb");
        }
    }
}

这是我得到的结果...

Here are the results I'm getting...

| JavaScript | OCaml        | String size |
|------------+--------------+-------------|
| 0.78 s     | 14.576784 s  | 256 Kb      |
| 3.266 s    | 58.468111 s  | 512 Kb      |
| 7.618 s    | 132.954788 s | 768 Kb      |
| 13.849 s   | 238.793697 s | 1024 Kb     |
| 46.243 s   | 379.175356 s | 1280 Kb     |
| 86.046 s   | 550.838260 s | 1536 Kb     |

所以,我的问题是:我的字符串替换功能有什么特别错误吗?还是可以预期这种速度?我已经用ocamlopt编译了代码.

So, my question: is there anything particularly wrong with my string replacement function? Or is this speed to be expected? I've compiled the code with ocamlopt.

这是我的另一种尝试:

let string_index_of where what =
  let length = (String.length where - 1) in
  let pattern = (1 lsl (String.length what + 1)) - 1 in
  let rec loop i j mask =
    if i == length + 1 then (-1)
    else if mask == pattern then i - j
    else if (where.[i]) == (what.[j]) then
      loop (i + 1) (j + 1) ((mask lsl 1) lor 1)
    else loop (i + 1 - j) 0 1
  in loop 0 0 1;;

let test =
  let initial = "abcdefgh" ^ "efghefgh" in
  let replacement = "____" in
  let replacement_length = String.length replacement in
  let initial_time = Sys.time() in
  let initial_length = String.length initial in
  let imax = (1024 / (String.length initial)) * 1024 * 4 in
  let rec loop s i =
    if i < imax then
      let concatenated = s ^ "efgh" in
      let sl = initial_length * i in
      let rec replace_loop st =
        let index = string_index_of st "efgh" in
        if index >= 0 then
          ((String.blit replacement 0 st index replacement_length);
           replace_loop st)
      in replace_loop concatenated;
      if sl mod (1024 * 256) == 0 then
        Printf.printf "%f s | %d Kb |\n" (Sys.time() -. initial_time) (sl / 1024);
      loop concatenated (i + 1)
  in loop initial 0;;

test;;

我还制作了一个更大的表格(包括其他一些用于比较的语言):

I've also made a larger table (including some other languages for comparison):

| JavaScript V8 | ocamlopt     | ocamlopt bitup | SBCL      | C gcc 4 -O3 | String size |
|---------------+--------------+----------------+-----------+-------------+-------------|
| 0.78 s        | 14.576784 s  | 4.615298 s     | 17.463 s  | 1 s         | 256 Kb      |
| 3.266 s       | 58.468111 s  | 18.474191 s    | 68.484 s  | 4 s         | 512 Kb      |
| 7.618 s       | 132.954788 s | 41.673664 s    | 153.99 s  | 10 s        | 768 Kb      |
| 13.849 s      | 238.793697 s | 74.652651 s    | 275.204 s | 17 s        | 1024 Kb     |
| 46.243 s      | 379.175356 s | 116.517287 s   | 431.768 s | 27 s        | 1280 Kb     |
| 86.046 s      | 550.838260 s | 166.662663 s   | 624.559 s | 38 s        | 1536 Kb     |

我还是希望自己做得更好.

Still I'd hope to do better then this.

推荐答案

我发现很难匹配Javascript性能,大多数引擎(js)使用绳索作为字符串的内部数据结构(

I find it some difficult to match Javascript performance, most of the engines (js) use ropes as internal data structure for strings (https://en.wikipedia.org/wiki/Rope_%28data_structure%29) .

因此,每次执行子字符串,concat或任何应创建新字符串(或在内部调用)的函数时,ocaml都会构造一个新字符串,而javascript使用绳索引用前一个字符串.

So each time you do a substring, concat or any function that should create a new string (or are internally called), ocaml constructs a new string while javascript uses ropes to reference the previous one.

如果您要查看它,请检查每次创建是否使用:( https://github.com/ocaml/ocaml/blob/trunk/stdlib/string.ml ).我想向您展示V8如何优化其重用字符串块的方法,但无法轻松找到它,因此将其作为练习供您参考.

If you want to see it, check everytime create is used: (https://github.com/ocaml/ocaml/blob/trunk/stdlib/string.ml). I wanted to show you how V8 optimizes it reusing string chunks, but couldn't find it easily, so will let it as an exercise for you.

希望它有助于理解它们之间的区别,以及为什么javscript在不必处理太多绳索的情况下仍比C表现更好.

Hope it helped understand the difference and why javscript still performs better than C when it doesn't have to handle much ropes.

这篇关于写一个高效的字符串替换功能?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆