What's the fastest way to check if a word from one string is in another string?

Question

I have a string of words; let's call them bad:

bad = "foo bar baz"

I can keep this string as a whitespace-separated string, or as a list:

bad = bad.split(" ")

If I have another string, like so:

str = "This is my first foo string"

What's the fastest way to check if any word from the bad string is within my comparison string, and what's the fastest way to remove said word if it's found?

# Find if a word is there (any? stops at the first match)
found = bad.split(" ").any? { |word| str.include?(word) }

# Remove the word (escaped and anchored so only whole words match)
bad.split(" ").each do |word|
  str.gsub!(/\b#{Regexp.escape(word)}\b/, "")
end

Recommended answer

If the list of bad words gets huge, a hash is a lot faster:

    require 'benchmark'

    bad = ('aaa'..'zzz').to_a    # 17576 words
    str = "What's the fasted way to check if any word from the bad string is within my "
    str += "comparison string, and what's the fastest way to remove said word if it's "
    str += "found"
    str *= 10

    # One large case-insensitive alternation of every bad word
    badex = /\b(#{bad.join('|')})\b/i

    # Lookup table: word => true (case-sensitive, unlike the /i regex)
    bad_hash = {}
    bad.each { |w| bad_hash[w] = true }

    n = 10
    Benchmark.bm(10) do |x|
      x.report('regex:') do
        n.times { str.gsub(badex, '').squeeze(' ') }
      end

      x.report('hash:') do
        n.times { str.gsub(/\b\w+\b/) { |word| bad_hash[word] ? '' : word }.squeeze(' ') }
      end
    end

Output:

                    user     system      total        real
    regex:     10.485000   0.000000  10.485000 ( 13.312500)
    hash:       0.000000   0.000000   0.000000 (  0.000000)
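
The same lookup idea can answer both parts of the question (checking and removing). The following is only a minimal sketch, assuming that whole-word, case-insensitive matching is what is wanted. The names `BAD_WORDS`, `contains_bad_word?` and `remove_bad_words` are hypothetical and not part of the original answer, and it uses Ruby's standard-library `Set` in place of a hash with `true` values.

```ruby
require 'set'

# Hypothetical constant mirroring the question's example list.
BAD_WORDS = Set.new(%w[foo bar baz])

# True if any whole word of `text` is in `bad_words` (case-insensitive).
def contains_bad_word?(text, bad_words = BAD_WORDS)
  text.scan(/\w+/).any? { |word| bad_words.include?(word.downcase) }
end

# Returns a copy of `text` with bad words removed and extra spaces collapsed.
def remove_bad_words(text, bad_words = BAD_WORDS)
  text.gsub(/\w+/) { |word| bad_words.include?(word.downcase) ? '' : word }
      .squeeze(' ')
      .strip
end

str = "This is my first foo string"
puts contains_bad_word?(str)   # true
puts remove_bad_words(str)     # This is my first string
```

Each word of the input is looked up once, so the cost grows with the length of the string rather than with the size of the bad-word list. That is the likely reason the hash version wins in the benchmark above.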
