Finding if a sentence contains a specific phrase in Ruby


Problem description

Right now I am seeing if a sentence contains a specific word by splitting the sentence into an array and then doing an include to see if it contains the word. Something like:

"This is my awesome sentence.".split(" ").include?('awesome')

But I'm wondering what the fastest way to do this with a phrase is. Like if I wanted to see if the sentence "This is my awesome sentence." contains the phrase "my awesome sentence". I am scraping sentences and comparing a very large number of phrases, so speed is somewhat important.

Solution

Here are some variations:

require 'benchmark'

lorem = ('Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut' # !> unused literal ignored
        'enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in' # !> unused literal ignored
        'reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident,' # !> unused literal ignored
        'sunt in culpa qui officia deserunt mollit anim id est laborum.' * 10) << ' foo'


lorem.split.include?('foo') # => true
lorem['foo']                # => "foo"
lorem.include?('foo')       # => true
lorem[/foo/]                # => "foo"
lorem[/fo{2}/]              # => "foo"
lorem[/foo$/]               # => "foo"
lorem[/fo{2}$/]             # => "foo"
lorem[/fo{2}\Z/]            # => "foo"
/foo/.match(lorem)[-1]      # => "foo"
/foo$/.match(lorem)[-1]     # => "foo"
/foo/ =~ lorem              # => 621

n = 500_000

puts RUBY_VERSION
puts "n=#{ n }"
Benchmark.bm(25) do |x|
  x.report("array search:")             { n.times { lorem.split.include?('foo') } }
  x.report("literal search:")           { n.times { lorem['foo']                } }
  x.report("string include?:")          { n.times { lorem.include?('foo')       } }
  x.report("regex:")                    { n.times { lorem[/foo/]                } }
  x.report("wildcard regex:")           { n.times { lorem[/fo{2}/]              } }
  x.report("anchored regex:")           { n.times { lorem[/foo$/]               } }
  x.report("anchored wildcard regex:")  { n.times { lorem[/fo{2}$/]             } }
  x.report("anchored wildcard regex2:") { n.times { lorem[/fo{2}\Z/]            } }
  x.report("/regex/.match")             { n.times { /foo/.match(lorem)[-1]      } }
  x.report("/regex$/.match")            { n.times { /foo$/.match(lorem)[-1]     } }
  x.report("/regex/ =~")                { n.times { /foo/ =~ lorem              } }
  x.report("/regex$/ =~")               { n.times { /foo$/ =~ lorem             } }
  x.report("/regex\Z/ =~")              { n.times { /foo\Z/ =~ lorem            } }
end

And the results for Ruby 1.9.3:

1.9.3
n=500000
                                user     system      total        real
array search:              12.960000   0.010000  12.970000 ( 12.978311)
literal search:             0.800000   0.000000   0.800000 (  0.807110)
string include?:            0.760000   0.000000   0.760000 (  0.758918)
regex:                      0.660000   0.000000   0.660000 (  0.657608)
wildcard regex:             0.660000   0.000000   0.660000 (  0.660296)
anchored regex:             0.660000   0.000000   0.660000 (  0.664025)
anchored wildcard regex:    0.660000   0.000000   0.660000 (  0.664897)
anchored wildcard regex2:   0.320000   0.000000   0.320000 (  0.328876)
/regex/.match               1.430000   0.000000   1.430000 (  1.424602)
/regex$/.match              1.430000   0.000000   1.430000 (  1.434538)
/regex/ =~                  0.530000   0.000000   0.530000 (  0.538128)
/regex$/ =~                 0.540000   0.000000   0.540000 (  0.536318)
/regexZ/ =~                 0.210000   0.000000   0.210000 (  0.214547)

And 1.8.7:

1.8.7
n=500000
                               user     system      total        real
array search:             21.250000   0.000000  21.250000 ( 21.296039)
literal search:            0.660000   0.000000   0.660000 (  0.660102)
string include?:           0.610000   0.000000   0.610000 (  0.612433)
regex:                     0.950000   0.000000   0.950000 (  0.946308)
wildcard regex:            2.840000   0.000000   2.840000 (  2.850198)
anchored regex:            0.950000   0.000000   0.950000 (  0.951270)
anchored wildcard regex:   2.870000   0.010000   2.880000 (  2.874209)
anchored wildcard regex2:  2.870000   0.000000   2.870000 (  2.868291)
/regex/.match              1.470000   0.000000   1.470000 (  1.479383)
/regex$/.match             1.480000   0.000000   1.480000 (  1.498106)
/regex/ =~                 0.680000   0.000000   0.680000 (  0.677444)
/regex$/ =~                0.700000   0.000000   0.700000 (  0.704486)
/regexZ/ =~                0.700000   0.000000   0.700000 (  0.701943)

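So, from the 1.9.3 results, using a fixed string search like 'foobar'['foo'] is slower than using a regex 'foobar'[/foo/], which is in turn slower than the equivalent 'foobar' =~ /foo/. On 1.8.7 the order differs: the fixed string search is faster than the regex form.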

The OP's original solution suffers badly because it traverses the string twice: once to split it into individual words, and a second time iterating the array looking for the actual target word. Its performance will degrade further as the string size increases.
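For the phrase case in the question, splitting on whitespace cannot work at all, because no single array element ever contains a space. A minimal sketch using the sentence and phrase from the question (the variable names are only illustrative):

```ruby
sentence = 'This is my awesome sentence.'
phrase   = 'my awesome sentence'

# Splitting yields single words, so a one-word lookup succeeds...
word_hit = sentence.split.include?('awesome')

# ...but a multi-word phrase can never equal one array element.
phrase_miss = sentence.split.include?(phrase)

# String#include? searches the string directly for the substring,
# without building an intermediate array.
phrase_hit = sentence.include?(phrase)

puts "word via split:      #{word_hit}"
puts "phrase via split:    #{phrase_miss}"
puts "phrase via include?: #{phrase_hit}"
```

Keep in mind that include? matches any substring, so 'some' would also match inside 'awesome'. If whole-word matching matters, a regex with \b word boundaries is one option.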

One thing I find interesting about the performance of Ruby is that an anchored regex is slightly slower than an unanchored regex. In Perl, the opposite was true when I first ran this sort of benchmark several years ago.
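The anchors used in these benchmarks are not interchangeable, which is worth knowing before picking one on speed alone. The sketch below shows how $, \Z and \z differ on strings containing newlines; the sample strings are made up for illustration and are not part of the benchmark.

```ruby
multi_line = "foo\nbar"   # 'foo' ends the first line, not the string
trailing   = "bar foo\n"  # 'foo' is followed by a final newline

# $ matches at the end of any line, so 'foo' on the first line matches.
dollar_multi = multi_line =~ /foo$/

# \Z matches at the end of the string, or just before a final newline.
cap_z_multi = multi_line =~ /foo\Z/
cap_z_trail = trailing =~ /foo\Z/

# \z matches only at the very end of the string.
small_z_trail = trailing =~ /foo\z/

puts "$  on multi-line: #{dollar_multi.inspect}"
puts "\\Z on multi-line: #{cap_z_multi.inspect}"
puts "\\Z with trailing newline: #{cap_z_trail.inspect}"
puts "\\z with trailing newline: #{small_z_trail.inspect}"
```

On the benchmark strings, which contain no newlines, all three anchors find the same match, so there the choice affects only speed.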

Here's an updated version using Fruity. The various expressions return different results. Any of them could be used if you want to see whether the target string exists. If you want to see whether the value is at the end of the string, as these tests do, or to get the location of the target, then some are definitely faster than others, so pick accordingly.
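To make "different results" concrete, here is a small sketch of what each kind of call returns on the same target string. Note that match? (Ruby 2.4 and later) and end_with? are not part of the benchmarks in this answer; they are included only as further standard String methods that return a plain boolean.

```ruby
target = (' ' * 100) + ' foo'

results = {
  include:  target.include?('foo'),  # boolean
  match_p:  target.match?(/foo/),    # boolean, Ruby 2.4+
  end_with: target.end_with?('foo'), # boolean, checks the end of the string only
  index:    target.index('foo'),     # Integer position, or nil if absent
  eq_tilde: target =~ /foo/,         # Integer position, or nil if absent
  slice:    target['foo']            # the matched String, or nil if absent
}

results.each { |name, value| puts "#{name}: #{value.inspect}" }
```

For a plain existence test the boolean forms read most clearly, while index or =~ are the ones to use when the position is needed.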

require 'fruity'

TARGET_STR = (' ' * 100) + ' foo'

TARGET_STR['foo']            # => "foo"
TARGET_STR[/foo/]            # => "foo"
TARGET_STR[/fo{2}/]          # => "foo"
TARGET_STR[/foo$/]           # => "foo"
TARGET_STR[/fo{2}$/]         # => "foo"
TARGET_STR[/fo{2}\Z/]        # => "foo"
TARGET_STR[/fo{2}\z/]        # => "foo"
TARGET_STR[/foo\Z/]          # => "foo"
TARGET_STR[/foo\z/]          # => "foo"
/foo/.match(TARGET_STR)[-1]  # => "foo"
/foo$/.match(TARGET_STR)[-1] # => "foo"
/foo/ =~ TARGET_STR          # => 101
/foo$/ =~ TARGET_STR         # => 101
/foo\Z/ =~ TARGET_STR        # => 101
TARGET_STR.include?('foo')   # => true
TARGET_STR.index('foo')      # => 101
TARGET_STR.rindex('foo')     # => 101


puts RUBY_VERSION
puts "TARGET_STR.length = #{ TARGET_STR.length }"

puts
puts 'compare fixed string vs. unanchored regex'
compare do 
  fixed_str        { TARGET_STR['foo'] }
  unanchored_regex { TARGET_STR[/foo/] }
end

puts
puts 'compare /foo/ to /fo{2}/'
compare do
  unanchored_regex  { TARGET_STR[/foo/]   }
  unanchored_regex2 { TARGET_STR[/fo{2}/] }
end

puts
puts 'compare unanchored vs. anchored regex' # !> assigned but unused variable - delay
compare do 
  unanchored_regex      { TARGET_STR[/foo/]    }
  anchored_regex_dollar { TARGET_STR[/foo$/]   }
  anchored_regex_Z      { TARGET_STR[/foo\Z/] }
  anchored_regex_z      { TARGET_STR[/foo\z/] }
end

puts
puts 'compare /foo/, match and =~'
compare do
  unanchored_regex    { TARGET_STR[/foo/]           }
  unanchored_match    { /foo/.match(TARGET_STR)[-1] }
  unanchored_eq_match { /foo/ =~ TARGET_STR         }
end

puts
puts 'compare fixed, unanchored, Z, include?, index and rindex'
compare do
  fixed_str        { TARGET_STR['foo']          }
  unanchored_regex { TARGET_STR[/foo/]          }
  anchored_regex_Z { TARGET_STR[/foo\Z/]        }
  include_eh       { TARGET_STR.include?('foo') }
  _index           { TARGET_STR.index('foo')    }
  _rindex          { TARGET_STR.rindex('foo')   }
end

Which results in:

# >> 2.2.3
# >> TARGET_STR.length = 104
# >> 
# >> compare fixed string vs. unanchored regex
# >> Running each test 8192 times. Test will take about 1 second.
# >> fixed_str is faster than unanchored_regex by 2x ± 0.1
# >> 
# >> compare /foo/ to /fo{2}/
# >> Running each test 8192 times. Test will take about 1 second.
# >> unanchored_regex2 is similar to unanchored_regex
# >> 
# >> compare unanchored vs. anchored regex
# >> Running each test 8192 times. Test will take about 1 second.
# >> anchored_regex_z is similar to anchored_regex_Z
# >> anchored_regex_Z is faster than unanchored_regex by 19.999999999999996% ± 10.0%
# >> unanchored_regex is similar to anchored_regex_dollar
# >> 
# >> compare /foo/, match and =~
# >> Running each test 8192 times. Test will take about 1 second.
# >> unanchored_eq_match is faster than unanchored_regex by 2x ± 0.1 (results differ: 101 vs foo)
# >> unanchored_regex is faster than unanchored_match by 3x ± 0.1
# >> 
# >> compare fixed, unanchored, Z, include?, index and rindex
# >> Running each test 32768 times. Test will take about 3 seconds.
# >> _rindex is similar to include_eh (results differ: 101 vs true)
# >> include_eh is faster than _index by 10.000000000000009% ± 10.0% (results differ: true vs 101)
# >> _index is faster than fixed_str by 19.999999999999996% ± 10.0% (results differ: 101 vs foo)
# >> fixed_str is faster than anchored_regex_Z by 39.99999999999999% ± 10.0%
# >> anchored_regex_Z is similar to unanchored_regex

Changing the size of the string reveals some useful differences.
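The size experiment can be repeated without the Fruity gem by using Benchmark.realtime from the standard library. This is only a rough sketch: the time_it helper and the iteration count are assumptions chosen for illustration, and the timings will vary by machine and Ruby version.

```ruby
require 'benchmark'

# Hypothetical helper: total wall-clock seconds for running the block
# `iterations` times.
def time_it(iterations = 10_000)
  Benchmark.realtime { iterations.times { yield } }
end

# Same shape as TARGET_STR above: padding spaces followed by ' foo'.
timings = [100, 1_000, 10_000].map do |padding|
  target = (' ' * padding) + ' foo'
  {
    length:  target.length,
    include: time_it { target.include?('foo') },
    regex:   time_it { target[/foo/] },
    regex_Z: time_it { target[/foo\Z/] }
  }
end

timings.each do |row|
  puts format('%6d  include?: %.4f  /foo/: %.4f  /foo\\Z/: %.4f',
              row[:length], row[:include], row[:regex], row[:regex_Z])
end
```

Fruity remains the better tool for deciding whether two results are meaningfully different, since it picks the number of runs itself and reports a margin of error.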

Changing to 1,000 characters:

# >> 2.2.3
# >> TARGET_STR.length = 1004
# >> 
# >> compare fixed string vs. unanchored regex
# >> Running each test 4096 times. Test will take about 1 second.
# >> fixed_str is faster than unanchored_regex by 50.0% ± 10.0%
# >> 
# >> compare /foo/ to /fo{2}/
# >> Running each test 2048 times. Test will take about 1 second.
# >> unanchored_regex2 is similar to unanchored_regex
# >> 
# >> compare unanchored vs. anchored regex
# >> Running each test 8192 times. Test will take about 1 second.
# >> anchored_regex_z is faster than anchored_regex_Z by 10.000000000000009% ± 10.0%
# >> anchored_regex_Z is faster than unanchored_regex by 3x ± 0.1
# >> unanchored_regex is similar to anchored_regex_dollar
# >> 
# >> compare /foo/, match and =~
# >> Running each test 4096 times. Test will take about 1 second.
# >> unanchored_eq_match is similar to unanchored_regex (results differ: 1001 vs foo)
# >> unanchored_regex is faster than unanchored_match by 2x ± 0.1
# >> 
# >> compare fixed, unanchored, Z, include?, index and rindex
# >> Running each test 32768 times. Test will take about 4 seconds.
# >> _rindex is faster than anchored_regex_Z by 2x ± 1.0 (results differ: 1001 vs foo)
# >> anchored_regex_Z is faster than include_eh by 2x ± 0.1 (results differ: foo vs true)
# >> include_eh is faster than fixed_str by 10.000000000000009% ± 10.0% (results differ: true vs foo)
# >> fixed_str is similar to _index (results differ: foo vs 1001)
# >> _index is similar to unanchored_regex (results differ: 1001 vs foo)

Bumping it to 10,000:

# >> 2.2.3
# >> TARGET_STR.length = 10004
# >> 
# >> compare fixed string vs. unanchored regex
# >> Running each test 512 times. Test will take about 1 second.
# >> fixed_str is faster than unanchored_regex by 39.99999999999999% ± 10.0%
# >> 
# >> compare /foo/ to /fo{2}/
# >> Running each test 256 times. Test will take about 1 second.
# >> unanchored_regex2 is similar to unanchored_regex
# >> 
# >> compare unanchored vs. anchored regex
# >> Running each test 8192 times. Test will take about 3 seconds.
# >> anchored_regex_z is similar to anchored_regex_Z
# >> anchored_regex_Z is faster than unanchored_regex by 21x ± 1.0
# >> unanchored_regex is similar to anchored_regex_dollar
# >> 
# >> compare /foo/, match and =~
# >> Running each test 256 times. Test will take about 1 second.
# >> unanchored_eq_match is similar to unanchored_regex (results differ: 10001 vs foo)
# >> unanchored_regex is faster than unanchored_match by 10.000000000000009% ± 10.0%
# >> 
# >> compare fixed, unanchored, Z, include?, index and rindex
# >> Running each test 32768 times. Test will take about 18 seconds.
# >> _rindex is faster than anchored_regex_Z by 2x ± 0.1 (results differ: 10001 vs foo)
# >> anchored_regex_Z is faster than include_eh by 15x ± 1.0 (results differ: foo vs true)
# >> include_eh is similar to _index (results differ: true vs 10001)
# >> _index is similar to fixed_str (results differ: 10001 vs foo)
# >> fixed_str is faster than unanchored_regex by 39.99999999999999% ± 10.0%

Ruby v2.6.5 results:

# >> 2.6.5
# >> n=500000
# >>                                 user     system      total        real
# >> array search:               6.744581   0.012204   6.756785 (  6.766078)
# >> literal search:             0.351014   0.000334   0.351348 (  0.351866)
# >> string include?:            0.325576   0.000493   0.326069 (  0.326331)
# >> regex:                      0.373231   0.000512   0.373743 (  0.374197)
# >> wildcard regex:             0.371914   0.000356   0.372270 (  0.372549)
# >> anchored regex:             0.373606   0.000568   0.374174 (  0.374736)
# >> anchored wildcard regex:    0.374923   0.000349   0.375272 (  0.375729)
# >> anchored wildcard regex2:   0.136772   0.000384   0.137156 (  0.137474)
# >> /regex/.match               0.662532   0.003377   0.665909 (  0.666605)
# >> /regex$/.match              0.671762   0.005036   0.676798 (  0.677691)
# >> /regex/ =~                  0.322114   0.000404   0.322518 (  0.322917)
# >> /regex$/ =~                 0.332067   0.000995   0.333062 (  0.334226)
# >> /regexZ/ =~                 0.078958   0.000069   0.079027 (  0.079082)

and:

# >> 2.6.5
# >> TARGET_STR.length = 104
# >> 
# >> compare fixed string vs. unanchored regex
# >> Running each test 32768 times. Test will take about 1 second.
# >> fixed_str is faster than unanchored_regex by 2x ± 0.1
# >> 
# >> compare /foo/ to /fo{2}/
# >> Running each test 8192 times. Test will take about 1 second.
# >> unanchored_regex is similar to unanchored_regex2
# >> 
# >> compare unanchored vs. anchored regex
# >> Running each test 16384 times. Test will take about 1 second.
# >> anchored_regex_z is similar to anchored_regex_Z
# >> anchored_regex_Z is similar to anchored_regex_dollar
# >> anchored_regex_dollar is similar to unanchored_regex
# >> 
# >> compare /foo/, match and =~
# >> Running each test 16384 times. Test will take about 1 second.
# >> unanchored_eq_match is similar to unanchored_regex (results differ: 101 vs foo)
# >> unanchored_regex is faster than unanchored_match by 3x ± 1.0 (results differ: foo vs )
# >> 
# >> compare fixed, unanchored, Z, include?, index and rindex
# >> Running each test 65536 times. Test will take about 3 seconds.
# >> _rindex is similar to include_eh (results differ: 101 vs true)
# >> include_eh is similar to _index (results differ: true vs 101)
# >> _index is similar to fixed_str (results differ: 101 vs foo)
# >> fixed_str is faster than anchored_regex_Z by 2x ± 0.1
# >> anchored_regex_Z is faster than unanchored_regex by 19.999999999999996% ± 10.0%

# >> 2.6.5
# >> TARGET_STR.length = 1004
# >> 
# >> compare fixed string vs. unanchored regex
# >> Running each test 32768 times. Test will take about 2 seconds.
# >> fixed_str is faster than unanchored_regex by 7x ± 1.0
# >> 
# >> compare /foo/ to /fo{2}/
# >> Running each test 2048 times. Test will take about 1 second.
# >> unanchored_regex is similar to unanchored_regex2
# >> 
# >> compare unanchored vs. anchored regex
# >> Running each test 8192 times. Test will take about 1 second.
# >> anchored_regex_z is similar to anchored_regex_Z
# >> anchored_regex_Z is faster than unanchored_regex by 3x ± 1.0
# >> unanchored_regex is similar to anchored_regex_dollar
# >> 
# >> compare /foo/, match and =~
# >> Running each test 2048 times. Test will take about 1 second.
# >> unanchored_eq_match is faster than unanchored_regex by 10.000000000000009% ± 10.0% (results differ: 1001 vs foo)
# >> unanchored_regex is faster than unanchored_match by 39.99999999999999% ± 10.0% (results differ: foo vs )
# >> 
# >> compare fixed, unanchored, Z, include?, index and rindex
# >> Running each test 65536 times. Test will take about 4 seconds.
# >> _rindex is similar to include_eh (results differ: 1001 vs true)
# >> include_eh is similar to _index (results differ: true vs 1001)
# >> _index is similar to fixed_str (results differ: 1001 vs foo)
# >> fixed_str is faster than anchored_regex_Z by 2x ± 1.0
# >> anchored_regex_Z is faster than unanchored_regex by 4x ± 1.0


# >> 2.6.5
# >> TARGET_STR.length = 10004
# >> 
# >> compare fixed string vs. unanchored regex
# >> Running each test 8192 times. Test will take about 2 seconds.
# >> fixed_str is faster than unanchored_regex by 31x ± 10.0
# >> 
# >> compare /foo/ to /fo{2}/
# >> Running each test 512 times. Test will take about 1 second.
# >> unanchored_regex2 is similar to unanchored_regex
# >> 
# >> compare unanchored vs. anchored regex
# >> Running each test 8192 times. Test will take about 3 seconds.
# >> anchored_regex_z is similar to anchored_regex_Z
# >> anchored_regex_Z is faster than unanchored_regex by 27x ± 1.0
# >> unanchored_regex is similar to anchored_regex_dollar
# >> 
# >> compare /foo/, match and =~
# >> Running each test 512 times. Test will take about 1 second.
# >> unanchored_eq_match is similar to unanchored_regex (results differ: 10001 vs foo)
# >> unanchored_regex is faster than unanchored_match by 10.000000000000009% ± 10.0% (results differ: foo vs )
# >> 
# >> compare fixed, unanchored, Z, include?, index and rindex
# >> Running each test 65536 times. Test will take about 14 seconds.
# >> _rindex is faster than _index by 2x ± 1.0
# >> _index is similar to include_eh (results differ: 10001 vs true)
# >> include_eh is similar to fixed_str (results differ: true vs foo)
# >> fixed_str is similar to anchored_regex_Z
# >> anchored_regex_Z is faster than unanchored_regex by 26x ± 1.0

"Best way to find a substring in a string" is related.
