Word parser script and implementing memoization


Question

Given a dictionary, my program generates two output files, 'sequences.txt' and 'words.txt'.

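  • 'sequences' contains every sequence of four letters (A-z) that appears in exactly one word of the dictionary, one sequence per line.
  • 'words' contains the corresponding words that contain those sequences, in the same order, again one per line.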

For example, given the spec/fixtures/sample_words.txt dictionary containing only

arrows
carrots
give
me

the outputs should be:

'sequences'             'words'

carr                    carrots
give                    give
rots                    carrots
rows                    arrows
rrot                    carrots
rrow                    arrows

Of course, 'arro' does not appear in the output, since it is found in more than one word ('arrows' and 'carrots').
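
The expected output above can be checked with a small standalone sketch. This is not the class from the question: the `unique_sequences` helper is a hypothetical name used only for illustration. It maps every four-letter sequence to the set of words containing it and keeps the sequences owned by a single word.

```ruby
require 'set'

# Hypothetical helper, for illustration only: returns [sequence, word] pairs
# for every four-letter sequence that occurs in exactly one word.
def unique_sequences(words)
  # Each sequence maps to the set of distinct words that contain it.
  owners = Hash.new { |hash, key| hash[key] = Set.new }

  words.each do |word|
    word.downcase.chars.each_cons(4) do |letters|
      owners[letters.join] << word
    end
  end

  owners.select { |_seq, ws| ws.size == 1 }
        .map { |seq, ws| [seq, ws.first] }
        .sort
end

pairs = unique_sequences(%w[arrows carrots give me])
pairs.each { |seq, word| puts "#{seq.ljust(24)}#{word}" }
```

Note one difference from the class in the question: this sketch counts distinct words, so a sequence that repeats inside a single word still qualifies, whereas the `words.add?` check in the question's code would discard it.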

Project structure:

├── Gemfile
├── Gemfile.lock
├── examples
│   └── dictionary.txt
├── lib
│   └── word_sequence_parser.rb
├── main.rb
├── output
├── readme.md
└── spec
    ├── fixtures
    │   └── sample_words.txt
    └── word_sequence_parser_spec.rb

To run the script: ruby main.rb examples/dictionary.txt

main.rb

require_relative 'lib/word_sequence_parser.rb'

dict_path = ARGV.shift

if dict_path.nil?
  dict_path = 'spec/fixtures/sample_words.txt'
end

parser = WordSequenceParser.new(dict_path)

# step 1 - Opens dictionary file and generates a new set of words
parser.set

# step 2 - Parses word sequences
parser.sequence

# step 3 - Prints to files in ./output
parser.dump_text

Working script

word_sequence_parser.rb

require 'set'

class WordSequenceParser

  def initialize(path)
    @path = path
  end

  def set
    set = Set.new

    File.open(@path) do |f|
      f.each_line do |line|
        set.add(line.chomp.downcase)
      end
    end
    set
  end

  def sequence
    sequences = Set.new
    words = Set.new
    to_remove = Set.new

    set.each do |w|
      letters = w.split(//)
      letters.each_cons(4) do |seq|
        s = seq.join
        if !words.add?(s)
          to_remove.add(s)
        end
        sequences.add( {seq: s, word: w} )
      end
    end
    sequences.delete_if { |hash| to_remove.include?(hash[:seq]) }
  end

  def dump_text
    output_s = File.open( 'output/sequences.txt', 'w' )
    output_w = File.open( 'output/words.txt', 'w' )

    sequence.each do |hash|
      output_s.puts("#{hash[:seq]}")
      output_w.puts("#{hash[:word]}")
    end

    output_s.close
    output_w.close
  end
end

My attempt at memoizing the script

require 'set'

class WordSequenceParser

  def initialize(path)
    @path = path
  end

  def set
    set = Set.new

    File.open(@path) do |f|
      f.each_line do |line|
        set.add(line.chomp.downcase)
      end
    end
    set
  end

  def memoize
    @set = set
  end

  def sequence
    sequences = Set.new
    words = Set.new
    to_remove = Set.new

    @set.each do |w|
      letters = w.split(//)
      letters.each_cons(4) do |seq|
        s = seq.join
        if !words.add?(s)
          to_remove.add(s)
        end
        sequences.add( {seq: s, word: w} )
      end
    end
    sequences.delete_if { |hash| to_remove.include?(hash[:seq]) }
  end

  def dump_text
    output_s = File.open( 'output/sequences.txt', 'w' )
    output_w = File.open( 'output/words.txt', 'w' )

    sequence.each do |hash|
      output_s.puts("#{hash[:seq]}")
      output_w.puts("#{hash[:word]}")
    end

    output_s.close
    output_w.close
  end
end

I get this error message when trying to run the script.

../word_sequence_parser.rb:29:in `sequence': undefined method `each' for nil:NilClass (NoMethodError)
    from main.rb:15:in `<main>'

I've read Justin Weiss's article on memoization and, for the most part, I get it. I'm just having a hard time applying the technique to something I've already written.

Recommended answer

It does not work because you never call memoize, so @set is never initialized.
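
The failure can be reproduced in a few lines. In Ruby, an instance variable that was never assigned evaluates to nil, so calling each on it raises NoMethodError. Below is a minimal sketch, where the `Parser` class and its hard-coded word list are stand-ins for the code in the question, not the original class.

```ruby
# Stand-in for the class in the question: @set is only assigned inside memoize.
class Parser
  def memoize
    @set = %w[arrows carrots]
  end

  def sequence
    # @set is nil here unless memoize was called first.
    @set.each { |word| word }
  end
end

begin
  Parser.new.sequence
rescue NoMethodError => e
  puts "Without memoize: #{e.class}"
end

parser = Parser.new
parser.memoize # assigns @set, so sequence can iterate over it
puts parser.sequence.inspect
```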

However, the real problem here is that there is nothing to memoize.

Your original code looks pretty good, and if you think about how it works, none of the code is executed redundantly. Every line is either executed once or, if it is executed more than once, returns a different value each time.

So there is no point in memoizing.

Let's say, however, that you wanted to call dump_text (or just sequence) multiple times. Then you would definitely want to memoize sequence as follows:

def sequence
  @sequence ||= begin
    sequences = Set.new
    words = Set.new
    to_remove = Set.new

    set.each do |w|
      letters = w.split(//)
      letters.each_cons(4) do |seq|
        s = seq.join
        if !words.add?(s)
          to_remove.add(s)
        end
        sequences.add( {seq: s, word: w} )
      end
    end
    sequences.delete_if { |hash| to_remove.include?(hash[:seq]) }
  end
end

This executes your original sequence-calculating code only once and assigns the result to @sequence. Every later call to sequence reuses the value already stored in @sequence (because it is no longer nil).
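
The effect is easy to observe with a call counter. In the sketch below, the `Example` class, its `calls` counter, and the placeholder return value are made up for illustration. The point is only that the begin block runs on the first call and is skipped afterwards.

```ruby
class Example
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def sequence
    @sequence ||= begin
      @calls += 1          # counts how often the expensive part runs
      %w[carr give rots]   # placeholder for the real computation
    end
  end
end

example = Example.new
first  = example.sequence
second = example.sequence

puts example.calls          # number of times the block ran
puts first.equal?(second)   # whether both calls returned the same object
```

One caveat with `||=`: it recomputes whenever the stored value is nil or false, so it only memoizes truthy results. That is fine for the sequence method above, since it always returns a Set.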

I love this question because it reminds me of one of the first things I noticed when my company started using Ruby. We had a consultant reworking a lot of old ASP.NET code, and he used these @foo ||= ... expressions in methods, which I had never seen before.
