Parser in Ruby: #slice! inside #each_with_index = missing element

Problem Description

Let's say I want to separate certain combinations of elements from an array. For example:

data = %w{ start before rgb 255 255 255 between hex FFFFFF after end }
rgb, hex = [], []
data.each_with_index do |v,i|
  p [i,v]
  case v.downcase
    when 'rgb' then rgb  = data.slice! i,4
    when 'hex' then hex  = data.slice! i,2
  end
end
pp [rgb, hex, data]
# >> [0, "start"]
# >> [1, "before"]
# >> [2, "rgb"]
# >> [3, "hex"]
# >> [4, "end"]
# >> [["rgb", "255", "255", "255"],
# >>  ["hex", "FFFFFF"],
# >>  ["start", "before", "between", "after", "end"]]

The code does the extraction correctly, but it skips the element immediately after each extracted set. So if my data array is

data = %w{ start before rgb 255 255 255 hex FFFFFF after end }

then

pp [rgb, hex, data]
# >> [["rgb", "255", "255", "255"],
# >>  [],
# >>  ["start", "before", "hex", "FFFFFF", "after", "end"]]

Why does this happen? How can I get at those missed elements inside #each_with_index? Or is there a better solution to this problem, given that there are many more sets to extract?

Recommended Answer

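The problem is that you are mutating the collection while you are iterating over it. This cannot possibly work: Array#each (which #each_with_index uses under the hood) walks the array by position, so when #slice! removes elements starting at index i, the element that followed the removed run slides into index i, and the next iteration moves on to i + 1, skipping it. (And in my opinion, it shouldn't work. Ruby should raise an exception in this case instead of silently allowing incorrect behavior. That's what many other imperative languages do.)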
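A tiny demonstration of the skip, on a made-up array of letters rather than the question's data. It assumes CRuby's Array#each, which re-reads the array length on every step; removing "b" and "c" at index 1 lets "d" slide into index 1, which the loop has already passed.

```ruby
letters = %w[a b c d e]
visited = []

letters.each_with_index do |v, i|
  visited << v
  # Removes "b" and "c"; "d" moves into index 1, but the next iteration looks at index 2
  letters.slice!(i, 2) if v == 'b'
end

p visited  # → ["a", "b", "e"]   ("d" was never visited)
p letters  # → ["a", "d", "e"]
```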

Here is the best I could come up with while still keeping your original style:

require 'pp'

data = %w[start before rgb 255 255 255 hex FFFFFF after end]

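# How many value tokens still belong to the set currently being collected
rgb_count = hex_count = 0

rgb, hex, rest = data.reduce([[], [], []]) do |acc, el|
  # tap yields the accumulator; its three arrays are auto-splatted into
  # rgb, hex and rest, and tap returns acc for the next reduce step
  acc.tap do |rgb, hex, rest|
    next (rgb_count = 3  ; rgb << el) if /rgb/i =~ el  # "rgb" keyword: expect 3 values
    next (rgb_count -= 1 ; rgb << el) if rgb_count > 0  # one of the RGB values
    next (hex_count = 1  ; hex << el) if /hex/i =~ el  # "hex" keyword: expect 1 value
    next (hex_count -= 1 ; hex << el) if hex_count > 0  # the hex value
    rest << el                                         # everything else
  end
end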

data.replace(rest)

pp rgb, hex, data
# ["rgb", "255", "255", "255"]
# ["hex", "FFFFFF"]
# ["start", "before", "after", "end"]
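If you would rather keep #slice! and the in-place mutation from the question, one alternative (not part of the original answer, just a minimal sketch) is to drive the index yourself with a while loop and advance it only when nothing was removed. That way the element that slides into position i is examined on the next pass instead of being skipped.

```ruby
data = %w[start before rgb 255 255 255 between hex FFFFFF after end]
rgb, hex = [], []

i = 0
while i < data.size
  case data[i].downcase
  when 'rgb' then rgb = data.slice!(i, 4)  # the next unseen element now sits at i
  when 'hex' then hex = data.slice!(i, 2)  # same here, so i is not advanced
  else i += 1                              # advance only when nothing was removed
  end
end

p rgb   # → ["rgb", "255", "255", "255"]
p hex   # → ["hex", "FFFFFF"]
p data  # → ["start", "before", "between", "after", "end"]
```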

However, what you have is a parsing problem, and that is really best solved by a parser. A simple hand-rolled parser or state machine will probably take a bit more code than the above, but it will be much more readable.
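As a rough illustration of the state-machine route, and of the "many more sets" part of the question, here is a minimal single-pass sketch. The table ARITY (keyword to number of value tokens) and the method extract_sets are hypothetical names introduced here; adding a new kind of set means adding one table entry. The only state is which keyword is being collected and how many value tokens are still expected. A keyword that appears twice appends to the same bucket.

```ruby
# Hypothetical table: keyword => number of value tokens that follow it
ARITY = { 'rgb' => 3, 'hex' => 1 }.freeze

def extract_sets(tokens, arity = ARITY)
  sets = Hash.new { |h, k| h[k] = [] }  # keyword => collected tokens
  rest = []
  current = nil    # keyword whose values are being collected
  remaining = 0    # value tokens still expected for `current`

  tokens.each do |tok|
    if remaining > 0                    # inside a set: take the value
      sets[current] << tok
      remaining -= 1
    elsif (n = arity[tok.downcase])     # a keyword starts a new set
      current = tok.downcase
      sets[current] << tok
      remaining = n
    else                                # anything else stays in the data
      rest << tok
    end
  end

  [sets, rest]
end

sets, rest = extract_sets(%w[start before rgb 255 255 255 between hex FFFFFF after end])
p sets['rgb']  # → ["rgb", "255", "255", "255"]
p sets['hex']  # → ["hex", "FFFFFF"]
p rest         # → ["start", "before", "between", "after", "end"]
```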

Here's a simple recursive-descent parser that solves your problem:

class ColorParser
  def initialize(input)
    @input = input.dup
    @rgb, @hex, @data = [], [], []
  end

  def parse
    parse_element until @input.empty?
    return @rgb, @hex, @data
  end

  private

  def parse_element
    parse_color or parse_stop_word
  end

  def parse_color
    parse_rgb or parse_hex
  end

  def parse_rgb
    return unless /rgb/i =~ peek
    @rgb << consume
    parse_rgb_values
  end

I really like recursive-descent parsers because their structure almost perfectly matches the grammar: just keep parsing elements until the input is empty. What is an element? Well, it's a color specification or a stop word. What is a color specification? Well, it's either an RGB color specification or a hex color specification. What is an RGB color specification? Well, it's something that matches the Regexp /rgb/i followed by RGB values. What are RGB values? Well, they're just three numbers …

  def parse_rgb_values
    3.times do @rgb << consume.to_i end
  end

  def parse_hex
    return unless /hex/i =~ peek
    @hex << consume
    parse_hex_value
  end

  def parse_hex_value
    @hex << consume.to_i(16)
  end

  def parse_stop_word
    @data << consume unless /rgb|hex/i =~ peek
  end

  def consume
    @input.slice!(0)
  end

  def peek
    @input.first
  end
end

Use it like this:

data = %w[start before rgb 255 255 255 hex FFFFFF after end]
rgb, hex, rest = ColorParser.new(data).parse

require 'pp'

pp rgb, hex, rest
# ["rgb", 255, 255, 255]
# ["hex", 16777215]
# ["start", "before", "after", "end"]

For comparison, here's the grammar:

  • S → element*
  • element → color | word
  • color → rgb | hex
  • rgb → "rgb" rgbvalues
  • rgbvalues → token token token
  • hex → "hex" hexvalue
  • hexvalue → token
  • word → token
