Parser in Ruby: dealing with sticky comments and quotes
Problem description
I am trying to write a recursive-descent parser in Ruby for a grammar defined by the following rules:
- Input consists of white-space separated Cards, each starting with a Stop-word, where white-space matches the regex /[ \n\t]+/
- A Card may consist of Keywords and/or Values, also separated by white-space, in a card-specific order/pattern
- All Stop-words and Keywords are case-insensitive, i.e.: /^[a-z]+[a-z0-9]*$/i
- A Value can be a double-quoted string, which is not necessarily separated from other words by white-space, e.g.:
    word"quoted string"word
- A Value can also be a word /^[a-z]+[a-z0-9]*$/, an integer, or a float (e.g. -1.15 or 1.0e+2)
- A single-line comment starts with # and is not necessarily separated from other words, e.g.:
    word#single-line comment\n
- A multi-line comment is delimited by /* and */ and is not necessarily separated from other words, e.g.:
    word/*multi-line
    comment*/word
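The lexical rules above can be sketched as a small classifier for tokens that have already been separated. This is only an illustration: the helper name `classify_token` and the constant names are hypothetical, the exact integer/float syntax is an assumption extrapolated from the two examples (-1.15 and 1.0e+2), and words are matched case-insensitively as for Keywords.

```ruby
# Hypothetical patterns; only the word pattern comes from the rules above.
WORD_RE    = /\A[a-z]+[a-z0-9]*\z/i
INTEGER_RE = /\A[+-]?\d+\z/
# Assumed float syntax: optional sign, digits with optional fraction, optional exponent.
FLOAT_RE   = /\A[+-]?(?:\d+\.\d*|\.\d+|\d+)(?:e[+-]?\d+)?\z/i
STRING_RE  = /\A"[^"]*"\z/

# Returns a symbol describing what kind of token this is.
def classify_token(token)
  case token
  when STRING_RE  then :string
  when INTEGER_RE then :integer # checked before FLOAT_RE, which also accepts plain integers
  when FLOAT_RE   then :float
  when WORD_RE    then :word
  else :unknown
  end
end

p classify_token('"Input example"') # => :string
p classify_token('1.0e+2')          # => :float
p classify_token('shape')           # => :word
```

Whether a :word is a Stop-word, a Keyword, or a Value still depends on the card it appears in, so that decision belongs to the parser rather than to this classifier.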
# Input example. Stop-words are chosen just to highlight them: set, object
set title"Input example"set objects 2#not-separated by white-space. test: "/*
set test "#/*"
object 1 shape box/* shape is a Keyword,
box is a Value. test: "#*/object 2 shape sphere
set data # message and complete are Values
0 0 0 0 1 18 18 18 1 35 35 35 72 35 35 # all numbers are Values of the Card "set"
Since most of the words are separated by white-space, for a while I was thinking about splitting the whole input and parsing it word by word. To deal with comments and quotes, I was going to do:
words = input_text.gsub( /([\"\#\n]|\/\*|\*\/)/, ' \1 ' ).split( /[ \t]+/ )
However, in this way the content of strings (and comments, if I want to keep them) is modified. How would you deal with these sticky comments and quotes?
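A small experiment makes the problem concrete. The input string below is made up for illustration; the gsub/split expression is the one from the question.

```ruby
# A string Value containing a double space and a '#' that is not a comment.
input_text = 'set title"a  b#c"'

words = input_text.gsub( /([\"\#\n]|\/\*|\*\/)/, ' \1 ' ).split( /[ \t]+/ )
p words
# => ["set", "title", "\"", "a", "b", "#", "c", "\""]
```

The quoted string is broken into pieces, the double space inside it is lost, and the # inside the string becomes a separate word that looks like the start of a comment. The original text of the string cannot be rebuilt from this list.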
Recommended answer
OK, I solved it myself. The following code can be shortened if readability is not a concern:
class WordParser
  attr_reader :words

  def initialize text
    @text = text
  end

  # Splits the text into words, keeping quoted strings and comments intact.
  # Each word is stored as a hash with :from, :to (inclusive), :line and :column.
  def parse
    reset_parser
    until eof?
      case curr_char
      when '"' then
        start_word and add_chars_until? '"'
        close_word
      when '#','%' then
        start_word and add_chars_until? "\n"
        close_word
      when '/' then
        if next_is? '*' then
          start_word and 2.times { add_char }
          add_char until curr_is? '*' and next_is? '/' or eof?
          2.times { add_char } unless eof?
          close_word
        else
          # parser_error "unexpected symbol '/'" # if not allowed in the grammar
          start_word unless word_already_started?
          add_char
        end
      when /[^\s]/ then
        start_word unless word_already_started?
        add_char
      else # skip whitespaces etc. between words
        move and close_word
      end
    end
    return @words
  end
  private

  def reset_parser
    @position = 0
    @line, @column = 1, 1
    @words = []
    @word_started = false
  end

  def parser_error s
    Kernel.puts format('Parser error on line %d, col %d: %s', @line, @column, s)
    raise 'Parser error'
  end

  def word_already_started?
    @word_started
  end

  def close_word
    @word_started = false
  end

  # Adds characters up to and including the next occurrence of ch.
  def add_chars_until? ch
    add_char until next_is? ch or eof?
    2.times { add_char } unless eof?
  end

  def add_char
    @words.last[:to] = @position
    # @words.last[:length] += 1
    # @words.last << curr_char # if one just collects words
    move
  end

  def start_word
    @words.push from: @position, to: @position, line: @line, column: @column
    # @words.push '' unless @words.last.empty? # if one just collects words
    @word_started = true
  end

  # Advances one character and keeps the line/column counters up to date.
  def move
    increase :@position
    return if eof?
    if prev_is? "\n"
      increase :@line
      reset :@column
    else
      increase :@column
    end
  end

  def reset var; instance_variable_set(var, 1) end
  def increase var; instance_variable_set(var, instance_variable_get(var)+1) end
  def eof?; @position >= @text.length end
  def prev_is? ch; prev_char == ch end
  def curr_is? ch; curr_char == ch end
  def next_is? ch; next_char == ch end
  def prev_char; @text[ @position-1 ] end
  def curr_char; @text[ @position ] end
  def next_char; @text[ @position+1 ] end
end
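For comparison, the same splitting can be sketched with StringScanner from Ruby's standard library. This is a different technique from the character-by-character loop above, not a rewrite of it: it returns the token text with a kind tag instead of positions, treats only # as a single-line comment (as in the question's grammar), and raises on an unterminated string or multi-line comment. The names `TOKEN_PATTERNS` and `tokenize` are hypothetical.

```ruby
require 'strscan'

# One pattern per token kind, tried in this order at each position.
TOKEN_PATTERNS = [
  [:string,        /"[^"]*"/],
  [:line_comment,  /#[^\n]*\n?/],
  [:block_comment, %r{/\*.*?\*/}m],
  # A word runs until white-space, a quote, a '#', or the start of '/*'.
  [:word,          %r{(?:[^\s"#/]|/(?!\*))+}]
].freeze

# Returns an array of [kind, text] pairs.
def tokenize(text)
  scanner = StringScanner.new(text)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/\s+/) # white-space only separates tokens
    kind, = TOKEN_PATTERNS.find { |_, pattern| scanner.scan(pattern) }
    raise "unexpected input at offset #{scanner.pos}" unless kind
    tokens << [kind, scanner.matched]
  end
  tokens
end

p tokenize("word\"quoted string\"word/*multi-line\ncomment*/word#tail\n")
```

The regex version is shorter, but the hand-written loop makes it easier to track line and column numbers and to report errors precisely, which is what the answer's code does.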
Test using the example from my question:
words = WordParser.new(text).parse
p words.collect { |w| text[ w[:from]..w[:to] ] } .to_a
# >> ["# Input example. Stop-words are chosen just to highlight them: set, object\n",
# >> "set", "title", "\"Input example\"", "set", "objects", "2",
# >> "#not-separated by white-space. test: \"/*\n", "set", "test", "\"#/*\"",
# >> "object", "1", "shape", "box", "/* shape is a Keyword, \nbox is a Value. test: \"#*/",
# >> "object", "2", "shape", "sphere", "set", "data", "# message and complete are Values\n",
# >> "0", "0", "0", "0", "1", "18", "18", "18", "1", "35", "35", "35", "72",
# >> "35", "35", "# all numbers are Values of the Card \"set\"\n"]
So now I can use something like this to parse the words further.
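As one possible next step, here is a minimal sketch that groups such a word list into Cards. It assumes the words are plain strings like the ones printed above, that the Stop-words are the two used in the example input (set, object), and that comments can simply be dropped. The names `STOP_WORDS` and `group_into_cards` are hypothetical.

```ruby
# Assumption: the Stop-words of the example input.
STOP_WORDS = %w[set object].freeze

# Starts a new Card at every Stop-word and appends all other words to the current Card.
def group_into_cards(tokens)
  cards = []
  tokens.each do |token|
    next if token.start_with?('#', '/*') # drop comments
    if STOP_WORDS.include?(token.downcase) # Stop-words are case-insensitive
      cards << [token]
    elsif cards.empty?
      raise ArgumentError, "expected a Stop-word, got #{token.inspect}"
    else
      cards.last << token
    end
  end
  cards
end

tokens = ['set', 'title', '"Input example"', 'set', 'objects', '2',
          "#not-separated by white-space\n", 'object', '1', 'shape', 'box']
p group_into_cards(tokens)
# => [["set", "title", "\"Input example\""], ["set", "objects", "2"], ["object", "1", "shape", "box"]]
```

Because quoted strings keep their quotes, a string Value such as "set" is not mistaken for a Stop-word. Checking each Card against its card-specific pattern would come after this grouping.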