Library to parse ERB files
Question
I am attempting to parse, not evaluate, Rails ERB files in a Hpricot/Nokogiri-style manner. The files I want to parse are standard Rails view files: HTML fragments intermixed with dynamic content generated by ERB. I am looking for a library that will not only parse the surrounding content, much the way Hpricot or Nokogiri do, but will also treat the ERB markers (<%, <%=, etc.) as though they were HTML/XML tags.
Ideally I would get back a DOM-like structure in which the <%, <%=, etc. markers are included as their own node types.
I know it is possible to hack something together using regular expressions, but I am looking for something more reliable: I am developing a tool that needs to run on a very large view code base where both the HTML content and the ERB content are important.
For example, something like:
blah blah blah
<div>My Great Text <%= my_dynamic_expression %></div>
would return a tree structure like:
root
- text_node (blah blah blah)
- element (div)
  - text_node (My Great Text )
  - erb_node (<%=)
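To make the requested structure concrete, here is a minimal Ruby sketch of how such a tree might be modelled. The node types (TextNode, ErbNode, ElementNode) and the outline helper are hypothetical names invented for illustration; they do not come from Hpricot, Nokogiri, or any other library.

```ruby
# Hypothetical node types; none of these names come from an existing library.
TextNode    = Struct.new(:text)
ErbNode     = Struct.new(:tag, :code)
ElementNode = Struct.new(:name, :children)

# Render a node and its descendants as indented outline lines.
def outline(node, depth = 0)
  indent = "  " * depth
  case node
  when TextNode then ["#{indent}- text_node (#{node.text})"]
  when ErbNode  then ["#{indent}- erb_node (#{node.tag})"]
  when ElementNode
    ["#{indent}- element (#{node.name})"] +
      node.children.flat_map { |child| outline(child, depth + 1) }
  end
end

# The example fragment from the question, built by hand.
tree = [
  TextNode.new("blah blah blah"),
  ElementNode.new("div", [
    TextNode.new("My Great Text "),
    ErbNode.new("<%=", "my_dynamic_expression")
  ])
]

lines = ["root"] + tree.flat_map { |node| outline(node) }
puts lines
```

Keeping the ERB marker and the embedded Ruby source as separate fields is one possible design choice; it lets a tool distinguish <% blocks from <%= output expressions without re-parsing the text.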
Accepted answer
I eventually solved this problem by using RLex, http://raa.ruby-lang.org/project/ruby-lex/, the Ruby version of lex, with the following grammar:
%{
#define NUM 257
#define OPTOK 258
#define IDENT 259
#define OPETOK 260
#define CLSTOK 261
#define CLTOK 262
#define FLOAT 263
#define FIXNUM 264
#define WORD 265
#define STRING_DOUBLE_QUOTE 266
#define STRING_SINGLE_QUOTE 267
#define TAG_START 268
#define TAG_END 269
#define TAG_SELF_CONTAINED 270
#define ERB_BLOCK_START 271
#define ERB_BLOCK_END 272
#define ERB_STRING_START 273
#define ERB_STRING_END 274
#define TAG_NO_TEXT_START 275
#define TAG_NO_TEXT_END 276
#define WHITE_SPACE 277
%}
digit [0-9]
blank [ ]
letter [A-Za-z]
name1 [A-Za-z_]
name2 [A-Za-z_0-9]
valid_tag_character [A-Za-z0-9"'=@_():/ ]
ignore_tags style|script
%%
{blank}+"\n" { return [ WHITE_SPACE, yytext ] }
"\n"{blank}+ { return [ WHITE_SPACE, yytext ] }
{blank}+"\n"{blank}+ { return [ WHITE_SPACE, yytext ] }
"\r" { return [ WHITE_SPACE, yytext ] }
"\n" { return[ yytext[0], yytext[0..0] ] };
"\t" { return[ yytext[0], yytext[0..0] ] };
^{blank}+ { return [ WHITE_SPACE, yytext ] }
{blank}+$ { return [ WHITE_SPACE, yytext ] };
"" { return [ TAG_NO_TEXT_START, yytext ] }
"" { return [ TAG_NO_TEXT_END, yytext ] }
"" { return [ TAG_SELF_CONTAINED, yytext ] }
"" { return [ TAG_SELF_CONTAINED, yytext ] }
"" { return [ TAG_START, yytext ] }
"" { return [ TAG_END, yytext ] }
"" { return [ ERB_BLOCK_END, yytext ] }
"" { return [ ERB_STRING_END, yytext ] }
{letter}+ { return [ WORD, yytext ] }
\".*\" { return [ STRING_DOUBLE_QUOTE, yytext ] }
'.*' { return [ STRING_SINGLE_QUOTE, yytext ] }
. { return [ yytext[0], yytext[0..0] ] }
%%
This is not a complete grammar, but for my purposes (locating and re-emitting text) it worked. I combined that grammar with this small piece of code:
# MakeYourOwnCallbackHandler is a placeholder: supply any object that
# responds to the callbacks used below.
text_handler = MakeYourOwnCallbackHandler.new
l = Erblex.new
l.yyin = File.open(file_name, "r")
loop do
  a, v = l.yylex
  break if a == 0 # a token code of 0 ends the loop
  if a < WORD
    # Codes below WORD are the raw characters returned by the catch-all rules.
    text_handler.character(v.to_s, a)
  else
    case a
    when WORD
      text_handler.text(v.to_s)
    when TAG_START
      text_handler.start_tag(v.to_s)
    when TAG_END
      text_handler.end_tag(v.to_s)
    when WHITE_SPACE
      text_handler.white_space(v.to_s)
    when ERB_BLOCK_START
      text_handler.erb_block_start(v.to_s)
    when ERB_BLOCK_END
      text_handler.erb_block_end(v.to_s)
    when ERB_STRING_START
      text_handler.erb_string_start(v.to_s)
    when ERB_STRING_END
      text_handler.erb_string_end(v.to_s)
    when TAG_NO_TEXT_START
      text_handler.ignorable_tag_start(v.to_s)
    when TAG_NO_TEXT_END
      text_handler.ignorable_tag_end(v.to_s)
    when STRING_DOUBLE_QUOTE
      text_handler.string_double_quote(v.to_s)
    when STRING_SINGLE_QUOTE
      text_handler.string_single_quote(v.to_s)
    when TAG_SELF_CONTAINED
      text_handler.tag_self_contained(v.to_s)
    end
  end
end
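The callback handler is left to the reader in the answer, so the following is only a sketch of what one might look like. The class name ReEmittingHandler and its internals are assumptions made for illustration; only the callback method names are taken from the driver loop. It re-emits every token unchanged and records the ERB delimiters it sees. The calls at the bottom simulate the lexer by hand, and the token texts "<%=" and "%>" are assumed values, since the ERB patterns are not visible in the grammar as posted.

```ruby
# Hypothetical handler: the class and its internals are illustrative;
# only the callback method names come from the driver loop.
class ReEmittingHandler
  attr_reader :output, :erb_tags

  def initialize
    @output = String.new # everything seen so far, in original order
    @erb_tags = []       # [callback_name, token_text] pairs for ERB delimiters
  end

  # Single characters that arrive with a token code below WORD.
  def character(text, _code)
    @output << text
  end

  # Callbacks that pass the matched text through unchanged.
  %i[text start_tag end_tag white_space ignorable_tag_start ignorable_tag_end
     string_double_quote string_single_quote tag_self_contained].each do |name|
    define_method(name) { |text| @output << text }
  end

  # ERB callbacks also record the token so ERB regions can be located later.
  %i[erb_block_start erb_block_end erb_string_start erb_string_end].each do |name|
    define_method(name) do |text|
      @erb_tags << [name, text]
      @output << text
    end
  end
end

# Simulated token stream (assumed token texts), standing in for Erblex.
handler = ReEmittingHandler.new
handler.start_tag("<div>")
handler.text("My")
handler.character(" ", 32)
handler.erb_string_start("<%=")
handler.erb_string_end("%>")
handler.end_tag("</div>")
puts handler.output
p handler.erb_tags
```

Because the driver only requires an object that responds to these methods, the same loop can drive a handler that rewrites text, collects statistics, or builds a tree instead.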