Tool to parse by grammar to AST (or .y + .lang => xml)

Question

Given a lexer definition file, a grammar file (say, the postgresql .y and .l flex and bison programs from its source tree), and a file written in the language that lexer and parser define (say, an SQL query), I want to get the AST in some standard form (say, JSON or XML).

The most important aspect of this tool is flexibility of the input format. In my example, I could recreate the postgres SQL grammar in ANTLR, but I don't want to. I'd rather just use whatever postgres is using. So even though the .y file contains more than just the parsing rules, the tool I'm looking for should be able to understand it with minor modifications.

Is there a generic tool that does that?

Here's a command line session with my imaginary tool ly2xml:

$ git clone git://postgres-git-url pg
$ find pg -iname '*.[yl]' -exec cp '{}' ~/ \;
$ echo 'SELECT * FROM (SELECT 1)'|ly2xml -parser=*.y -lexer=*.l - -O-
<SELECT>
  <ARGS>*</ARGS>
  <FROM>
    <SELECT><ARGS>1</ARGS></SELECT>
  </FROM>
</SELECT>

(note that - means it reads from standard input, and -O- means it writes to standard output).
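Stripped of the file handling, the imaginary `ly2xml` does three things: lex the input, parse it against a grammar, and dump the tree as XML. A minimal sketch of that loop, assuming a toy SQL-ish grammar written directly as a Python dict (it is not read from a real .y file, and it is not the PostgreSQL grammar). The names `GRAMMAR`, `tokenize`, and `ly2xml` are hypothetical, and the engine uses PEG-style ordered choice with backtracking, not bison's LALR(1), so it cannot handle left-recursive rules:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical toy grammar standing in for a .y file: each nonterminal maps
# to an ordered list of alternatives, each a list of symbols. Quoted symbols
# are literal tokens; NUMBER and IDENT are token classes. The engine below is
# naive, so the grammar must not be left-recursive.
GRAMMAR = {
    "select": [["'SELECT'", "args", "from"], ["'SELECT'", "args"]],
    "args": [["'*'"], ["NUMBER"], ["IDENT"]],
    "from": [["'FROM'", "'('", "select", "')'"], ["'FROM'", "IDENT"]],
}
KEYWORDS = {"SELECT", "FROM"}
TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|\S)")


def tokenize(text):
    """Stand-in for the lexer: return (kind, text) pairs, kind in NUMBER/IDENT/LIT."""
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if tok.isdigit():
            tokens.append(("NUMBER", tok))
        elif (tok[0].isalpha() or tok[0] == "_") and tok.upper() not in KEYWORDS:
            tokens.append(("IDENT", tok))
        else:
            tokens.append(("LIT", tok.upper()))  # keywords and punctuation
    return tokens


def leaf(text):
    elem = ET.Element("tok")
    elem.text = text
    return elem


def parse_symbol(symbol, tokens, pos):
    """Match `symbol` at tokens[pos]; return (element, new_pos) or None."""
    if symbol.startswith("'"):                        # literal token
        if pos < len(tokens) and tokens[pos] == ("LIT", symbol.strip("'")):
            return leaf(tokens[pos][1]), pos + 1
        return None
    if symbol in ("NUMBER", "IDENT"):                 # token class
        if pos < len(tokens) and tokens[pos][0] == symbol:
            return leaf(tokens[pos][1]), pos + 1
        return None
    for alternative in GRAMMAR[symbol]:               # nonterminal
        node, cur = ET.Element(symbol), pos
        for sym in alternative:
            result = parse_symbol(sym, tokens, cur)
            if result is None:
                break                                 # try the next alternative
            child, cur = result
            node.append(child)
        else:
            return node, cur                          # first full match wins
    return None


def ly2xml(text, start="select"):
    """Parse `text` starting at nonterminal `start`; return the tree as XML."""
    tokens = tokenize(text)
    result = parse_symbol(start, tokens, 0)
    if result is None or result[1] != len(tokens):
        raise SyntaxError("input does not match the grammar")
    return ET.tostring(result[0], encoding="unicode")


print(ly2xml("SELECT * FROM (SELECT 1)"))
```

The result is a full parse tree (every token is kept as a `<tok>` leaf) rather than the trimmed tree in the session above; dropping keyword and punctuation leaves would be a post-processing step. The hard part the question asks about, reading an arbitrary bison/flex pair into something like `GRAMMAR`, is exactly what this sketch leaves out.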

Answer

Nice thought. You're assuming one or more of:

 a) that each tool that has a grammar uses a canonical parsing engine type (e.g., everybody uses bison)
 b) that there is some parsing tool that understands the zillion grammar specification schemes that exist
 c) that whatever the parser is, it will parse language fragments (perhaps well formed).

a) is clearly false. I've never seen b). Practically none of the parsing engines do c); they can only parse "full programs".
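On point c): in a hand-written recursive-descent parser every nonterminal is already a function, so parsing a fragment amounts to calling a different entry point and checking that all input was consumed. Table-driven generators such as bison build their tables around a single start symbol, so fragment parsing there usually requires changing the grammar. An illustrative sketch, assuming a toy expression grammar loosely modelled on the `formula`/`product`/`term` nonterminals visible in the DMS Algebra output further down (the real Algebra rules are not shown, and `Parser`/`parse` are hypothetical names):

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z]\w*|\S)")


class Parser:
    """Recursive-descent parser; each method parses one nonterminal.

    formula := product (('+' | '-') product)*
    product := term ('*' term)*
    term    := NUMBER | VARIABLE | '(' formula ')'
    """

    ENTRY_POINTS = ("formula", "product", "term")

    def __init__(self, text):
        self.toks = TOKEN_RE.findall(text)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def eat(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def formula(self):
        node = self.product()
        while self.peek() in ("+", "-"):       # left-associative
            op = self.eat()
            node = (op, node, self.product())
        return node

    def product(self):
        node = self.term()
        while self.peek() == "*":
            self.eat("*")
            node = ("*", node, self.term())
        return node

    def term(self):
        if self.peek() == "(":
            self.eat("(")
            node = self.formula()
            self.eat(")")
            return node
        tok = self.eat()
        if tok.isdigit():
            return int(tok)
        if tok[0].isalpha():
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text, start="formula"):
    """Parse `text` as the nonterminal `start`; any listed nonterminal may be the root."""
    if start not in Parser.ENTRY_POINTS:
        raise ValueError(f"unknown nonterminal {start!r}")
    p = Parser(text)
    tree = getattr(p, start)()
    if p.peek() is not None:
        raise SyntaxError(f"trailing input at {p.peek()!r}")
    return tree


print(parse("x*23", start="product"))
```

Here `parse("x*23", start="product")` accepts a fragment that is not a complete `formula`-rooted program in its own right, while `parse("x+1", start="term")` is rejected because a `term` cannot contain `+`.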

Your only hope IMHO is to use a parser generator that has a large number of well tested language definitions.

ANTLR is arguably one; it certainly has a long list of contributed language definitions. And they're all sort of findable in one place. Doesn't do language fragments, though, that I know of. Doubt if it has XML export for all parse trees.

Bison is arguably one; there are lots and lots of language processors built using Bison. But the definitions are scattered everywhere and it will be very hard to collect them. Also doesn't do language fragments. Pretty sure it doesn't have XML export.

Our DMS Software Reengineering Toolkit is arguably one. It has lots of language definitions, all collected in one place (our company). It produces ASTs for every parse and has built-in XML export. DMS can also parse any nonterminal of any language it knows.

DMS can simulate your example pretty well, given a DMS .lex, .atg ("attributed grammar") and a compatible source file.

What follows is a DMS lexer/parser build and run, with XML export, for the Algebra grammar from the "Algebra as DMS Domain" example (the ++XML option halfway down the session tells the parsing step to export XML):

C:\DMS\Domains\Algebra\Tools\Parser\Source>make
perl /cygdrive/c/DMS/Executables/MakeDMSTool Algebra -lexer
MakeDMSTool: Selected domain "Algebra".
LexerGenerator V2.1a
Copyright (c) 1999-2010 Semantic Designs, Inc.; All Rights Reserved
Parsing lexical specification ...
Processing mode Algebra ...
Exiting with final status 0
perl /cygdrive/c/DMS/Executables/MakeDMSTool Algebra -tool %Temporaries
MakeDMSTool: Selected domain "Algebra".
Using attribute grammar in "/cygdrive/c/DMS/Domains/Algebra/Tools/Parser/Source/Syntax/Algebra.atg"
AttributeEvaluatorGenerator V3.0
Copyright (c) 1999-2010 Semantic Designs, Inc.; All Rights Reserved
Parsing attribute grammar ...
Generating attribute evaluator(s) ...
Exiting with final status 0

rm -rf /cygdrive/c/DMS/Domains/Algebra/Tools/%Temporaries
perl /cygdrive/c/DMS/Executables/MakeDMSTool Algebra -prettyprinter
MakeDMSTool: Selected domain "Algebra".
PrettyPrinterGenerator V2.0
Copyright (c) 1999-2010 Semantic Designs, Inc.; All Rights Reserved

Parsing pretty printer specification ...
Generating pretty printer ...
Exiting with final status 0

AttributeEvaluatorGenerator V3.0
Copyright (c) 1999-2010 Semantic Designs, Inc.; All Rights Reserved
Parsing attribute grammar ...
Generating attribute evaluator(s) ...
......................

Exiting with final status 0
cd /cygdrive/c/DMS/Domains/Algebra/Tools/Parser/Source/\%Generated; \
    perl /cygdrive/c/DMS/Executables/MakeDMSTool Algebra -weave-preserve-productions %PreserveProductions.*.par
MakeDMSTool: Selected domain "Algebra".
perl /cygdrive/c/DMS/Executables/MakeDMSTool Algebra -parser
MakeDMSTool: Selected domain "Algebra".
export PARLANSEINCLUDEDIRECTORIES=`perl -e '($_ = $ARGV[0].";/cygdrive/c/DMS/Domains/PARLANSE/Library/Arrays;/cygdrive/c/DMS/Domains/PARLANSE/Library/Bags;/cygdrive/c/DMS/Domains/PARLANSE/Library/HashTables;/cygdrive/c/DMS/Domains/PARLANSE/Library/Pipes;/cygdrive/c/DMS/Domains/PARLANSE/Library/Sequences;/cygdrive/c/DMS/Domains/PARLANSE/Library/Sets;/cygdrive/c/DMS/Domains/PARLANSE/Library/Stacks;/cygdrive/c/DMS/Domains/PARLANSE/Library/Utilities;/cygdrive/c/DMS/Domains/PARLANSE/Library/Algorithms/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/Booleans/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/Characters/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/Graphics/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/HashTrees/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/Numbers/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/References/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/SQL/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/Streams/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/SuffixTrees/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/System/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/Search/Source;/cygdrive/c/DMS/Domains/PARLANSE/Library/TestSupport/Source") =~ s!//(.)/!$1:/!g; $_ =~ s!/cygdrive/(.)/!$1:/!g; print $_' "/cygdrive/c/DMS/Domains/Algebra/Tools/Parser/Source;/cygdrive/c/DMS/Domains/Algebra/Tools/Parser/Source/Components;/cygdrive/c/DMS/Domains/Algebra/Tools/Parser/Source/%Generated;/cygdrive/c/DMS/Domains/DMSStringGrammar/Tools/DomainParser/Source;/cygdrive/c/DMS/Domains/Algebra/Tools/Lexer/Source;/cygdrive/c/DMS/Domains/Algebra/Tools/Lexer/Source/%Generated;/cygdrive/c/DMS/Domains/DMSLexical/Tools/DomainLexer/Source;/cygdrive/c/DMS/Infrastructure/HyperGraph/Source;/cygdrive/c/DMS/Domains"`; \
    cd `echo /cygdrive/c/DMS/Domains/Algebra/Tools/Parser/Source`; \
    nice /cygdrive/c/DMS/Domains/PARLANSE/Tools/Compiler/p0c.exe  DomainParser.par
PARLANSE0 Compiler V19.16.40
Semantic Designs, Inc. *** Confidential Information
128/485/133408 smallest/average/largest activation record/grain stack space required.
Largest stack space required by function at Line    1533
 in file FFIModule.par
89 grains.
3775 functions/procedures.
223447 lines of source code read.
7160772 bytes of object code.
No errors detected.
mv -f /cygdrive/c/DMS/Domains/Algebra/Tools/Parser/Source/DomainParser.P0B /cygdrive/c/DMS/Domains/Algebra/Tools/Parser/DomainParser.P0B

C:\DMS\Domains\Algebra\Tools\Parser\Source>run ../DomainParser ++XML C:\DMS\Domains\Algebra\Tools\Lexer\TestCase\algebraformula.txt
Domain Parser for Algebra 2.3.3
Copyright (C) Semantic Designs 1996-2010; All Rights Reserved
31 tree nodes in tree.
<DMSForest>
 <tree node="formula" type="1" domain="1" id="10qx0" parents="0" line="1" column="1" file="1">
  <tree node="product" type="4" domain="1" id="10qwx" line="1" column="1" file="1">
   <tree node="term" type="10" domain="1" id="10qwy" line="1" column="1" file="1">
<tree node="'D'" type="19" domain="1" id="10qw5" literal="0" line="1" column="1" file="1"/>
<tree node="'['" type="20" domain="1" id="10qw6" literal="0" line="1" column="2" file="1"/>
<tree node="formula" type="1" domain="1" id="10qwt" line="1" column="4" file="1">
 <tree node="product" type="4" domain="1" id="10qws" line="1" column="4" file="1">
  <tree node="term" type="9" domain="1" id="10qwr" line="1" column="4" file="1">
   <tree node="'('" type="17" domain="1" id="10qw7" literal="0" line="1" column="4" file="1"/>
   <tree node="formula" type="3" domain="1" id="10qwp" line="1" column="5" file="1">
    <tree node="formula" type="2" domain="1" id="10qwk" line="1" column="5" file="1">
     <tree node="formula" type="1" domain="1" id="10qwf" line="1" column="5" file="1">
      <tree node="product" type="5" domain="1" id="10qwe" line="1" column="5" file="1">
       <tree node="product" type="4" domain="1" id="10qwa" line="1" column="5" file="1">
    <tree node="term" type="7" domain="1" id="10qw9" line="1" column="5" file="1">
     <tree node="VARIABLE" type="15" domain="1" id="10qw8" line="1" column="5" file="1">
      <literal>x</literal>
     </tree>
    </tree>
       </tree>
       <tree node="'*'" type="13" domain="1" id="10qwb" literal="0" line="1" column="7" file="1"/>
       <tree node="term" type="8" domain="1" id="10qwd" line="1" column="8" file="1">
    <tree node="NUMBER" type="16" domain="1" id="10qwc" literal="23" line="1" column="8" file="1"/>
       </tree>
      </tree>
     </tree>
     <tree node="'+'" type="11" domain="1" id="10qwg" literal="0" line="1" column="10" file="1"/>
     <tree node="product" type="4" domain="1" id="10qwj" line="1" column="12" file="1">
      <tree node="term" type="7" domain="1" id="10qwi" line="1" column="12" file="1">
       <tree node="VARIABLE" type="15" domain="1" id="10qwh" line="1" column="12" file="1">
    <literal>y</literal>
       </tree>
      </tree>
     </tree>
    </tree>
    <tree node="'-'" type="12" domain="1" id="10qwl" literal="0" line="1" column="13" file="1"/>
    <tree node="product" type="4" domain="1" id="10qwo" line="1" column="14" file="1">
     <tree node="term" type="7" domain="1" id="10qwn" line="1" column="14" file="1">
      <tree node="VARIABLE" type="15" domain="1" id="10qwm" line="1" column="14" file="1">
       <literal>z</literal>
      </tree>
     </tree>
    </tree>
   </tree>
   <tree node="')'" type="18" domain="1" id="10qwq" literal="0" line="1" column="15" file="1"/>
  </tree>
 </tree>
</tree>
<tree node="','" type="21" domain="1" id="10qwu" literal="0" line="1" column="16" file="1"/>
<tree node="VARIABLE" type="15" domain="1" id="10qwv" line="1" column="18" file="1">
 <literal>x</literal>
</tree>
<tree node="']'" type="22" domain="1" id="10qww" literal="0" line="1" column="19" file="1"/>
   </tree>
  </tree>
 </tree>
 <FileIndex>
  <File index="1">C:/DMS/Domains/Algebra/Tools/Lexer/TestCase/algebraformula.txt</File>
 </FileIndex>
 <DomainIndex>
  <Domain index="1">Algebra</Domain>
 </DomainIndex>
</DMSForest>
Exiting with final status 0

C:\DMS\Domains\Algebra\Tools\Parser\Source>
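Because the export is plain XML, any standard XML library can consume it downstream. A minimal sketch that recovers the token sequence with Python's `xml.etree.ElementTree`; the embedded excerpt is the `x*23` product subtree from the output above with some attributes dropped, and the rules in `token_text` (a `<literal>` child for VARIABLE, a quoted `node` name for punctuation, a `literal` attribute for NUMBER) are inferred from this one sample, not taken from DMS documentation. `token_text` and `leaves` are hypothetical helper names:

```python
import xml.etree.ElementTree as ET

# Excerpt of the DMS output above (the "x*23" product subtree),
# with the domain/id/file attributes dropped for brevity.
DMS_XML = """
<tree node="product" type="5" line="1" column="5">
  <tree node="product" type="4" line="1" column="5">
    <tree node="term" type="7" line="1" column="5">
      <tree node="VARIABLE" type="15" line="1" column="5">
        <literal>x</literal>
      </tree>
    </tree>
  </tree>
  <tree node="'*'" type="13" literal="0" line="1" column="7"/>
  <tree node="term" type="8" line="1" column="8">
    <tree node="NUMBER" type="16" literal="23" line="1" column="8"/>
  </tree>
</tree>
"""


def token_text(elem):
    """Guess a leaf's source text from the conventions seen in the sample."""
    lit = elem.find("literal")
    if lit is not None:                     # e.g. VARIABLE: <literal>x</literal>
        return lit.text
    name = elem.get("node")
    if name.startswith("'") and name.endswith("'"):
        return name[1:-1]                   # keyword/punctuation: node="'*'"
    return elem.get("literal")              # e.g. NUMBER: literal="23"


def leaves(elem):
    """Yield (text, line, column) for each leaf <tree>, left to right."""
    children = elem.findall("tree")
    if not children:
        yield token_text(elem), int(elem.get("line")), int(elem.get("column"))
    for child in children:
        yield from leaves(child)


root = ET.fromstring(DMS_XML.strip())
for text, line, col in leaves(root):
    print(f"{line}:{col} {text}")
```

Run against the full `<DMSForest>` document, the same walk would start from the top-level `tree` element; the `FileIndex` and `DomainIndex` sections map the numeric `file` and `domain` attributes back to names.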

If you really wanted an engine that understood many grammar notations, it might be easiest to build such an engine with DMS. Simply define each of the grammar formalisms (e.g., ANTLR or bison) as a DSL to DMS, parse a specific grammar formalism instance (e.g., an ANTLR BNF instance) using DMS, apply DMS rewrite rules to transform that into a DMS grammar, and then build a DMS parser. (You'd have to do the same with the lexer, too.)
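The heart of that plan is a translation step from one grammar notation into another engine's representation. DMS would do this with its own parser and rewrite rules; as a much-simplified stand-in in plain Python, assuming a bison rules section with no semantic actions, `%prec`, or comments, and assuming every rule ends with `;`, the translation into a dictionary mapping each nonterminal to its list of alternatives might look like this (`bison_to_grammar` is a hypothetical name):

```python
import re

# Simplified bison rules section: no semantic actions, %prec, or comments.
BISON_RULES = """
expr : expr '+' term
     | term
     ;
term : NUMBER
     | '(' expr ')'
     ;
"""

# Quoted character literals, identifiers, and the rule punctuation : | ;
TOKEN_RE = re.compile(r"'[^']*'|[A-Za-z_][\w.]*|[:|;]")


def bison_to_grammar(text):
    """Translate simplified bison rules into {lhs: [[symbol, ...], ...]}."""
    grammar = {}
    tokens = iter(TOKEN_RE.findall(text))
    for lhs in tokens:                          # each rule starts with its name
        if next(tokens, None) != ":":
            raise SyntaxError(f"expected ':' after {lhs!r}")
        alternatives, current = [], []
        for tok in tokens:
            if tok in ("|", ";"):
                alternatives.append(current)    # [] would be an empty alternative
                current = []
                if tok == ";":
                    break
            else:
                current.append(tok)
        else:                                   # input ended without a final ';'
            alternatives.append(current)
        grammar.setdefault(lhs, []).extend(alternatives)
    return grammar


print(bison_to_grammar(BISON_RULES))
```

For a real grammar such as PostgreSQL's, most of the effort would likely go into what this sketch ignores: semantic actions, precedence declarations, and `%token`/`%type` declarations. The target engine also has to cope with left recursion like `expr : expr '+' term`, which LALR generators such as bison handle natively.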
