Can nested parentheticals be parsed in chemical formulae?

Question

I am trying to create a parser for simple chemical formulae. Meaning, they have no states of matter, charge, or anything like that. The formulae only have strings representing compounds, quantities, and parentheses.

Following this answer to a similar question, and some rudimentary knowledge of discrete math, I hoped that I could write a simple Recursive Descent Parser to generate the number of each atom inside of the formula. I already have a really simple answer for this that involves single parentheses, but not nested parentheses.

Here are the productions of the grammar without parentheses:

Compound:  Component { Component }
Component: Atom [Quantity]
Atom: 'H' | 'He' | 'Li' | 'Be' ...
Quantity: Digit { Digit }
Digit: '0' | '1' | ... '9'

  • [...] is read as optional, and will be an if test in the program (either it is there or missing)
  • | separates alternatives, and so becomes an if .. else if .. else or switch 'test'; it says the input must match one of these
  • { ... } is read as repetition of 0 or more, and will be a while loop in the program
  • Characters between quotes are literal characters which will be in the string. All the other words are names of rules, and for a recursive descent parser, end up being the names of the functions which get called to chop up and handle the input.
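The mapping in the bullets above can be sketched in Python for the parenthesis-free grammar. This is a minimal illustration, not the asker's code: `FlatFormulaParser` and `ParseError` are hypothetical names, and it assumes an Atom is any uppercase letter followed by lowercase letters instead of a fixed list of element symbols. Each rule becomes one method; [Quantity] becomes an if, and { Component } becomes a while loop.

```python
from collections import Counter


class ParseError(Exception):
    """Raised when the input does not match the grammar."""


class FlatFormulaParser:
    """Recursive descent parser for the grammar WITHOUT parentheses.

    Assumption: an Atom is an uppercase letter followed by zero or more
    lowercase letters, rather than an explicit 'H' | 'He' | ... list.
    """

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def peek(self) -> str:
        # Next character, or '' at the end of the input.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def parse(self) -> Counter:
        counts = self.compound()
        if self.pos != len(self.text):
            raise ParseError(f"unexpected {self.peek()!r} at {self.pos}")
        return counts

    def compound(self) -> Counter:
        # Compound: Component { Component }  -> one call, then a while loop
        counts = self.component()
        while self.peek().isupper():
            counts.update(self.component())
        return counts

    def component(self) -> Counter:
        # Component: Atom [Quantity]  -> the optional part is an if test
        symbol = self.atom()
        n = self.quantity() if self.peek().isdecimal() else 1
        return Counter({symbol: n})

    def atom(self) -> str:
        # Atom: uppercase letter followed by lowercase letters (assumed form)
        if not self.peek().isupper():
            raise ParseError(f"expected an atom at {self.pos}")
        start = self.pos
        self.pos += 1
        while self.peek().islower():
            self.pos += 1
        return self.text[start:self.pos]

    def quantity(self) -> int:
        # Quantity: Digit { Digit }
        start = self.pos
        while self.peek().isdecimal():
            self.pos += 1
        return int(self.text[start:self.pos])
```

For example, `FlatFormulaParser("C6H12O6").parse()` returns a `Counter` with C 6, H 12 and O 6, while "H2(O)" raises `ParseError`, because nothing in this grammar accepts '('.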
With nested parentheses, I have no idea what to do. By nested parentheses I mean something like (Fe2(OH)2(H2O)8)2, or something fictitious and complicated like (Ab(CD2(Ef(G2H)3)(IJ2)4)3)2.

Now there is a production that I don't really understand how to articulate, but here is my best attempt:

Parenthetical:  Compound { Parenthetical } [Quantity]
      

Recommended answer

So the basic rules parse any simple sequence of chemical symbols and quantities without parentheses.

I assume the Quantity is defining the quantity of the whole chunk of stuff between '(' ... ')'.

So '(' ... ')' [Quantity] needs to be parsed as exactly the same kind of thing as a Component, i.e. as an alternative to: Atom [Quantity]

So the only thing to change is the Component rule; it becomes:

Component: Atom [Quantity] | '(' Compound ')' [Quantity]

In the function (or procedure) that parses Component, it will look at the next character (token). If it is a '(', it will consume it, then call the function (or procedure) responsible for parsing Compound. After that returns, it checks that the next character (token) is a ')' (if not, it's a syntax error) and consumes it, then handles the optional Quantity, and then it is finished. If the next character is not a '(', it parses Atom [Quantity] as before.
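Here is a minimal Python sketch of that procedure. It assumes an Atom is an uppercase letter followed by optional lowercase letters (so made-up symbols such as Ab or Ef are accepted) and that the goal is a count of each atom; `FormulaParser`, `ParseError` and the method names are illustrative, not from the question. Nesting is handled purely by `component` calling `compound`, which may call `component` again.

```python
from collections import Counter


class ParseError(Exception):
    """Raised when the input does not match the grammar."""


class FormulaParser:
    """Recursive descent parser for:

        Compound:  Component { Component }
        Component: Atom [Quantity] | '(' Compound ')' [Quantity]

    Assumption: an Atom is an uppercase letter followed by lowercase letters.
    """

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def peek(self) -> str:
        # Next character, or '' at the end of the input.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, ch: str) -> None:
        # Consume `ch`, or report a syntax error.
        if self.peek() != ch:
            raise ParseError(f"expected {ch!r} at {self.pos}, got {self.peek()!r}")
        self.pos += 1

    def parse(self) -> Counter:
        counts = self.compound()
        if self.pos != len(self.text):
            raise ParseError(f"unexpected {self.peek()!r} at {self.pos}")
        return counts

    def compound(self) -> Counter:
        counts = self.component()
        # A Component can now start with an atom or with '('.
        while self.peek().isupper() or self.peek() == "(":
            counts.update(self.component())
        return counts

    def component(self) -> Counter:
        if self.peek() == "(":
            self.expect("(")
            inner = self.compound()  # recursion handles any depth of nesting
            self.expect(")")
        else:
            inner = Counter({self.atom(): 1})
        # The optional Quantity multiplies the single atom or the whole group.
        n = self.quantity() if self.peek().isdecimal() else 1
        return Counter({sym: k * n for sym, k in inner.items()})

    def atom(self) -> str:
        if not self.peek().isupper():
            raise ParseError(f"expected an atom at {self.pos}")
        start = self.pos
        self.pos += 1
        while self.peek().islower():
            self.pos += 1
        return self.text[start:self.pos]

    def quantity(self) -> int:
        start = self.pos
        while self.peek().isdecimal():
            self.pos += 1
        return int(self.text[start:self.pos])
```

With this sketch, `FormulaParser("(Fe2(OH)2(H2O)8)2").parse()` counts Fe 4, O 20, H 36, and the fictitious (Ab(CD2(Ef(G2H)3)(IJ2)4)3)2 counts Ab 2, C 6, D 12, Ef 6, G 36, H 18, I 24, J 48. Unbalanced input such as "(H2O" or "H2O)" raises `ParseError`.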

I am assuming you are using a programming language which supports recursive function (or procedure) calls. The housekeeping the language does behind the scenes for each call (keeping every caller's local variables and return point on the call stack) will make this 'just work' (TM).

Alternatively, you could solve the problem in a different way. Add a new rule, which says:

Stuff: Atom | '(' Compound ')'

Then modify the Component rule:

Component: Stuff [Quantity]

Then write a new function (or procedure) for Stuff, and change the Component code to simply call Stuff, then handle the optional Quantity.
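The Stuff variant can be sketched in Python as follows. As a design alternative to a parser class, this version (hypothetical name `parse_formula`) uses nested functions that share one position variable; it again assumes an Atom is an uppercase letter followed by lowercase letters. The Quantity multiplier is applied in exactly one place, Component, whether Stuff turned out to be an atom or a parenthesized group.

```python
from collections import Counter


class ParseError(Exception):
    """Raised when the input does not match the grammar."""


def parse_formula(text: str) -> Counter:
    """Count atoms using the refactored grammar:

        Compound:  Component { Component }
        Component: Stuff [Quantity]
        Stuff:     Atom | '(' Compound ')'
    """
    pos = 0

    def peek() -> str:
        return text[pos] if pos < len(text) else ""

    def compound() -> Counter:
        counts = component()
        while peek().isupper() or peek() == "(":
            counts.update(component())
        return counts

    def component() -> Counter:
        # Component: Stuff [Quantity]; the only place a multiplier is applied.
        inner = stuff()
        n = quantity() if peek().isdecimal() else 1
        return Counter({sym: k * n for sym, k in inner.items()})

    def stuff() -> Counter:
        # Stuff: Atom | '(' Compound ')'
        nonlocal pos
        if peek() == "(":
            pos += 1
            inner = compound()
            if peek() != ")":
                raise ParseError(f"expected ')' at {pos}")
            pos += 1
            return inner
        return Counter({atom(): 1})

    def atom() -> str:
        # Assumed Atom form: uppercase letter followed by lowercase letters.
        nonlocal pos
        if not peek().isupper():
            raise ParseError(f"expected an atom at {pos}")
        start = pos
        pos += 1
        while peek().islower():
            pos += 1
        return text[start:pos]

    def quantity() -> int:
        nonlocal pos
        start = pos
        while peek().isdecimal():
            pos += 1
        return int(text[start:pos])

    result = compound()
    if pos != len(text):
        raise ParseError(f"unexpected {peek()!r} at {pos}")
    return result
```

For example, `parse_formula("Ca(OH)2")` counts Ca 1, O 2, H 2. The behaviour is the same as putting both alternatives directly in Component; only the split of responsibilities between the functions changes.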

There are good technical reasons for doing this with some parsing technologies. However, you're using recursive descent, where it won't really matter.

The type of grammar which works very well for a recursive descent parser is called LL(1), which means the input is parsed from Left to right, producing a Leftmost derivation, with 1 token of lookahead. That is a 'natural' way to parse when the code and function calls are the control flow. To find the theory of how to check whether a grammar is LL(1), search the web for "parsing LL(1)" or "grammar follow sets".
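To make the LL(1) check concrete, here is a small sketch that computes FIRST and FOLLOW sets by fixed-point iteration and reports conflicting alternatives. It assumes a token-level grammar in which a lexer already produces hypothetical ATOM and NUMBER tokens, and it rewrites { Component } and [Quantity] as helper rules named CompoundRest and OptQuantity (names chosen here for illustration). A grammar is LL(1) when, for every rule, the predict sets of its alternatives (FIRST, plus FOLLOW of the rule if the alternative can be empty) are pairwise disjoint.

```python
EPS = "ε"  # marks "can derive the empty string"
END = "$"  # end-of-input marker

# Token-level version of the answer's grammar. ATOM and NUMBER are assumed
# lexer tokens; CompoundRest replaces { Component } and OptQuantity
# replaces [Quantity]. An empty list is an empty (epsilon) alternative.
GRAMMAR = {
    "Compound": [["Component", "CompoundRest"]],
    "CompoundRest": [["Component", "CompoundRest"], []],
    "Component": [["ATOM", "OptQuantity"], ["(", "Compound", ")", "OptQuantity"]],
    "OptQuantity": [["NUMBER"], []],
}
START = "Compound"


def first_of_seq(seq, first):
    """FIRST set of a symbol sequence; contains EPS if the whole sequence can vanish."""
    out = set()
    for sym in seq:
        sym_first = first.get(sym, {sym})  # a terminal's FIRST set is itself
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out


def compute_first_follow(grammar, start):
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:  # repeat until no set grows
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                f = first_of_seq(alt, first)
                if not f <= first[nt]:
                    first[nt] |= f
                    changed = True
                # What may follow each nonterminal that appears in this alternative.
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue
                    rest = first_of_seq(alt[i + 1:], first)
                    new = (rest - {EPS}) | (follow[nt] if EPS in rest else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return first, follow


def ll1_conflicts(grammar, start):
    """Return (rule, alt_i, alt_j, overlapping tokens) for every LL(1) conflict."""
    first, follow = compute_first_follow(grammar, start)
    conflicts = []
    for nt, alts in grammar.items():
        predicts = []
        for alt in alts:
            f = first_of_seq(alt, first)
            predicts.append((f - {EPS}) | (follow[nt] if EPS in f else set()))
        for i in range(len(predicts)):
            for j in range(i + 1, len(predicts)):
                overlap = predicts[i] & predicts[j]
                if overlap:
                    conflicts.append((nt, i, j, overlap))
    return conflicts
```

For the answer's grammar, `ll1_conflicts(GRAMMAR, START)` returns an empty list, so one token of lookahead always decides which function to call. By contrast, a character-level rule such as `{"Atom": [["H"], ["H", "e"]]}` conflicts on "H", which is one reason to let a lexer (or the atom function) read an uppercase letter followed by any lowercase letters instead of matching 'H' | 'He' literally.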
