RegEx for parsing chemical formulas

Question

I need a way to separate a chemical formula into its components. The result should look like this:

   Ag3PO4 -> [Ag3, P, O4]
      H2O -> [H2, O]
   CH3OOH -> [C, H3, O, O, H]
Ca3(PO4)2 -> [Ca3, (PO4)2]

I don't know regex syntax, but I know I need something like this:

[An optional parenthesis][A capital letter][0 or more lowercase letters][0 or more digits][An optional parenthesis][0 or more digits]

This works:

NSRegularExpression *regex = [NSRegularExpression
                              regularExpressionWithPattern:@"[A-Z][a-z]*\\d*|\\([^)]+\\)\\d*"
                              options:0
                              error:nil];
NSArray *tests = [[NSArray alloc ] initWithObjects:@"Ca3(PO4)2", @"HCl", @"CaCO3", @"ZnCl2", @"C7H6O2", @"BaSO4", nil];
for (NSString *testString in tests)
{
    NSLog(@"Testing: %@", testString);
    NSArray *myArray = [regex matchesInString:testString options:0 range:NSMakeRange(0, [testString length])] ;
    NSMutableArray *matches = [NSMutableArray arrayWithCapacity:[myArray count]];

    for (NSTextCheckingResult *match in myArray) {
        NSRange matchRange = [match rangeAtIndex:0];
        [matches addObject:[testString substringWithRange:matchRange]];
        NSLog(@"%@", [matches lastObject]);
    }
}

Recommended answer

(PO4)2 really stands apart from the other components.

Let's start simple and match items without parentheses:

[A-Z][a-z]?\d*

Using the regex above we can successfully parse Ag3PO4, H2O, and CH3OOH.
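
As a quick check of the simple pattern, here is a minimal sketch using Python's `re` module. Python is only a stand-in for trying the pattern out; the name `SIMPLE` is made up for this illustration.

```python
import re

# One capital letter, an optional lowercase letter, then an optional count.
SIMPLE = re.compile(r"[A-Z][a-z]?\d*")

for formula in ("Ag3PO4", "H2O", "CH3OOH"):
    # findall returns every non-overlapping match, left to right.
    print(formula, "->", SIMPLE.findall(formula))
# Ag3PO4 -> ['Ag3', 'P', 'O4']
# H2O -> ['H2', 'O']
# CH3OOH -> ['C', 'H3', 'O', 'O', 'H']
```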

Then we need to add an expression for a parenthesized group. A group by itself can be matched using:

\(.*?\)\d+

So we combine the two with an "or" (alternation):

[A-Z][a-z]?\d*|\(.*?\)\d+
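
The combined pattern can be tried on the question's examples with a short Python sketch. The helper name `split_formula` is hypothetical, and Python's `re` is assumed here only as a convenient test bed.

```python
import re

# Either a single element with an optional count,
# or a parenthesized group followed by a required count.
COMBINED = re.compile(r"[A-Z][a-z]?\d*|\(.*?\)\d+")

def split_formula(formula):
    """Return the top-level components of a formula as strings."""
    return COMBINED.findall(formula)

print(split_formula("Ca3(PO4)2"))  # ['Ca3', '(PO4)2']
print(split_formula("Ag3PO4"))     # ['Ag3', 'P', 'O4']
```

Note that the group alternative ends in `\d+`, so a group with no trailing count would not be kept as a single component.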

This works for the given cases, but you may have more samples to check.

Note: It will have problems with nested parentheses, e.g. Co3(Fe(CN)6)2.
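
To make the nesting problem concrete, this small Python sketch runs the non-nested pattern on that formula. The lazy `.*?` stops at the first closing parenthesis, so the outer group is cut short and its trailing `)2` is lost. The name `FLAT` is an assumption for illustration.

```python
import re

# The pattern that handles only one level of parentheses.
FLAT = re.compile(r"[A-Z][a-z]?\d*|\(.*?\)\d+")

# The lazy .*? ends at the first ")", which belongs to the inner (CN) group.
print(FLAT.findall("Co3(Fe(CN)6)2"))  # ['Co3', '(Fe(CN)6']
```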

If you want to handle that case, you can use the following regex (it relies on a variable-length lookbehind, which not every regex engine supports):

[A-Z][a-z]?\d*|(?<!\([^)]*)\(.*\)\d+(?![^(]*\))
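
Support for the lookbehind in this pattern depends on the engine. As one illustration, Python's built-in `re` module only accepts fixed-width lookbehinds and refuses to compile it. The helper `compiles` below is a made-up name for this sketch.

```python
import re

# The lookaround-based pattern, with a variable-length lookbehind (?<!\([^)]*).
LOOKBEHIND = r"[A-Z][a-z]?\d*|(?<!\([^)]*)\(.*\)\d+(?![^(]*\))"

def compiles(pattern):
    """Report whether Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(LOOKBEHIND))         # False: look-behind must be fixed-width in re
print(compiles(r"[A-Z][a-z]?\d*"))  # True
```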

For Objective-C you can use an expression without lookarounds:

[A-Z][a-z]?\d*|\([^()]*(?:\(.*\))?[^()]*\)\d+
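
The lookaround-free pattern can be sanity-checked outside Objective-C as well. The sketch below uses Python's `re` as an assumed stand-in, with `NESTED` as a hypothetical name. In an `NSString` literal the backslashes would need doubling, as in the question's code.

```python
import re

# Group alternative: "(", non-paren text, an optional inner "(...)",
# more non-paren text, ")", then a required count.
NESTED = re.compile(r"[A-Z][a-z]?\d*|\([^()]*(?:\(.*\))?[^()]*\)\d+")

print(NESTED.findall("Co3(Fe(CN)6)2"))  # ['Co3', '(Fe(CN)6)2']
print(NESTED.findall("Ca3(PO4)2"))      # ['Ca3', '(PO4)2']
```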

Or a regex with repetitions (I don't know of any such formulas, but in case there is anything like A(B(CD)3E(FG)4)5, with multiple parenthesized blocks inside one):

[A-Z][a-z]?\d*|\((?:[^()]*(?:\(.*\))?[^()]*)+\)\d+
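
A minimal Python check of the repetition variant on the hypothetical formula from the text, plus the earlier nested example. `REPEATED` is a made-up name, and Python's `re` is again only an assumed test bed.

```python
import re

# Same idea as before, but the inner "text, optional (...), text" unit may repeat.
REPEATED = re.compile(r"[A-Z][a-z]?\d*|\((?:[^()]*(?:\(.*\))?[^()]*)+\)\d+")

print(REPEATED.findall("A(B(CD)3E(FG)4)5"))  # ['A', '(B(CD)3E(FG)4)5']
print(REPEATED.findall("Co3(Fe(CN)6)2"))     # ['Co3', '(Fe(CN)6)2']
```

Each result keeps the outermost group as one component. Breaking that group down further would mean applying the pattern again to the text inside the parentheses.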
