Extract numbers from chemical formula

Question

Apologies if this has already been asked and answered but I couldn't find a satisfactory answer.

I have a list of chemical formulas including, in this order: C, H, N and O. And I would like to pull the number after each of these letters. The problem is that not all the formulas contain an N. All contain a C, H and O however. And the number can be either single, double or (in the case of H only) triple digit.

So the data looks like this:

  • C20H37N1O5
  • C10H12O3
  • C20H19N3O4
  • C23H40O3
  • C9H13N1O3
  • C14H26O4
  • C58H100N2O9

I'd like each element number for the list in separate columns. So in the first example it would be:

20 37 1 5

I have been trying:

=IFERROR(MID(LEFT(A2,FIND("H",A2)-1),FIND("C",A2)+1,LEN(A2)),"") 

to separate out the C#. However, after this I get stuck as the H# is flanked by either an O or N.

Is there an Excel formula or VBA that can do this?

Recommended answer

Use Regular Expressions

This is a good task for regular expressions (regex). Because VBA doesn't support regular expressions out of the box we need to reference a Windows library first.

  • Add a reference to the regex library: under Tools, then References, select Microsoft VBScript Regular Expressions 5.5

  • Add this function to a module

Option Explicit 

Public Function ChemRegex(ChemFormula As String, Element As String) As Long
    Dim strPattern As String
    strPattern = "([CNHO])([0-9]*)" 
                 'this pattern is limited to the elements C, N, H and O only.
    Dim regEx As New RegExp

    Dim Matches As MatchCollection, m As Match

    If strPattern <> "" Then
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With

        Set Matches = regEx.Execute(ChemFormula)
        For Each m In Matches
            If m.SubMatches(0) = Element Then
                ChemRegex = IIf(Not m.SubMatches(1) = vbNullString, m.SubMatches(1), 1) 
                            'this IIf ensures that in CH4O the C and O are counted as 1
                Exit For
            End If
        Next m
    End If
End Function
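
If adding the library reference is not possible (for example in a shared workbook), the same logic can probably be written with late binding. The following is only a sketch under that assumption: the name `ChemRegexLate` is made up for illustration and is not part of the original answer, and `CreateObject("VBScript.RegExp")` requires a Windows installation where the VBScript regex engine is available.

```vba
' Hypothetical late-bound variant of ChemRegex (the name is an assumption).
' It needs no entry under Tools > References.
Public Function ChemRegexLate(ChemFormula As String, Element As String) As Long
    Dim regEx As Object
    Dim Matches As Object
    Dim m As Object

    ' Create the regex engine at run time instead of via a library reference.
    Set regEx = CreateObject("VBScript.RegExp")
    With regEx
        .Global = True
        .IgnoreCase = False
        .Pattern = "([CNHO])([0-9]*)"   ' same pattern as above: C, N, H and O only
    End With

    Set Matches = regEx.Execute(ChemFormula)
    For Each m In Matches
        If m.SubMatches(0) = Element Then
            If m.SubMatches(1) = vbNullString Then
                ChemRegexLate = 1                      ' no digits means a count of 1
            Else
                ChemRegexLate = CLng(m.SubMatches(1))  ' convert the digits to a number
            End If
            Exit For                                   ' only the first occurrence is used
        End If
    Next m
    ' If the element is not found, the function returns 0 (the default for Long).
End Function
```

Late binding trades IntelliSense and compile-time type checking for portability, so the early-bound version above remains the more convenient one while developing.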

  • Use the function like this in a cell formula

    E.g. with the formulas in column A and the element symbols C, H, N and O as headers in B1:E1, enter =ChemRegex($A2,B$1) in cell B2 and copy it to the other cells
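
Before filling a whole sheet, the function can be checked from the VBA editor's Immediate window. The routine below is a sketch, not part of the original answer: the name `TestChemRegex` is hypothetical, and it assumes the `ChemRegex` function above is in the same project with the regex reference set.

```vba
' Hypothetical test routine (the name is an assumption).
' Prints the C, H, N and O counts for a few formulas from the question.
Public Sub TestChemRegex()
    Dim formulas As Variant, elements As Variant
    Dim f As Variant, e As Variant
    Dim result As String

    formulas = Array("C20H37N1O5", "C10H12O3", "C58H100N2O9")
    elements = Array("C", "H", "N", "O")

    For Each f In formulas
        result = f & ":"
        For Each e In elements
            ' CStr converts the Variant loop variables to the String arguments ChemRegex expects.
            result = result & " " & ChemRegex(CStr(f), CStr(e))
        Next e
        Debug.Print result
    Next f
    ' Expected output in the Immediate window:
    '   C20H37N1O5: 20 37 1 5
    '   C10H12O3: 10 12 0 3
    '   C58H100N2O9: 58 100 2 9
End Sub
```

Note that a formula without N yields 0 for that element, because the function's return value is never assigned when no match is found.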


    Also recognize chemical formulas with multiple occurrences of elements, like CH3OH or CH2COOH

    Note that the code above cannot count something like CH3OH, where an element occurs more than once. In that case only the first occurrence (H3) is counted and the later H is omitted.

    If you also need to recognize formulas like CH3OH or CH2COOH (and sum the occurrences of each element), change the loop body so that it accumulates the counts:

    If m.SubMatches(0) = Element Then
        ChemRegex = ChemRegex + IIf(Not m.SubMatches(1) = vbNullString, m.SubMatches(1), 1)
        'Exit For needs to be removed.
    End If
    

    To also recognize two-letter element symbols such as Ca or Cl, use this pattern in addition to the change above for multiple occurrences of elements:

    strPattern = "([A-Z][a-z]?)([0-9]*)"   'https://regex101.com/r/nNv8W6/2
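
Putting the two changes together (summing instead of exiting at the first match, plus the two-letter pattern), the whole function might look like the sketch below. The name `ChemRegexSum` is hypothetical and chosen only to avoid clashing with `ChemRegex`. The code assumes the Microsoft VBScript Regular Expressions 5.5 reference is set.

```vba
' Hypothetical combined version (the name is an assumption).
' Sums every occurrence of an element and accepts one- or two-letter symbols.
Public Function ChemRegexSum(ChemFormula As String, Element As String) As Long
    Dim regEx As New RegExp
    Dim m As Match

    With regEx
        .Global = True
        .IgnoreCase = False                    ' case matters: Co is cobalt, CO is C plus O
        .Pattern = "([A-Z][a-z]?)([0-9]*)"
    End With

    For Each m In regEx.Execute(ChemFormula)
        If m.SubMatches(0) = Element Then
            If m.SubMatches(1) = vbNullString Then
                ChemRegexSum = ChemRegexSum + 1                      ' no digits means 1
            Else
                ChemRegexSum = ChemRegexSum + CLng(m.SubMatches(1))  ' add the stated count
            End If
            ' No Exit For here, so later occurrences are added as well.
        End If
    Next m
End Function
```

With this sketch, `=ChemRegexSum("CH3OH","H")` should give 4 (H3 plus H) and `=ChemRegexSum("CaCl2","Cl")` should give 2. Asking for "C" in CaCl2 gives 0, because the pattern consumes "Ca" as a single symbol.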
    

    1. Note that the element symbols need to be in the correct upper/lower case. CaCl2 works, but cacl2 and CACL2 do not.
    2. Note that this does not check whether these letter combinations are actual elements of the periodic table. So it will also recognize e.g. Xx2Zz5Q as the fictitious elements Xx = 2, Zz = 5 and Q = 1.

    To accept only combinations that exist in the periodic table use the following pattern:

    strPattern = "([A][cglmrstu]|[B][aehikr]?|[C][adeflmnorsu]?|[D][bsy]|[E][rsu]|[F][elmr]?|[G][ade]|[H][efgos]?|[I][nr]?|[K][r]?|[L][airuv]|[M][cdgnot]|[N][abdehiop]?|[O][gs]?|[P][abdmortu]?|[R][abefghnu]|[S][bcegimnr]?|[T][abcehilms]|[U]|[V]|[W]|[X][e]|[Y][b]?|[Z][nr])([0-9]*)"
    'https://regex101.com/r/Hlzta2/3
    'This pattern includes all 118 elements known at the time of writing.
    'If scientists discover or synthesize new elements, they need to be added to the pattern.
    
