模式匹配在excel [英] Pattern matching in excel

查看:189
本文介绍了模式匹配在excel的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个excel表与两列。第一列是关键短语,第二列是消息。关键短语可能发生在消息列中。我需要知道消息列中发生了一个关键短语的次数。



关键词是一列,消息是第二列。消息列是1个或多于1个关键短语的组合(连接)。我需要知道每个消息包含多少关键短语。还有一些消息与他们有一些日期和数字。还有一些消息中有日期和数字,目前与(xx-xxx-xxxx)作为该日期/数字的匹配关键短语。



例如消息是该交易于2014年10月8日结束,因此不允许进一步交易,关键短语是关闭(xx-xxx-xxxx) 。还有消息是交易号码4238428DDSSD有问题,关键字短语是交易号码xxxxxxxx hass问题。需要正则表达式匹配。

解决方案

您可以选择一些关键字短语,为其创建正则表达式模式,然后对短语进行编码,使的regex关键字



随意修改附加附加关键字,短语和RegEx模式。


I have an excel sheet with two columns. The first column is the key phrase and the second is the messages. The key phrase may occur in the messages column. I need to know how many times a key phrase has occurred in messages column.

The key phrase is one column and the messages is the second column. The messages column is combination (concatenation) of 1 or more than 1 key phrases. I need to find out that how many key phrases does each message contain. Also some of the messages have some dates and numbers with them. Also some of the messages have dates and numbers in them, the matching key phrase as that date/numbe as (xx-xxx-xxxx) presently.

e.g. the message is "The deal closed on 08-Oct-2014 so no further transaction allowed" and the key phrase is "The deal closed on (xx-xxx-xxxx)". Also there are messages as "Deal number 4238428DDSSD has problems" and the keyphrase is "Deal number xxxxxxxx hass problems". The regex matching is required.

解决方案

You can pick a few keyword phrases, create the regex pattern for them, then encode the phrases such that a Range.Replace method can be used on them to substitute an appropriate RegEx pattern mask into the keyword phrase.

In the following, I've used X00000000X, XSHORTDATEX and XDEALNMBRX as placeholders within the keywords. These will be replaced with [0-9,-]{7,8}, [0-9,-]{3}[a-z]{3}[0-9,-]{3,5} and [0-9]{7}[a-z]{5} respectively.

X00000000X is designed to handle anything that looks like 1234567 or * 99-11-00*. XSHORTDATEX will handle dates in the dd-mmm-yy or dd-mmm-yyyy format (once converted to lower case) and XDEALNMBRX will locate alphanumeric patterns similar to 4238428DDSSD.

This code requires that the Microsoft VBScript Regular Expression library be added to the VBA project with the VBE's Tools ► References command.

Sub count_strings_inside_strings_rgx()
    Dim rw As Long, lr As Long
    Dim k As Long, p As Long, vKEYs As Variant, vPHRASEs As Variant, vCOUNTs As Variant
    Dim sPATTERN As String, vbaRGX As New RegExp, cMATCHES As MatchCollection

    ReDim vKEYs(0)
    ReDim vPHRASEs(0)

    With Worksheets("Sheet1")   '<~~ set to the correct worksheet name\
        'populate the vKEYs array
        For rw = 2 To .Cells(Rows.Count, 1).End(xlUp).Row
            vKEYs(UBound(vKEYs)) = LCase(.Cells(rw, 1).Value2)
            ReDim Preserve vKEYs(UBound(vKEYs) + 1)
        Next rw
        ReDim Preserve vKEYs(UBound(vKEYs) - 1)

        'populate the vPHRASEs array
        For rw = 2 To .Cells(Rows.Count, 2).End(xlUp).Row
            vPHRASEs(UBound(vPHRASEs)) = LCase(.Cells(rw, 2).Value2)
            ReDim Preserve vPHRASEs(UBound(vPHRASEs) + 1)
        Next rw
        ReDim Preserve vPHRASEs(UBound(vPHRASEs) - 1)
        ReDim vCOUNTs(0 To UBound(vPHRASEs))

        For p = LBound(vPHRASEs) To UBound(vPHRASEs)
            For k = LBound(vKEYs) To UBound(vKEYs)
                sPATTERN = Replace(vKEYs(k), "x00000000x", "[0-9,\-]{7,8}")
                sPATTERN = Replace(sPATTERN, "xshortdatex", "[0-9,\-]{3}[a-z]{3}[0-9,\-]{3,5}")
                sPATTERN = Replace(sPATTERN, "xdealnmbrx", "[0-9]{7}[a-z]{5}")
                sPATTERN = Replace(sPATTERN, "xshortwrapdatex", "\([0-9,\-]{3}[a-z]{3}[0-9,\-]{3,5}\)")
                With vbaRGX
                    .Global = True
                    .Pattern = sPATTERN
                    Set cMATCHES = .Execute(vPHRASEs(p))
                End With
                vCOUNTs(p) = vCOUNTs(p) + cMATCHES.Count
            Next k
        Next p

        .Cells(2, 3).Resize(UBound(vCOUNTs) + 1, 1) = Application.Transpose(vCOUNTs)

        Call key_in_phrase_helper_rgx(vKEYs, .Range(.Cells(2, 2), .Cells(Rows.Count, 2).End(xlUp)))

    End With

    Set cMATCHES = Nothing
    Set vbaRGX = Nothing

End Sub

Sub key_in_phrase_helper_rgx(vKYs As Variant, rPHRSs As Range)
    Dim c As Long, m As Long, p As Long, r As Long, v As Long, sPTTRN As String
    Dim vbaRGX As New RegExp, cMATCHES As MatchCollection

    With rPHRSs
        For r = 1 To rPHRSs.Rows.Count
            With .Cells(r, 1)
                .ClearFormats
                For v = LBound(vKYs) To UBound(vKYs)
                    sPTTRN = Replace(vKYs(v), "x00000000x", "[0-9,\-]{7,8}")
                    sPTTRN = Replace(sPTTRN, "xshortdatex", "[0-9,\-]{3}[a-z]{3}[0-9,\-]{3,5}")
                    sPTTRN = Replace(sPTTRN, "xdealnmbrx", "[0-9]{7}[a-z]{5}")
                    sPTTRN = Replace(sPTTRN, "xshortwrapdatex", "\([0-9,\-]{2,3}[a-z]{3}[0-9,\-]{3,5}\)")
                    c = 5 + CBool(vKYs(v) <> sPTTRN) * 2
                    Debug.Print sPTTRN
                    With vbaRGX
                        .Global = True
                        .Pattern = sPTTRN
                    End With
                    Set cMATCHES = vbaRGX.Execute(LCase(.Value2))
                    For m = 0 To cMATCHES.Count - 1
                        p = 0
                        Do While CBool(InStr(p + 1, .Value2, cMATCHES.Item(m), vbTextCompare))
                            p = InStr(p + 1, .Value2, cMATCHES.Item(m), vbTextCompare)
                            'Debug.Print vKYs(v)
                            With .Characters(Start:=p, Length:=Len(cMATCHES.Item(m))).Font
                                .Bold = True
                                .ColorIndex = c
                            End With
                        Loop
                    Next m
                Next v
            End With
        Next r
    End With

    Set cMATCHES = Nothing
    Set vbaRGX = Nothing

End Sub

In the following image of me sample's results, the staight location items are noted in bold}blue and the RegEx pattern matching is noted by bold|red.

  

Feel free to modify and append with additional keywords, phrases and RegEx patterns.

这篇关于模式匹配在excel的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆