How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops


Question

How can I use regular expressions in Excel and take advantage of Excel's powerful grid-like setup for data manipulation?

  • In-cell function to return a matched pattern or a replaced value in a string.
  • Sub to loop through a column of data and extract matches to adjacent cells.
  • What setup is necessary?
  • What are Excel's special characters for regular expressions?

I understand Regex is not ideal for many situations (To use or not to use regular expressions?) since Excel can use Left, Mid, Right, InStr type commands for similar manipulations.

Recommended answer

Regular expressions are used for pattern matching.

To use them in Excel, follow these steps:

Step 1: Add VBA reference to "Microsoft VBScript Regular Expressions 5.5"

  • Select the "Developer" tab (I don't have this tab, what do I do?)
  • Select the "Visual Basic" icon from the "Code" ribbon section
  • In the "Microsoft Visual Basic for Applications" window, select "Tools" from the top menu.
  • Select "References"
  • Check the box next to "Microsoft VBScript Regular Expressions 5.5" to include it in your workbook.
  • Click "OK"
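If setting the reference is not practical (for example, when the workbook is shared with people who have not done Step 1), late binding is a possible alternative. The following is a minimal sketch, not part of the original answer: the function name StripLeadingDigits is hypothetical, and the object is created at run time with CreateObject instead of "New RegExp".

```vba
' Late-bound variant: works without the Step 1 reference.
' StripLeadingDigits is a hypothetical name used only for this sketch.
Function StripLeadingDigits(ByVal strInput As String) As String
    Dim regEx As Object
    Set regEx = CreateObject("VBScript.RegExp")   ' resolved at run time

    With regEx
        .Global = False
        .IgnoreCase = False
        .Pattern = "^[0-9]{1,2}"   ' one or two digits at the start of the string
    End With

    ' Replace returns the input unchanged when nothing matches
    StripLeadingDigits = regEx.Replace(strInput, "")
End Function
```

The trade-off is that a late-bound object gives no IntelliSense and no compile-time checking, so the early-bound form used in the examples is usually more comfortable while developing.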

Step 2: Define your pattern

Basic definitions:

- Range.

  • E.g. a-z matches any lower case letter from a to z
  • E.g. 0-5 matches any digit from 0 to 5

[] Match exactly one of the objects inside these brackets.

  • E.g. [a] matches the letter a
  • E.g. [abc] matches a single letter which can be a, b or c
  • E.g. [a-z] matches any single lower case letter of the alphabet.

() Groups different matches for return purposes. See examples below.

{} Multiplier for repeated copies of the pattern defined before it.

  • E.g. [a]{2} matches two consecutive lower case letter a: aa
  • E.g. [a]{1,3} matches at least one and up to three lower case letter a: a, aa, aaa
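To try the bracket and multiplier definitions above, a small helper along these lines may be useful. This is only a sketch: PatternFound and DemoBracketsAndMultipliers are hypothetical names, and the code assumes the Step 1 reference is set.

```vba
' Returns True when strPattern matches anywhere in strInput (case-sensitive).
' PatternFound is a hypothetical helper name; requires the Step 1 reference.
Function PatternFound(ByVal strPattern As String, ByVal strInput As String) As Boolean
    Dim regEx As New RegExp

    regEx.IgnoreCase = False
    regEx.Pattern = strPattern
    PatternFound = regEx.Test(strInput)
End Function

Sub DemoBracketsAndMultipliers()
    ' Results appear in the Immediate window (Ctrl+G in the VBA editor)
    Debug.Print PatternFound("[abc]", "cab")       ' True: c, a and b are all in the set
    Debug.Print PatternFound("[a-z]", "ABC")       ' False: no lower case letter
    Debug.Print PatternFound("[0-5]", "789")       ' False: no digit from 0 to 5
    Debug.Print PatternFound("[a]{2}", "bazaar")   ' True: contains aa
    Debug.Print PatternFound("[a]{2}", "banana")   ' False: the a's are never consecutive
End Sub
```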

+ Match at least one, or more, of the pattern defined before it.

  • E.g. a+ will match consecutive a's: a, aa, aaa, and so on

? Match zero or one of the pattern defined before it.

  • E.g. Pattern may or may not be present but can only be matched one time.
  • E.g. [a-z]? matches empty string or any single lower case letter.

* Match zero or more of the pattern defined before it.

  • E.g. Wildcard for pattern that may or may not be present.
  • E.g. [a-z]* matches empty string or string of lower case letters.
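The difference between +, ? and * is easiest to see by looking at the text that was actually matched. The sketch below uses the Execute method for that; FirstMatchText and DemoQuantifiers are hypothetical names, and the Step 1 reference is assumed.

```vba
' Returns the text of the first match, or "" when there is no match.
' An empty match (possible with ? and *) also comes back as "".
' FirstMatchText is a hypothetical helper name; requires the Step 1 reference.
Function FirstMatchText(ByVal strPattern As String, ByVal strInput As String) As String
    Dim regEx As New RegExp
    Dim matches As MatchCollection

    regEx.Global = False          ' stop after the first match
    regEx.Pattern = strPattern
    Set matches = regEx.Execute(strInput)

    If matches.Count > 0 Then
        FirstMatchText = matches(0).Value
    Else
        FirstMatchText = ""
    End If
End Function

Sub DemoQuantifiers()
    Debug.Print FirstMatchText("a+", "caaat")        ' aaa
    Debug.Print FirstMatchText("ca?t", "ct")         ' ct  (the a is optional)
    Debug.Print FirstMatchText("[a-z]*", "abc123")   ' abc
    Debug.Print FirstMatchText("[a-z]*", "123abc")   ' empty: * matched zero letters at position 0
End Sub
```

The last call is the one that tends to surprise people: because * accepts zero repetitions, the pattern succeeds immediately at the start of the string with an empty match instead of searching on for the letters.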

. Matches any character except newline \n

  • E.g. a. matches a two character string starting with a and ending with anything except \n

| OR operator

  • E.g. a|b means either a or b can be matched.
  • E.g. red|white|orange matches exactly one of the colors.
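As a quick check of the OR operator, the following sketch counts how many times any of the alternatives occurs in a string. CountMatches and DemoAlternation are hypothetical names, and the Step 1 reference is assumed.

```vba
' Counts every non-overlapping match of strPattern in strInput.
' CountMatches is a hypothetical helper name; requires the Step 1 reference.
Function CountMatches(ByVal strPattern As String, ByVal strInput As String) As Long
    Dim regEx As New RegExp

    regEx.Global = True           ' find all matches, not just the first
    regEx.IgnoreCase = False
    regEx.Pattern = strPattern
    CountMatches = regEx.Execute(strInput).Count
End Function

Sub DemoAlternation()
    Debug.Print CountMatches("red|white|orange", "red and white stripes")   ' 2
    Debug.Print CountMatches("a|b", "cab")                                  ' 2
    Debug.Print CountMatches("red|white|orange", "blue")                    ' 0
End Sub
```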

^ NOT operator (when it is the first character inside [])

  • E.g. [^0-9] matches any single character that is not a digit
  • E.g. [^aA] matches any single character that is neither lower case a nor upper case A

\ Escapes the special character that follows (overrides the above behavior)

  • E.g. \., \\, \(, \?, \$, \^
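A short way to see what escaping changes is to replace every match with a visible marker. In the sketch below, MaskPattern and DemoEscaping are hypothetical names and the Step 1 reference is assumed. Note that the backslash has no special meaning inside a VBA string literal, so the pattern is typed exactly as it should reach the regex engine.

```vba
' Replaces every match of strPattern in strInput with "#".
' MaskPattern is a hypothetical helper name; requires the Step 1 reference.
Function MaskPattern(ByVal strInput As String, ByVal strPattern As String) As String
    Dim regEx As New RegExp

    regEx.Global = True
    regEx.Pattern = strPattern
    MaskPattern = regEx.Replace(strInput, "#")
End Function

Sub DemoEscaping()
    Debug.Print MaskPattern("1.2.3", ".")     ' #####  (unescaped . matches every character)
    Debug.Print MaskPattern("1.2.3", "\.")    ' 1#2#3  (escaped . matches only the literal dot)
    Debug.Print MaskPattern("$5.00", "\$")    ' #5.00  (escaped $ matches the dollar sign itself)
End Sub
```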

Anchoring patterns:

^ Match must occur at start of string

  • E.g. ^a First character must be lower case letter a
  • E.g. ^[0-9] First character must be a number.

$ Match must occur at end of string

  • E.g. a$ Last character must be lower case letter a
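The anchors interact with the MultiLine property that the examples in this answer set on the RegExp object: with MultiLine = True, ^ and $ also match at the start and end of each line inside a multi-line value, not only at the start and end of the whole string. A minimal sketch, assuming the Step 1 reference is set (AnchoredTest and DemoAnchors are hypothetical names):

```vba
' Tests strPattern against strInput with MultiLine switched on or off.
' AnchoredTest is a hypothetical helper name; requires the Step 1 reference.
Function AnchoredTest(ByVal strPattern As String, ByVal strInput As String, _
                      ByVal bMultiLine As Boolean) As Boolean
    Dim regEx As New RegExp

    regEx.MultiLine = bMultiLine
    regEx.Pattern = strPattern
    AnchoredTest = regEx.Test(strInput)
End Function

Sub DemoAnchors()
    Dim twoLines As String
    twoLines = "abc" & vbLf & "1def"   ' like a cell containing a line break

    Debug.Print AnchoredTest("^a", "abc", False)            ' True
    Debug.Print AnchoredTest("a$", "abc", False)            ' False: last character is c
    Debug.Print AnchoredTest("^[0-9]", twoLines, False)     ' False: the string starts with a
    Debug.Print AnchoredTest("^[0-9]", twoLines, True)      ' True: the second line starts with 1
End Sub
```

If a cell can contain line breaks and the pattern is meant to describe the whole cell value, leaving MultiLine = False is probably the safer choice.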

Precedence table:

Order  Name                Representation
1      Parentheses         ( )
2      Multipliers         ? + * {m,n} {m, n}?
3      Sequence & Anchors  abc ^ $
4      Alternation         |
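One practical consequence of the table: alternation has the lowest precedence, so anchors bind to the nearest alternative only. The pattern ^a|b$ therefore reads as "starts with a, OR ends with b", and parentheses are needed to apply both anchors to both alternatives. A small sketch, assuming the Step 1 reference is set (RegexTest and DemoPrecedence are hypothetical names):

```vba
' Returns True when strPattern matches anywhere in strInput.
' RegexTest is a hypothetical helper name; requires the Step 1 reference.
Function RegexTest(ByVal strPattern As String, ByVal strInput As String) As Boolean
    Dim regEx As New RegExp

    regEx.Pattern = strPattern
    RegexTest = regEx.Test(strInput)
End Function

Sub DemoPrecedence()
    ' ^a|b$ is (^a) OR (b$)
    Debug.Print RegexTest("^a|b$", "apple")     ' True: starts with a
    Debug.Print RegexTest("^a|b$", "crab")      ' True: ends with b
    Debug.Print RegexTest("^a|b$", "banana")    ' False: neither starts with a nor ends with b

    ' ^(a|b)$ anchors the whole alternation: the string must be exactly a or b
    Debug.Print RegexTest("^(a|b)$", "a")       ' True
    Debug.Print RegexTest("^(a|b)$", "apple")   ' False
End Sub
```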


Predefined character abbreviations:

abr    same as       meaning
\d     [0-9]         Any single digit
\D     [^0-9]        Any single character that's not a digit
\w     [a-zA-Z0-9_]  Any word character
\W     [^a-zA-Z0-9_] Any non-word character
\s     [ \r\t\n\f]   Any space character
\S     [^ \r\t\n\f]  Any non-space character
\n     [\n]          New line
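The abbreviations are handy for common clean-up tasks. The two functions below are a sketch only (CollapseWhitespace and DigitsOnly are hypothetical names, and the Step 1 reference is assumed): one uses \s to squeeze runs of white space, the other uses \D to drop everything that is not a digit.

```vba
' Replaces each run of white space (spaces, tabs, line breaks) with one space.
' Hypothetical helper; requires the Step 1 reference.
Function CollapseWhitespace(ByVal strInput As String) As String
    Dim regEx As New RegExp

    regEx.Global = True
    regEx.Pattern = "\s+"
    CollapseWhitespace = regEx.Replace(strInput, " ")
End Function

' Removes every character that is not a digit.
' Hypothetical helper; requires the Step 1 reference.
Function DigitsOnly(ByVal strInput As String) As String
    Dim regEx As New RegExp

    regEx.Global = True
    regEx.Pattern = "\D"
    DigitsOnly = regEx.Replace(strInput, "")
End Function
```

For example, DigitsOnly("(555) 123-4567") returns 5551234567, and both functions can be called from a cell in the same way as the in-cell example in this answer.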


Example 1: Run as macro

The following example macro looks at the value in cell A1 to see if the first 1 or 2 characters are digits. If so, they are removed and the rest of the string is displayed. If not, a message box appears telling you that no match was found. A cell A1 value of 12abc will return abc, a value of 1abc will return abc, and a value of abc123 will return "Not matched" because the digits are not at the start of the string.

Private Sub simpleRegex()
    Dim strPattern As String: strPattern = "^[0-9]{1,2}"
    Dim strReplace As String: strReplace = ""
    Dim regEx As New RegExp
    Dim strInput As String
    Dim Myrange As Range
    
    Set Myrange = ActiveSheet.Range("A1")
    
    If strPattern <> "" Then
        strInput = Myrange.Value
        
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With
        
        If regEx.Test(strInput) Then
            MsgBox (regEx.Replace(strInput, strReplace))
        Else
            MsgBox ("Not matched")
        End If
    End If
End Sub


Example 2: Run as an in-cell function

This example does the same job as example 1 but is set up to run as an in-cell function (note that its pattern accepts up to three leading digits, {1,3}). To use it, change the code to this:

Function simpleCellRegex(Myrange As Range) As String
    Dim regEx As New RegExp
    Dim strPattern As String
    Dim strInput As String
    Dim strReplace As String
    Dim strOutput As String
    
    
    strPattern = "^[0-9]{1,3}"
    
    If strPattern <> "" Then
        strInput = Myrange.Value
        strReplace = ""
        
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With
        
        If regEx.test(strInput) Then
            simpleCellRegex = regEx.Replace(strInput, strReplace)
        Else
            simpleCellRegex = "Not matched"
        End If
    End If
End Function

Place your string ("12abc") in cell A1. Enter the formula =simpleCellRegex(A1) in cell B1 and the result will be "abc".
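The function above has its pattern hard-coded. If the pattern should come from the worksheet instead, a variation like the following might work. It is a sketch under assumptions: CellRegexReplace is a hypothetical name (chosen to avoid clashing with any built-in worksheet function), the parameters are my own, and the Step 1 reference is assumed.

```vba
' In-cell usage (hypothetical):  =CellRegexReplace(A1, "^[0-9]{1,3}", "")
' Returns strInput with every match of strPattern replaced by strReplace.
' An invalid pattern raises a run-time error, which Excel shows as #VALUE!.
Function CellRegexReplace(ByVal strInput As String, ByVal strPattern As String, _
                          Optional ByVal strReplace As String = "") As String
    Dim regEx As New RegExp

    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = strPattern
    End With

    CellRegexReplace = regEx.Replace(strInput, strReplace)
End Function
```

Unlike simpleCellRegex, this version returns the input unchanged when nothing matches rather than the text "Not matched"; which behaviour is preferable depends on how the result is used downstream.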

Example 3: Loop through a range

This example is the same as example 1 but loops through a range of cells.

Private Sub simpleRegex()
    Dim strPattern As String: strPattern = "^[0-9]{1,2}"
    Dim strReplace As String: strReplace = ""
    Dim regEx As New RegExp
    Dim strInput As String
    Dim Myrange As Range
    Dim cell As Range   ' loop variable; must be declared when Option Explicit is on
    
    Set Myrange = ActiveSheet.Range("A1:A5")
    
    For Each cell In Myrange
        If strPattern <> "" Then
            strInput = cell.Value
            
            With regEx
                .Global = True
                .MultiLine = True
                .IgnoreCase = False
                .Pattern = strPattern
            End With
            
            If regEx.Test(strInput) Then
                MsgBox (regEx.Replace(strInput, strReplace))
            Else
                MsgBox ("Not matched")
            End If
        End If
    Next
End Sub


Example 4: Splitting apart different patterns

This example loops through a range (A1, A2 & A3) and looks for a string starting with three digits, followed by a single alpha character and then 4 numeric digits. The output splits the pattern matches into adjacent cells by using the (). $1 represents the text matched by the first set of (), $2 the second, and so on.

Private Sub splitUpRegexPattern()
    Dim regEx As New RegExp
    Dim strPattern As String
    Dim strInput As String
    Dim Myrange As Range
    Dim C As Range   ' loop variable; must be declared when Option Explicit is on
    
    Set Myrange = ActiveSheet.Range("A1:A3")
    
    For Each C In Myrange
        strPattern = "(^[0-9]{3})([a-zA-Z])([0-9]{4})"
        
        If strPattern <> "" Then
            strInput = C.Value
            
            With regEx
                .Global = True
                .MultiLine = True
                .IgnoreCase = False
                .Pattern = strPattern
            End With
            
            If regEx.test(strInput) Then
                C.Offset(0, 1) = regEx.Replace(strInput, "$1")
                C.Offset(0, 2) = regEx.Replace(strInput, "$2")
                C.Offset(0, 3) = regEx.Replace(strInput, "$3")
            Else
                C.Offset(0, 1) = "(Not matched)"
            End If
        End If
    Next
End Sub
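One detail of the Replace approach above: only the matched part of the string is replaced, so any text after the match stays in the output (with an input of 123a4567xyz, replacing with "$1" gives 123xyz). If only the group text is wanted, the SubMatches collection returned by Execute is an alternative. The following is a sketch, not the original author's method; GetGroup is a hypothetical name and the Step 1 reference is assumed.

```vba
' Returns the text captured by the given group (1-based) of the first match,
' or "" when there is no match or no such group.
' GetGroup is a hypothetical helper name; requires the Step 1 reference.
Function GetGroup(ByVal strInput As String, ByVal strPattern As String, _
                  ByVal groupNumber As Long) As String
    Dim regEx As New RegExp
    Dim matches As MatchCollection
    Dim firstMatch As Match

    regEx.Pattern = strPattern
    Set matches = regEx.Execute(strInput)
    If matches.Count = 0 Then Exit Function       ' no match: return ""

    Set firstMatch = matches(0)
    ' SubMatches is zero-based, so group 1 is SubMatches(0)
    If groupNumber >= 1 And groupNumber <= firstMatch.SubMatches.Count Then
        GetGroup = firstMatch.SubMatches(groupNumber - 1)
    End If
End Function
```

With the pattern from this example, GetGroup("123a4567", "(^[0-9]{3})([a-zA-Z])([0-9]{4})", 2) returns a.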


Additional pattern examples

String   Regex Pattern                  Explanation
a1aaa    [a-zA-Z][0-9][a-zA-Z]{3}       Single alpha, single digit, three alpha characters
a1aaa    [a-zA-Z]?[0-9][a-zA-Z]{3}      May or may not have preceding alpha character
a1aaa    [a-zA-Z][0-9][a-zA-Z]{0,3}     Single alpha, single digit, 0 to 3 alpha characters
a1aaa    [a-zA-Z][0-9][a-zA-Z]*         Single alpha, single digit, followed by any number of alpha characters
</i8>    </[a-zA-Z][0-9]>               Exact non-word character except any single alpha followed by any single digit

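To check the patterns in the table against their sample strings, one option is to require that the pattern covers the whole string by wrapping it in ^(?: )$. This is a sketch under assumptions: MatchesWholeString is a hypothetical name, the wrapping is my own addition rather than part of the table, and the Step 1 reference is assumed.

```vba
' Returns True when strPattern matches the entire input, not just a part of it.
' The pattern is wrapped in a non-capturing group so that ^ and $ apply to all of it.
' MatchesWholeString is a hypothetical helper name; requires the Step 1 reference.
Function MatchesWholeString(ByVal strInput As String, ByVal strPattern As String) As Boolean
    Dim regEx As New RegExp

    regEx.IgnoreCase = False
    regEx.Pattern = "^(?:" & strPattern & ")$"
    MatchesWholeString = regEx.Test(strInput)
End Function

Sub DemoTablePatterns()
    Debug.Print MatchesWholeString("a1aaa", "[a-zA-Z][0-9][a-zA-Z]{3}")    ' True
    Debug.Print MatchesWholeString("1aaa", "[a-zA-Z][0-9][a-zA-Z]{3}")     ' False: leading alpha is required
    Debug.Print MatchesWholeString("1aaa", "[a-zA-Z]?[0-9][a-zA-Z]{3}")    ' True: leading alpha is optional
    Debug.Print MatchesWholeString("a1", "[a-zA-Z][0-9][a-zA-Z]{0,3}")     ' True: zero trailing alphas allowed
    Debug.Print MatchesWholeString("</i8>", "</[a-zA-Z][0-9]>")            ' True
End Sub
```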