Use Regex to Split Numbered List array into Numbered List Multiline


Problem description

I am trying to learn regex in order to answer a question on Stack Overflow in Portuguese.

Input (an array or a string in a cell, so .MultiLine = False)?

 1 One without dot. 2. Some Random String. 3.1 With SubItens. 3.2 With number 0n mid. 4. Number 9 incorrect. 11.12 More than one digit. 12.7 Ending (no word).

Output

 1 One without dot.
 2. Some Random String.
 3.1 With SubItens.
 3.2 With number 0n mid.
 4. Number 9 incorrect.
 11.12 More than one digit.
 12.7 Ending (no word).

My idea was to use Regex with Split, but I wasn't able to implement the following example in Excel.

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      Dim input As String = "plum-pear"
      Dim pattern As String = "(-)" 

      Dim substrings() As String = Regex.Split(input, pattern)    ' Split on hyphens.
      For Each match As String In substrings
         Console.WriteLine("'{0}'", match)
      Next
   End Sub
End Module
' The method writes the following to the console:
'    'plum'
'    '-'
'    'pear' 

So, after reading this and this, I used the RegExr website with the expression /([0-9]{1,2})([.]{0,1})([0-9]{0,2})/igm on the input.

And the following is obtained:

Is there a better way to do this? Is the regex correct, or is there a better way to build it? The examples I found on Google didn't make it clear to me how to use RegEx with Split correctly.

Maybe I am confused about the logic of the Split function: I wanted to get the split index, with the regex acting as the separator string.

Solution

I can suggest a pattern that works as long as each item ends with a word followed by a period.

Use

\d+(?:\.\d+)*[\s\S]*?\w+\.

See the regex demo.

Details

  • \d+ - 1 or more digits
  • (?:\.\d+)* - zero or more sequences of:
    • \. - dot
    • \d+ - 1 or more digits
  • [\s\S]*? - any 0+ chars, as few as possible, up to the first...
  • \w+\. - 1+ word chars followed by a literal dot.

Here is a sample VBA code:

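Sub PrintNumberedItems()
    Dim str As String
    Dim objRegExp As Object
    Dim objMatches As Object
    Dim m As Object

    str = " 1 One without dot. 2. Some Random String. 3.1 With SubItens. 3.2 With Another SubItem. 4. List item. 11.12 More than one digit."
    ' "New RegExp" needs a reference to "Microsoft VBScript Regular Expressions 5.5";
    ' without that reference, use CreateObject("VBScript.RegExp") instead.
    Set objRegExp = New RegExp
    objRegExp.Pattern = "\d+(?:\.\d+)*[\s\S]*?\w+\."
    objRegExp.Global = True              ' return every match, not just the first
    Set objMatches = objRegExp.Execute(str)
    If objMatches.Count <> 0 Then
        For Each m In objMatches
            Debug.Print m.Value          ' one list item per line in the Immediate window
        Next
    End If
End Sub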
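
The loop above only prints each item to the Immediate window. To get the multi-line text the question asks for, the matches can be joined with line feeds. The following is a minimal sketch rather than part of the original answer: the function name NumberedListToLines and the late-bound CreateObject call are assumptions made for illustration.

```vba
Option Explicit

' Hypothetical helper (not from the original answer): returns every
' numbered item found in `source`, one item per line.
Public Function NumberedListToLines(ByVal source As String) As String
    Dim re As Object          ' late-bound VBScript.RegExp, so no reference is required
    Dim matches As Object
    Dim m As Object
    Dim result As String

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "\d+(?:\.\d+)*[\s\S]*?\w+\."
    re.Global = True          ' find all items, not just the first

    Set matches = re.Execute(source)
    For Each m In matches
        ' Separate items with a line feed, but do not start with one
        If Len(result) > 0 Then result = result & vbLf
        result = result & m.Value
    Next m

    NumberedListToLines = result
End Function
```

It could be called, for example, as Range("B1").Value = NumberedListToLines(Range("A1").Value), where the cell addresses are placeholders. The target cell may need "Wrap Text" enabled to display the line breaks.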

NOTE

You may require each match to stop only at a word + . that is followed by 0+ whitespace characters and then either a number or the end of the string, using \d+(?:\.\d+)*[\s\S]*?[a-zA-Z]+\.(?=\s*(?:\d+|$)).

The (?=\s*(?:\d+|$)) positive lookahead requires the presence of 0+ whitespaces (\s*) followed by 1+ digits (\d+) or the end of string ($) immediately to the right of the current location.
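
As a rough illustration of the stricter pattern, the sketch below collects its matches into an array so they can be joined or written to cells as needed. NumberedListItems is a hypothetical helper name, and late binding through CreateObject is an assumption chosen so that no library reference is needed.

```vba
Option Explicit

' Hypothetical helper: collects the items matched by the stricter
' lookahead pattern into a zero-based array of strings.
Public Function NumberedListItems(ByVal source As String) As Variant
    Dim re As Object
    Dim matches As Object
    Dim items() As String
    Dim i As Long

    Set re = CreateObject("VBScript.RegExp")
    ' An item ends at letters + "." only when a number or the end of the string follows
    re.Pattern = "\d+(?:\.\d+)*[\s\S]*?[a-zA-Z]+\.(?=\s*(?:\d+|$))"
    re.Global = True

    Set matches = re.Execute(source)
    If matches.Count = 0 Then
        NumberedListItems = Array()      ' empty array when nothing matches
        Exit Function
    End If

    ReDim items(0 To matches.Count - 1)
    For i = 0 To matches.Count - 1
        items(i) = matches(i).Value
    Next i
    NumberedListItems = items
End Function
```

A call such as Debug.Print Join(NumberedListItems(str), vbLf) prints one item per line. With this pattern, the question's last item, "12.7 Ending (no word).", is not matched, because its final period follows ")" rather than a letter.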
