How do I isolate a space using RegExp in VBA (\s vs. \p{Zs})?

Question

Intro/Question:

I have been studying the use of Regular Expressions (using VBA/Excel), and so far I cannot understand how I would isolate a <space> (or " ") using regexp from other white space characters that are included in \s. I thought that I would be able to use \p{Zs}, but in my testing so far, it has not worked out. Could someone please correct my misunderstanding? I appreciate any helpful input.

To offer proper credit, I modified some code that started off as a very helpful post by @Portland Runner that is found here: How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops

This has been my approach/study so far:

Using the string "14z-16z Flavored Peanuts", I've been trying to write a RegExp that removes "14z-16z " and leaves only "Flavored Peanuts". I initially used ^[0-9](\S)+ as strPattern in the following sub procedure:

Sub REGEXP_TEST_SPACE()

Dim strPattern As String
Dim strReplace As String
Dim strInput As String
Dim regEx As New RegExp ' requires a reference to "Microsoft VBScript Regular Expressions 5.5"

strInput = "14z-16z Flavored Peanuts"
strPattern = "^[0-9](\S)+"
strReplace = ""

With regEx
    .Global = True
    .MultiLine = True
    .IgnoreCase = True
    .pattern = strPattern
End With

If regEx.Test(strInput) Then
    Range("A1").Value = regEx.Replace(strInput, strReplace)
End If

End Sub

This approach gave me an A1 value of " Flavored Peanuts" (note the leading <space> in that string).

I then changed strPattern = "^[0-9](\S)+(\s)" (added the (\s)), which gave me the desired A1 value of "Flavored Peanuts". Great!!! I got the desired output!

But as I understand it, \s represents all white-space characters, equal to [ \f\n\r\t\v]. In this case, I know that the character is just a normal, single space; I don't need carriage return, horizontal tab, etc. So I tried to see if I could isolate just the <space> character in regex (Unicode separator: space), which I believe is \p{Zs} (e.g., strPattern = "^[0-9](\S)+(\p{Zs})"). Using this pattern, however, doesn't return a match at all, never mind removing the leading space. I also tried the more general \p{Z} (all Unicode separators), but that didn't work either.

Clearly I have missed something in my study. Help is both desired and appreciated. Thank you.

Recommended answer

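The RegExp object used from VBA (the VBScript regular expression engine) does not support Unicode category classes such as \p{Zs} or \p{Z}, which is why those patterns never match. A plain space can simply be written literally in the pattern. Since you are looking for an equivalent of the C#-style character class \p{Zs}, you might also want to handle hard (no-break) spaces. This code will be helpful: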

strPattern = "^[0-9](\S)+[ " & ChrW(160) & "]"
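As a rough illustration, the pattern above could be wrapped in a small function like the one below. This is only a sketch: the names StripLeadingCode and Demo_StripLeadingCode are hypothetical, and it assumes a Windows host where CreateObject("VBScript.RegExp") is available (late binding, so no library reference is needed).

```vba
' Sketch: remove a leading token such as "14z-16z " when the separator is
' a regular space (U+0020) or a no-break space (U+00A0).
' StripLeadingCode is a hypothetical name used for illustration only.
Function StripLeadingCode(ByVal strInput As String) As String
    Dim regEx As Object
    ' Late binding: assumes the VBScript RegExp library is installed (Windows)
    Set regEx = CreateObject("VBScript.RegExp")

    With regEx
        .Global = False
        .IgnoreCase = True
        ' Literal space plus ChrW(160); unlike \s, tabs and line breaks are not accepted
        .Pattern = "^[0-9]\S+[ " & ChrW(160) & "]"
    End With

    ' Replace returns the input unchanged when the pattern does not match
    StripLeadingCode = regEx.Replace(strInput, "")
End Function

Sub Demo_StripLeadingCode()
    ' Regular space after the code: prints "Flavored Peanuts"
    Debug.Print StripLeadingCode("14z-16z Flavored Peanuts")
    ' No-break space after the code: prints "Flavored"
    Debug.Print StripLeadingCode("14z-16z" & ChrW(160) & "Flavored")
    ' Tab after the code: no match, so the string is printed unchanged
    Debug.Print StripLeadingCode("14z-16z" & vbTab & "Flavored Peanuts")
End Sub
```

The tab case is the point of the exercise: with (\s) the tab would be consumed too, whereas the explicit class only accepts the two space characters listed in it.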

If you need to match all kinds of spaces, you can use this regex pattern, based on the information at https://www.cs.tut.fi/~jkorpela/chars/spaces.html:

strPattern = "^[0-9](\S)+[ " & ChrW(160) & ChrW(5760) & ChrW(6158) & ChrW(8192) & ChrW(8193) & ChrW(8194) & ChrW(8195) & ChrW(8196) & ChrW(8197) & ChrW(8198) & ChrW(8199) & ChrW(8200) & ChrW(8201) & ChrW(8202) & ChrW(8203) & ChrW(8239) & ChrW(8287) & ChrW(12288) & ChrW(65279) & "]"
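A long concatenation like this is easy to mistype, so one option is to generate the character class from the list of code points. The sketch below is an assumption-laden alternative, not part of the original answer: SpaceClass is a hypothetical helper name, and it relies on the engine's \uHHHH escape being accepted inside a character class.

```vba
' Sketch: build a character class such as "[\u0020\u00A0...]" from code points.
' SpaceClass is a hypothetical helper name used for illustration only.
Function SpaceClass() As String
    Dim codePoints As Variant
    Dim i As Long
    Dim cls As String

    ' Decimal code points of the space characters to be matched
    codePoints = Array(32, 160, 5760, 6158, 8192, 8193, 8194, 8195, 8196, _
                       8197, 8198, 8199, 8200, 8201, 8202, 8203, 8239, _
                       8287, 12288, 65279)

    cls = "["
    For i = LBound(codePoints) To UBound(codePoints)
        ' \uHHHH needs exactly four hex digits, so left-pad with zeros
        cls = cls & "\u" & Right$("0000" & Hex$(codePoints(i)), 4)
    Next i
    SpaceClass = cls & "]"
End Function
```

It would then be used as strPattern = "^[0-9](\S)+" & SpaceClass(), and adding or dropping a space character only means editing the array.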

This is the table with code point explanations (columns: code point, decimal value, name, sample, typical width):

U+0020  32  SPACE   foo bar Depends on font, typically 1/4 em, often adjusted
U+00A0  160 NO-BREAK SPACE  foo bar As a space, but often not adjusted
U+1680  5760    OGHAM SPACE MARK    foo bar Unspecified; usually not really a space but a dash
U+180E  6158    MONGOLIAN VOWEL SEPARATOR   foo᠎bar No width
U+2000  8192    EN QUAD foo bar 1 en (= 1/2 em)
U+2001  8193    EM QUAD foo bar 1 em (nominally, the height of the font)
U+2002  8194    EN SPACE    foo bar 1 en (= 1/2 em)
U+2003  8195    EM SPACE    foo bar 1 em
U+2004  8196    THREE-PER-EM SPACE  foo bar 1/3 em
U+2005  8197    FOUR-PER-EM SPACE   foo bar 1/4 em
U+2006  8198    SIX-PER-EM SPACE    foo bar 1/6 em
U+2007  8199    FIGURE SPACE    foo bar "Tabular width", the width of digits
U+2008  8200    PUNCTUATION SPACE   foo bar The width of a period "."
U+2009  8201    THIN SPACE  foo bar 1/5 em (or sometimes 1/6 em)
U+200A  8202    HAIR SPACE  foo bar Narrower than THIN SPACE
U+200B  8203    ZERO WIDTH SPACE    foo​bar Nominally no width, but may expand
U+202F  8239    NARROW NO-BREAK SPACE   foo bar Narrower than NO-BREAK SPACE (or SPACE)
U+205F  8287    MEDIUM MATHEMATICAL SPACE   foo bar 4/18 em
U+3000  12288   IDEOGRAPHIC SPACE   foo bar The width of ideographic (CJK) characters.
U+FEFF  65279   ZERO WIDTH NO-BREAK SPACE
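When it is unclear which of these characters a string actually contains, it can help to list its code points and compare them with the table. The function below is a minimal sketch for that purpose; CodePointsOf is a hypothetical name, and it only handles characters in the 16-bit range covered by AscW.

```vba
' Sketch: return the code points of a string as "U+0031 U+0034 ..." so that
' an unexpected space character can be identified.
' CodePointsOf is a hypothetical name used for illustration only.
Function CodePointsOf(ByVal s As String) As String
    Dim i As Long
    Dim cp As Long
    Dim result As String

    For i = 1 To Len(s)
        cp = AscW(Mid$(s, i, 1))
        ' AscW returns a signed 16-bit value; map negatives into 0..65535
        If cp < 0 Then cp = cp + 65536
        If Len(result) > 0 Then result = result & " "
        result = result & "U+" & Right$("0000" & Hex$(cp), 4)
    Next i

    CodePointsOf = result
End Function
```

For example, CodePointsOf("a" & ChrW(160) & " ") returns "U+0061 U+00A0 U+0020", which shows a no-break space followed by a regular space.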
