RegEx to split camelCase or TitleCase (advanced)
Question
I found a brilliant RegEx to extract the parts of a camelCase or TitleCase expression.
(?<!^)(?=[A-Z])
It works as expected:
- value -> value
- camelValue -> camel / Value
- TitleValue -> Title / Value
For example, in Java:
String s = "loremIpsum";
String[] words = s.split("(?<!^)(?=[A-Z])");
// words now equals new String[]{"lorem", "Ipsum"}
My problem is that it does not work in some cases:
- Case 1: VALUE -> V / A / L / U / E
- Case 2: eclipseRCPExt -> eclipse / R / C / P / Ext
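The two failing cases can be reproduced with a short sketch. The class name `OriginalPatternDemo` and its `split` helper are illustrative names only; the pattern is the one quoted in the question.

```java
import java.util.Arrays;

final class OriginalPatternDemo {
    // Pattern from the question: split before any capital letter except at the start.
    static final String ORIGINAL = "(?<!^)(?=[A-Z])";

    static String[] split(String s) {
        return s.split(ORIGINAL);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("VALUE")));          // [V, A, L, U, E]
        System.out.println(Arrays.toString(split("eclipseRCPExt")));  // [eclipse, R, C, P, Ext]
    }
}
```

The lookahead only checks that the next character is uppercase, so every letter inside a run of capitals becomes a split point.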
To my mind, the result should be:
- Case 1: VALUE
- Case 2: eclipse / RCP / Ext
In other words, given n uppercase chars:
- if the n chars are followed by lowercase chars, the groups should be: (n-1 chars) / (n-th char + lowercase chars)
- if the n chars are at the end, the group should be: (n chars).
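One way to read this rule without any regex is as a character-by-character scan. The sketch below is a plain-loop interpretation of the rule, not the approach asked for in the question. The class and method names are hypothetical, and it assumes ASCII letters only.

```java
import java.util.ArrayList;
import java.util.List;

final class UppercaseRunRule {
    private static boolean isUpper(char c) { return c >= 'A' && c <= 'Z'; }
    private static boolean isLower(char c) { return c >= 'a' && c <= 'z'; }

    // Splits before a capital that either starts a run of capitals
    // or is the last capital of a run and is followed by a lowercase letter.
    static List<String> split(String s) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < s.length(); i++) {
            if (!isUpper(s.charAt(i))) {
                continue;
            }
            boolean startsRun = !isUpper(s.charAt(i - 1));
            boolean startsWord = i + 1 < s.length() && isLower(s.charAt(i + 1));
            if (startsRun || startsWord) {
                parts.add(s.substring(start, i));
                start = i;
            }
        }
        parts.add(s.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("VALUE"));          // [VALUE]
        System.out.println(split("eclipseRCPExt"));  // [eclipse, RCP, Ext]
    }
}
```

A loop like this can serve as a reference when checking a candidate regex against further inputs.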
Any idea on how to improve this regex?
Recommended answer
The following regex works for all of the above examples:
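public class SplitCamelCase
{
    public static void main(String[] args)
    {
        // Alternative 1: before a capital not preceded by the string start or another capital.
        // Alternative 2: before a capital followed by a lowercase letter, except at the start.
        for (String w : "camelValue".split("(?<!(^|[A-Z]))(?=[A-Z])|(?<!^)(?=[A-Z][a-z])")) {
            System.out.println(w);
        }
    }
}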
It works by forcing the negative lookbehind to not only ignore matches at the start of the string, but also to ignore matches where a capital letter is preceded by another capital letter. This handles cases like "VALUE".
The first part of the regex on its own fails on "eclipseRCPExt" because it does not split between "RCP" and "Ext". This is the purpose of the second clause: (?<!^)(?=[A-Z][a-z]). This clause allows a split before every capital letter that is followed by a lowercase letter, except at the start of the string.
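To see how the two clauses cooperate, here is a small sketch that names each alternative and applies the combined pattern to the examples from the question. `CamelCaseSplitter`, `BEFORE_RUN`, `BEFORE_WORD` and `COMBINED` are illustrative names and do not come from the answer.

```java
import java.util.Arrays;

final class CamelCaseSplitter {
    // Clause 1: before a capital that is neither at the start nor preceded by a capital.
    static final String BEFORE_RUN = "(?<!(^|[A-Z]))(?=[A-Z])";
    // Clause 2: before a capital followed by a lowercase letter, except at the start.
    static final String BEFORE_WORD = "(?<!^)(?=[A-Z][a-z])";
    static final String COMBINED = BEFORE_RUN + "|" + BEFORE_WORD;

    static String[] split(String s) {
        return s.split(COMBINED);
    }

    public static void main(String[] args) {
        String[] samples = {"value", "camelValue", "TitleValue", "VALUE", "eclipseRCPExt"};
        for (String s : samples) {
            System.out.println(s + " -> " + String.join(" / ", split(s)));
        }
        // Clause 1 alone misses the boundary between "RCP" and "Ext".
        System.out.println(Arrays.toString("eclipseRCPExt".split(BEFORE_RUN)));
    }
}
```

With clause 1 alone, "eclipseRCPExt" splits only into "eclipse" and "RCPExt". Adding clause 2 supplies the missing split before "Ext".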