RegEx to split camelCase or TitleCase (advanced)


Problem description

I found a brilliant RegEx to extract the parts of a camelCase or TitleCase expression.

 (?<!^)(?=[A-Z])

It works as expected:

  • value -> value
  • camelValue -> camel / Value
  • TitleValue -> Title / Value

For example, in Java:

String s = "loremIpsum";
String[] words = s.split("(?<!^)(?=[A-Z])");
// words now equals new String[]{"lorem", "Ipsum"}

My problem is that it does not work in some cases:

  • Case 1: VALUE -> V / A / L / U / E
  • Case 2: eclipseRCPExt -> eclipse / R / C / P / Ext

To my mind, the result should be:

  • Case 1: VALUE
  • Case 2: eclipse / RCP / Ext

In other words, given n uppercase chars:

  • if the n chars are followed by lowercase chars, the groups should be: (n-1 chars) / (n-th char + lowercase chars)
  • if the n chars are at the end, the group should be: (n chars).
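
The rule above can also be written without a regex, which makes the intended behaviour easier to check by hand. The following is only a sketch under some assumptions: the class name ManualCamelSplit and its helper methods are hypothetical, and only ASCII letters A-Z and a-z are treated as uppercase and lowercase, to match the character classes used in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original question.
class ManualCamelSplit {

    // ASCII-only checks, mirroring the [A-Z] and [a-z] character classes.
    static boolean isUpper(char c) { return c >= 'A' && c <= 'Z'; }
    static boolean isLower(char c) { return c >= 'a' && c <= 'z'; }

    // Splits before an uppercase letter (never at index 0) when either
    // the previous char is not uppercase, or the next char is lowercase.
    static List<String> split(String s) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < s.length(); i++) {
            if (!isUpper(s.charAt(i))) {
                continue;
            }
            boolean prevUpper = isUpper(s.charAt(i - 1));
            boolean nextLower = i + 1 < s.length() && isLower(s.charAt(i + 1));
            if (!prevUpper || nextLower) {
                parts.add(s.substring(start, i));
                start = i;
            }
        }
        parts.add(s.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("VALUE"));         // [VALUE]
        System.out.println(split("eclipseRCPExt")); // [eclipse, RCP, Ext]
    }
}
```

For a run of n uppercase chars followed by lowercase chars, the only split inside the run happens before the n-th char, which gives the (n-1 chars) / (n-th char + lowercase chars) grouping described above.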

Any idea on how to improve this regex?

Recommended answer

The following regex works for all of the above examples:

public static void main(String[] args)
{
    for (String w : "camelValue".split("(?<!(^|[A-Z]))(?=[A-Z])|(?<!^)(?=[A-Z][a-z])")) {
        System.out.println(w);
    }
}   

It works by forcing the negative lookbehind to not only ignore matches at the start of the string, but to also ignore matches where a capital letter is preceded by another capital letter. This handles cases like "VALUE".
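
To see the effect of the extended lookbehind in isolation, the sketch below compares the regex from the question with the first alternative of the answer's regex. The class name LookbehindComparison and its constant names are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the original answer.
class LookbehindComparison {

    // Regex from the question: split before every capital except at the start.
    static final String ORIGINAL = "(?<!^)(?=[A-Z])";

    // First alternative of the answer's regex: additionally refuse to split
    // when the capital is preceded by another capital.
    static final String UPPERCASE_GUARD = "(?<!(^|[A-Z]))(?=[A-Z])";

    public static void main(String[] args) {
        System.out.println(Arrays.toString("VALUE".split(ORIGINAL)));             // [V, A, L, U, E]
        System.out.println(Arrays.toString("VALUE".split(UPPERCASE_GUARD)));      // [VALUE]
        System.out.println(Arrays.toString("camelValue".split(UPPERCASE_GUARD))); // [camel, Value]
    }
}
```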

The first part of the regex on its own fails on "eclipseRCPExt" because it does not split between "RCP" and "Ext". That is the purpose of the second clause: (?<!^)(?=[A-Z][a-z]). This clause allows a split before every capital letter that is followed by a lowercase letter, except at the start of the string.
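
As a quick check, the sketch below runs the first clause alone and then the combined regex over the examples from the question. The class name CamelCaseSplitDemo and the helper splitCamel are hypothetical names chosen for this illustration.

```java
import java.util.Arrays;

// Hypothetical demo class wrapping the regex from the answer.
class CamelCaseSplitDemo {

    static final String FIRST_CLAUSE = "(?<!(^|[A-Z]))(?=[A-Z])";
    static final String SECOND_CLAUSE = "(?<!^)(?=[A-Z][a-z])";
    static final String COMBINED = FIRST_CLAUSE + "|" + SECOND_CLAUSE;

    static String[] splitCamel(String s) {
        return s.split(COMBINED);
    }

    public static void main(String[] args) {
        // The first clause alone misses the boundary between "RCP" and "Ext".
        System.out.println(Arrays.toString("eclipseRCPExt".split(FIRST_CLAUSE))); // [eclipse, RCPExt]

        // The combined regex handles every example from the question.
        System.out.println(Arrays.toString(splitCamel("value")));         // [value]
        System.out.println(Arrays.toString(splitCamel("camelValue")));    // [camel, Value]
        System.out.println(Arrays.toString(splitCamel("TitleValue")));    // [Title, Value]
        System.out.println(Arrays.toString(splitCamel("VALUE")));         // [VALUE]
        System.out.println(Arrays.toString(splitCamel("eclipseRCPExt"))); // [eclipse, RCP, Ext]
    }
}
```

Note that the character classes only cover ASCII letters; identifiers containing digits or non-ASCII letters are outside what this thread discusses.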
