Is there a regular expression way to replace a set of characters with another set (like the shell tr command)?


Question

The shell tr command supports replacing one set of characters with another set. For example, echo hello | tr [a-z] [A-Z] translates hello to HELLO.

In Java, however, I have to replace each character individually, like this:

"10 Dogs Are Racing"
    .replaceAll("0", "０")
    .replaceAll("1", "１")
    .replaceAll("2", "２")
    // ...
    .replaceAll("9", "９")
    .replaceAll("A", "Ａ")
    // ...
;

The apache-commons-lang library provides a convenient replaceChars method for this kind of replacement.

// half-width to full-width
System.out.println
(
    org.apache.commons.lang.StringUtils.replaceChars
    (
        "10 Dogs Are Racing",
        "0123456789ABCDEFEGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
        "０１２３４５６７８９ＡＢＣＤＥＦＥＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚ"
    )
);
// Result:
// １０ Ｄｏｇｓ Ａｒｅ Ｒａｃｉｎｇ

But as you can see, the searchChars/replaceChars arguments are sometimes too long (and tedious to write; try to find the duplicated character in them if you like), even though they can be expressed by the simple regular expressions [0-9A-Za-z] / [０-９Ａ-Ｚａ-ｚ]. Is there a regular expression way to achieve that?

Recommended answer

While there is no direct way to do this, writing your own utility function to use in combination with replaceChars is relatively simple. The version below accepts simple character classes written without the surrounding [ and ]; it does not support class negation ([^a-z]).

For your use case, you could do:

StringUtils.replaceChars(str, charRange("0-9A-Za-z"), charRange("０-９Ａ-Ｚａ-ｚ"))

Code:

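import java.util.regex.PatternSyntaxException;

// Expands a simple character-class body such as "0-9A-Za-z" into the
// full list of characters it denotes. A backslash escapes the next char.
public static String charRange(String str) {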
    StringBuilder ret = new StringBuilder();
    char ch;
    for(int index = 0; index < str.length(); index++) {
        ch = str.charAt(index);
        if(ch == '\\') {
            if(index + 1 >= str.length()) {
                throw new PatternSyntaxException(
                    "Malformed escape sequence.", str, index
                );
            }
            // special case for escape character, consume next char:
            index++;
            ch = str.charAt(index);
        }
        if(index + 1 >= str.length() || str.charAt(index + 1) != '-') {
            // this was a single char, or the last char in the string
            ret.append(ch);
        } else {
            if(index + 2 >= str.length()) {
                throw new PatternSyntaxException(
                    "Malformed character range.", str, index + 1
                );
            }
            // this char was the beginning of a range
            for(char r = ch; r <= str.charAt(index + 2); r++) {
                ret.append(r);
            }
            index = index + 2;
        }
    }
    return ret.toString();
}

Produces:

0-9A-Za-z : 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
０-９Ａ-Ｚａ-ｚ : ０１２３４５６７８９ＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚ
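
If pulling in commons-lang is not an option, the same idea can be sketched in plain Java. The class below is only an illustration, not part of the answer above: the names TrSketch, expand and translate are hypothetical, expand is a simplified stand-in for charRange (no escapes, no negation), and the full-width characters are written as Unicode escapes so the example does not depend on the source file encoding.

```java
public class TrSketch {

    /** Expands simple ranges such as "a-z" into "abc...z"; anything else is taken literally. */
    static String expand(String spec) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < spec.length(); i++) {
            char start = spec.charAt(i);
            if (i + 2 < spec.length() && spec.charAt(i + 1) == '-') {
                char end = spec.charAt(i + 2);
                // An int loop variable avoids char overflow when end is '\uFFFF'.
                for (int c = start; c <= end; c++) {
                    out.append((char) c);
                }
                i += 2; // skip the '-' and the range end
            } else {
                out.append(start);
            }
        }
        return out.toString();
    }

    /** Replaces each char of input found in fromSpec with the char at the same position in toSpec. */
    static String translate(String input, String fromSpec, String toSpec) {
        String from = expand(fromSpec);
        String to = expand(toSpec);
        if (from.length() != to.length()) {
            throw new IllegalArgumentException("Character sets differ in length");
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            int pos = from.indexOf(c);
            // Characters outside the source set pass through unchanged.
            out.append(pos >= 0 ? to.charAt(pos) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Like: echo hello | tr a-z A-Z
        System.out.println(translate("hello", "a-z", "A-Z")); // prints HELLO

        // Half-width to full-width: U+FF10-U+FF19, U+FF21-U+FF3A, U+FF41-U+FF5A.
        System.out.println(translate(
            "10 Dogs Are Racing",
            "0-9A-Za-z",
            "\uFF10-\uFF19\uFF21-\uFF3A\uFF41-\uFF5A"));
    }
}
```

The linear indexOf lookup is fine for short sets like these; for large sets or very long inputs, a precomputed char-to-char map would likely be a better fit.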
