Is there a regular expression way to replace a set of characters with another set (like the shell tr command)?


Question

The shell tr command supports replacing one set of characters with another. For example, echo hello | tr [a-z] [A-Z] translates hello to HELLO.

In Java, however, I have to replace each character individually, like this:

"10 Dogs Are Racing"
    .replaceAll ("0", "0")
    .replaceAll ("1", "1")
    .replaceAll ("2", "2")
    // ...
    .replaceAll ("9", "9")
    .replaceAll ("A", "A")
    // ...
;

The apache-commons-lang library provides a convenient replaceChars method for this kind of replacement.

// half-width to full-width
System.out.println
(
    org.apache.commons.lang.StringUtils.replaceChars
    (
        "10 Dogs Are Racing",
        "0123456789ABCDEFEGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
        "0123456789ABCDEFEGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
    )
);
// Result:
// 10 Dogs Are Racing
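
For readers without commons-lang on the classpath, the positional mapping that replaceChars performs can be approximated with the JDK alone. The sketch below is not the library's implementation; the class name TranslateSketch and the method name translate are made up for illustration. It assumes the documented replaceChars behavior: each search character maps to the replacement at the same index, and a search character with no counterpart is deleted.

```java
// Hypothetical stand-in for StringUtils.replaceChars; names are illustrative only.
class TranslateSketch {
    static String translate(String input, String searchChars, String replaceChars) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            int pos = searchChars.indexOf(ch);
            if (pos < 0) {
                out.append(ch);                       // not in the search set: keep as-is
            } else if (pos < replaceChars.length()) {
                out.append(replaceChars.charAt(pos)); // mapped by position
            }
            // otherwise there is no counterpart, so the character is dropped
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("hello", "el", "ip")); // prints "hippo"
    }
}
```

Because indexOf returns the first match, a duplicated character in searchChars is harmless as long as both strings contain the duplicate at the same positions.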

But as you can see, the searchChars/replaceChars strings can get too long (and too tedious to check: try to find the duplicated character in the one above), even though each could be expressed by a simple regular expression, [0-9A-Za-z]/[0-9A-Za-z]. Is there a regular expression way to achieve this?

Recommended answer

While there is no direct way to do this, writing your own utility function to use in combination with replaceChars is relatively simple. The version below accepts simple character classes, written without [ or ]; it does not support class negation ([^a-z]).

For your use case, you could do:

StringUtils.replaceChars(str, charRange("0-9A-Za-z"), charRange("0-9A-Za-z"))

Code:

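import java.util.regex.PatternSyntaxException;

// Expands a tr-style class such as "0-9A-Za-z" into its full character list.
public static String charRange(String str) {
    StringBuilder ret = new StringBuilder();
    char ch;
    for(int index = 0; index < str.length(); index++) {
        ch = str.charAt(index);
        if(ch == '\\') {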
            if(index + 1 >= str.length()) {
                throw new PatternSyntaxException(
                    "Malformed escape sequence.", str, index
                );
            }
            // special case for escape character, consume next char:
            index++;
            ch = str.charAt(index);
        }
        if(index + 1 >= str.length() || str.charAt(index + 1) != '-') {
            // this was a single char, or the last char in the string
            ret.append(ch);
        } else {
            if(index + 2 >= str.length()) {
                throw new PatternSyntaxException(
                    "Malformed character range.", str, index + 1
                );
            }
            // this char was the beginning of a range
            // int counter avoids char overflow when the range ends at '\uffff'
            for(int r = ch; r <= str.charAt(index + 2); r++) {
                ret.append((char) r);
            }
            index = index + 2;
        }
    }
    return ret.toString();
}

Produces:

0-9A-Za-z : 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
0-9A-Za-z : 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
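
For the specific half-width to full-width case, a regex can select the characters while a callback computes each replacement. This is a different technique from the charRange helper, shown only as a sketch. It assumes Java 9 or later (for Matcher.replaceAll with a function argument) and relies on the Unicode layout in which the full-width forms U+FF01..U+FF5E sit exactly 0xFEE0 above ASCII U+0021..U+007E. The class name FullWidthRegex and the method name toFullWidth are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class FullWidthRegex {
    // The half-width character class from the question.
    private static final Pattern HALF_WIDTH = Pattern.compile("[0-9A-Za-z]");

    // Distance between an ASCII character (U+0021..U+007E)
    // and its full-width form (U+FF01..U+FF5E).
    private static final int FULL_WIDTH_OFFSET = 0xFEE0;

    static String toFullWidth(String input) {
        // Matcher.replaceAll(Function) requires Java 9 or later.
        return HALF_WIDTH.matcher(input).replaceAll(
            m -> String.valueOf((char) (m.group().charAt(0) + FULL_WIDTH_OFFSET)));
    }

    public static void main(String[] args) {
        // Letters and digits become full-width; the spaces are left untouched.
        System.out.println(toFullWidth("10 Dogs Are Racing"));
    }
}
```

This only works when the two sets are related by a fixed offset. For arbitrary mappings, the positional approach with replaceChars and charRange remains the more general option.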
