Is there a regular expression way to replace a set of characters with another set (like the shell tr command)?
Question
The shell tr command supports replacing one set of characters with another set. For example, echo hello | tr [a-z] [A-Z] will translate hello to HELLO.
In Java, however, I must replace each character individually, like the following:
"10 Dogs Are Racing"
    .replaceAll("0", "0")
    .replaceAll("1", "1")
    .replaceAll("2", "2")
    // ...
    .replaceAll("9", "9")
    .replaceAll("A", "A")
    // ...
;
The apache-commons-lang library provides a convenient replaceChars method to do such a replacement:
// half-width to full-width
System.out.println(
    org.apache.commons.lang.StringUtils.replaceChars(
        "10 Dogs Are Racing",
        "0123456789ABCDEFEGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
        "0123456789ABCDEFEGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
    )
);
// Result:
// 10 Dogs Are Racing
But as you can see, the searchChars/replaceChars arguments are sometimes too long (and too tedious: try to find the duplicated character in them, if you want), even though each could be expressed by a simple regular-expression class, [0-9A-Za-z] / [0-9A-Za-z]. Is there a regular-expression way to achieve that?
Recommended answer
While there is no direct way to do this, constructing your own utility function to use in combination with replaceChars is relatively simple. The version below accepts simple character classes, written without [ or ]; it does not do class negation ([^a-z]).
For your use case, you could do:
StringUtils.replaceChars(str, charRange("0-9A-Za-z"), charRange("0-9A-Za-z"))
Code:
// at the top of the file:
import java.util.regex.PatternSyntaxException;

/**
 * Expands a simple character-class body such as "0-9A-Za-z" into the full
 * list of characters. A backslash escapes the next character; brackets and
 * negation are not supported.
 */
public static String charRange(String str) {
    StringBuilder ret = new StringBuilder();
    char ch;
    for (int index = 0; index < str.length(); index++) {
        ch = str.charAt(index);
        if (ch == '\\') {
            if (index + 1 >= str.length()) {
                throw new PatternSyntaxException(
                    "Malformed escape sequence.", str, index
                );
            }
            // special case for escape character, consume next char:
            index++;
            ch = str.charAt(index);
        }
        if (index + 1 >= str.length() || str.charAt(index + 1) != '-') {
            // this was a single char, or the last char in the string
            ret.append(ch);
        } else {
            if (index + 2 >= str.length()) {
                throw new PatternSyntaxException(
                    "Malformed character range.", str, index + 1
                );
            }
            // this char was the beginning of a range; count with an int so a
            // range ending at the largest char value cannot wrap around
            char end = str.charAt(index + 2);
            for (int r = ch; r <= end; r++) {
                ret.append((char) r);
            }
            index = index + 2;
        }
    }
    return ret.toString();
}
Produces:
0-9A-Za-z : 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
0-9A-Za-z : 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
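If commons-lang is not on the classpath, the same idea can be sketched in plain Java. This is only an illustration under stated assumptions: the names TrSketch, expand, and tr are hypothetical, the range expansion is a simplified variant without escape handling, and unlike the real tr command both sets must expand to the same length. The full-width forms are written as Unicode escapes so the source does not depend on file encoding.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of any library: a tr-like translate in plain Java.
final class TrSketch {

    // Expands simple ranges such as "0-9A-F" into "0123456789ABCDEF".
    // No escapes and no negation; a '-' that is first or last is kept literally.
    static String expand(String spec) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < spec.length(); i++) {
            char start = spec.charAt(i);
            if (i + 2 < spec.length() && spec.charAt(i + 1) == '-') {
                char end = spec.charAt(i + 2);
                for (int c = start; c <= end; c++) {
                    out.append((char) c);
                }
                i += 2; // skip the '-' and the range end
            } else {
                out.append(start);
            }
        }
        return out.toString();
    }

    // Maps each character found in 'from' to the character at the same
    // position in 'to'; characters outside 'from' pass through unchanged.
    static String tr(String input, String from, String to) {
        String src = expand(from);
        String dst = expand(to);
        if (src.length() != dst.length()) {
            throw new IllegalArgumentException("Sets must expand to the same length");
        }
        Map<Character, Character> table = new HashMap<>();
        for (int i = 0; i < src.length(); i++) {
            // If a character is listed twice, the first mapping wins.
            table.putIfAbsent(src.charAt(i), dst.charAt(i));
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            char mapped = table.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(tr("hello", "a-z", "A-Z"));
        // U+FF10..U+FF19, U+FF21..U+FF3A and U+FF41..U+FF5A are the
        // full-width digits, uppercase and lowercase letters.
        System.out.println(tr("10 Dogs Are Racing", "0-9A-Za-z",
                "\uFF10-\uFF19\uFF21-\uFF3A\uFF41-\uFF5A"));
    }
}
```

Building the lookup table once keeps each translation linear in the input length, whereas chaining replaceAll calls rescans the string once per character.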
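For the specific half-width to full-width case in the question, a regex-driven sketch is possible, because the full-width digits and letters sit at a fixed offset (0xFEE0) from their ASCII counterparts. This is not a general tr replacement: it assumes Java 9 or later for Matcher.replaceAll with a function argument, and the names FullWidthSketch and toFullWidth are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper: regex match plus a fixed code-point offset.
final class FullWidthSketch {

    private static final Pattern HALF_WIDTH = Pattern.compile("[0-9A-Za-z]");

    // U+FF10 (full-width zero) minus U+0030 (ASCII zero); the same offset
    // holds for A-Z and a-z.
    private static final int OFFSET = 0xFEE0;

    static String toFullWidth(String input) {
        // Each match is a single character, so shifting its code unit by
        // OFFSET yields the full-width form. Requires Java 9 or later.
        return HALF_WIDTH.matcher(input)
                .replaceAll(m -> String.valueOf((char) (m.group().charAt(0) + OFFSET)));
    }

    public static void main(String[] args) {
        System.out.println(toFullWidth("10 Dogs Are Racing"));
    }
}
```

The shortcut works only when source and target sets are related by a constant offset; for arbitrary pairs of sets, an explicit character-to-character mapping is still needed.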