Is there a regular expression way to replace a set of characters with another set (like shell tr command)?
Question
The shell tr command supports replacing one set of characters with another set. For example, echo hello | tr 'a-z' 'A-Z' translates hello to HELLO.
In Java, however, I have to replace each character individually, like this:
"10 Dogs Are Racing"
    .replaceAll("0", "０")
    .replaceAll("1", "１")
    .replaceAll("2", "２")
    // ...
    .replaceAll("9", "９")
    .replaceAll("A", "Ａ")
    // ...
    ;
The apache-commons-lang library provides a convenient replaceChars method for this kind of replacement.
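If commons-lang is not on the classpath, the following is a minimal, dependency-free sketch of the behaviour documented for StringUtils.replaceChars. Each character found in searchChars is replaced by the character at the same position in replaceChars. If replaceChars is too short, the character is deleted. The class name ReplaceCharsSketch is hypothetical and is not part of any library.

```java
/**
 * Hypothetical helper: a dependency-free sketch of the documented behaviour of
 * commons-lang StringUtils.replaceChars(str, searchChars, replaceChars).
 */
final class ReplaceCharsSketch {
    private ReplaceCharsSketch() {}

    static String replaceChars(String str, String searchChars, String replaceChars) {
        // Nothing to search in, or nothing to search for: return the input unchanged.
        if (str == null || str.isEmpty() || searchChars == null || searchChars.isEmpty()) {
            return str;
        }
        // A null replacement set behaves like an empty one (every match is deleted).
        String replace = replaceChars == null ? "" : replaceChars;
        StringBuilder out = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            // Position of the first occurrence of ch in the search set.
            int index = searchChars.indexOf(ch);
            if (index < 0) {
                out.append(ch);                     // not in the search set: keep it
            } else if (index < replace.length()) {
                out.append(replace.charAt(index));  // map to the same position
            }
            // Otherwise there is no counterpart in the replacement set, so drop it.
        }
        return out.toString();
    }
}
```

For example, ReplaceCharsSketch.replaceChars("abcba", "bc", "y") yields "ayya". The 'c' has no counterpart in "y" and is dropped.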
// half-width to full-width
System.out.println(
    org.apache.commons.lang.StringUtils.replaceChars(
        "10 Dogs Are Racing",
        "0123456789ABCDEFEGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
        "０１２３４５６７８９ＡＢＣＤＥＦＥＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚ"
    )
);
// Result:
// １０ Ｄｏｇｓ Ａｒｅ Ｒａｃｉｎｇ
But as you can see, the searchChars/replaceChars strings are sometimes too long (and too tedious; find the duplicated character in them if you like). They could be expressed by a simple pair of regular expression classes, [0-9A-Za-z] / [０-９Ａ-Ｚａ-ｚ]. Is there a regular expression way to achieve that?
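For the specific half-width to full-width case, a regex-driven approach is possible. Printable ASCII (U+0021 to U+007E) has full-width counterparts at a fixed offset (U+FF01 to U+FF5E, that is +0xFEE0). The sketch below assumes Java 9 or later for Matcher.replaceAll(Function<MatchResult, String>). FullWidthRegex is a hypothetical class name. This only works because the mapping is arithmetic; an arbitrary set-to-set mapping still needs a lookup such as replaceChars.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: converts half-width ASCII letters and digits to full-width. */
final class FullWidthRegex {
    private FullWidthRegex() {}

    // Full-width forms U+FF01..U+FF5E sit at a constant offset from ASCII U+0021..U+007E.
    private static final int FULL_WIDTH_OFFSET = 0xFEE0;

    // Only these characters are converted; spaces and punctuation are left alone.
    private static final Pattern HALF_WIDTH_ALNUM = Pattern.compile("[0-9A-Za-z]");

    static String toFullWidth(String input) {
        Matcher m = HALF_WIDTH_ALNUM.matcher(input);
        // Matcher.replaceAll(Function<MatchResult, String>) requires Java 9+.
        return m.replaceAll(mr -> {
            char ch = mr.group().charAt(0);
            String replacement = String.valueOf((char) (ch + FULL_WIDTH_OFFSET));
            // quoteReplacement stops '$' and '\' from being treated as group references.
            return Matcher.quoteReplacement(replacement);
        });
    }
}
```

For example, FullWidthRegex.toFullWidth("10 Dogs Are Racing") converts every letter and digit but keeps the ASCII spaces. This matches the replaceChars result in the question.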
Answer
While there is no direct way to do this, it is relatively simple to write your own utility function and combine it with replaceChars. The version below accepts simple character classes written without [ or ]. A backslash makes the next character literal (for example \- for a hyphen). It does not support class negation ([^a-z]).
For your use case, you could do:
StringUtils.replaceChars(str, charRange("0-9A-Za-z"), charRange("０-９Ａ-Ｚａ-ｚ"))
Code:
import java.util.regex.PatternSyntaxException;

// Expands a simple character-class body such as "0-9A-Za-z" (no surrounding
// brackets, no negation) into the full list of characters it denotes.
public static String charRange(String str) {
    StringBuilder ret = new StringBuilder();
    for (int index = 0; index < str.length(); index++) {
        char ch = str.charAt(index);
        if (ch == '\\') {
            if (index + 1 >= str.length()) {
                throw new PatternSyntaxException(
                        "Malformed escape sequence.", str, index);
            }
            // escape character: take the next char literally
            index++;
            ch = str.charAt(index);
        }
        if (index + 1 >= str.length() || str.charAt(index + 1) != '-') {
            // a single char, or the last char in the string
            ret.append(ch);
        } else {
            if (index + 2 >= str.length()) {
                throw new PatternSyntaxException(
                        "Malformed character range.", str, index + 1);
            }
            char end = str.charAt(index + 2);
            if (end < ch) {
                // reject reversed ranges such as "z-a", as java.util.regex does
                throw new PatternSyntaxException(
                        "Illegal character range.", str, index + 1);
            }
            // ch starts a range; an int counter avoids char overflow at U+FFFF
            for (int r = ch; r <= end; r++) {
                ret.append((char) r);
            }
            index += 2;
        }
    }
    return ret.toString();
}
Produces:
0-9A-Za-z : 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
０-９Ａ-Ｚａ-ｚ : ０１２３４５６７８９ＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚ
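You may prefer to keep the full bracketed regex syntax, including negation. In that case, one further option is to let java.util.regex decide membership: compile the class and test every BMP character against it. In the sketch below, RegexCharClass.expand is a hypothetical helper. It returns the matched characters in ascending code-unit order, so its output can be passed to replaceChars. There are three caveats. First, the order follows code points, not the order in which the class is written ([a-z0-9] expands to the digits first). Second, a negated class such as [^a-z] expands to tens of thousands of characters. Third, each call makes tens of thousands of match attempts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: enumerates the BMP characters matched by a regex character class. */
final class RegexCharClass {
    private RegexCharClass() {}

    static String expand(String charClassRegex) {
        // One reusable matcher; reset() points it at each candidate character.
        Matcher m = Pattern.compile(charClassRegex).matcher("");
        StringBuilder out = new StringBuilder();
        // int counter so the loop can terminate after U+FFFF without overflowing.
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            char ch = (char) c;
            if (Character.isSurrogate(ch)) {
                continue; // lone surrogate halves are not characters on their own
            }
            if (m.reset(String.valueOf(ch)).matches()) {
                out.append(ch); // appended in ascending code-unit order
            }
        }
        return out.toString();
    }
}
```

Both [0-9A-Za-z] and [０-９Ａ-Ｚａ-ｚ] expand in the same relative order (digits, upper case, lower case). A call such as StringUtils.replaceChars(str, RegexCharClass.expand("[0-9A-Za-z]"), RegexCharClass.expand("[０-９Ａ-Ｚａ-ｚ]")) therefore lines the two sets up position by position.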