Regular Expressions on Punctuation
Question
So I'm completely new to regular expressions, and I'm trying to use Java's java.util.regex to find punctuation in input strings. I won't know ahead of time what kind of punctuation I might get, except that (1) !, ?, ., and ... are all valid punctuation, and (2) "<" and ">" mean something special and don't count as punctuation.
The program itself builds phrases pseudo-randomly, and I want to strip off the punctuation at the end of a sentence before it goes through the random process.
I can match entire words with any punctuation, but the matcher just gives me the indexes for that word. In other words:
Pattern p = Pattern.compile("(.*\\!)*?");
Matcher m = p.matcher([some input string]);
will get any word with a "!" on the end. For example:
String inputString = "It is a warm Summer day!";
Pattern p = Pattern.compile("(.*\\!)*?");
Matcher m = p.matcher(inputString);
String match = inputString.substring(m.start(), m.end());
results in --> String match ~ "day!"
But I want the Matcher to index just the "!", so I can split it off.
I could probably write a case for each kind of punctuation I might get and use String.substring(...) for each one, but I'm hoping there's just some mistake in how I'm using regular expressions here.
Recommended answer
I would try a character class regex similar to
"[.!?\\-]"
Add whatever characters you wish to match inside the []s. Be careful to escape any characters that might have a special meaning to the regex parser.
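As a rough illustration of the character class above, here is a minimal sketch. The class name PunctuationClass and the helper hasPunctuation are made up for this example and are not part of the question. Inside the brackets, ".", "!" and "?" are taken literally, and the "-" is escaped so it cannot be read as a range. Because "<" and ">" are not listed in the class, they are never treated as punctuation.

```java
import java.util.regex.Pattern;

public class PunctuationClass {
    // Character class from the answer: matches exactly one of . ! ? -
    // The Java string "\\-" is the regex \- (a literal hyphen, not a range).
    static final Pattern PUNCT = Pattern.compile("[.!?\\-]");

    // Hypothetical helper: true if the input contains any character from the class.
    static boolean hasPunctuation(String s) {
        return PUNCT.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasPunctuation("It is a warm Summer day!")); // true
        System.out.println(hasPunctuation("<noun> and <verb>"));        // false
    }
}
```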
You then have to iterate through the matches by calling Matcher.find() until it returns false.
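The find() loop might look something like the sketch below, which assumes the same character class as above. PunctuationFinder and punctuationIndexes are hypothetical names chosen for illustration. Each successful find() moves the matcher to the next punctuation character, and start() then gives the index of just that character rather than of the surrounding word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PunctuationFinder {
    // Same character class as in the answer; extend it as needed.
    static final Pattern PUNCT = Pattern.compile("[.!?\\-]");

    // Hypothetical helper: collects the index of every punctuation character.
    static List<Integer> punctuationIndexes(String input) {
        List<Integer> indexes = new ArrayList<>();
        Matcher m = PUNCT.matcher(input);
        // find() advances to the next match and returns false when none remain.
        while (m.find()) {
            indexes.add(m.start());
        }
        return indexes;
    }

    public static void main(String[] args) {
        String inputString = "It is a warm Summer day!";
        List<Integer> indexes = punctuationIndexes(inputString);
        System.out.println(indexes); // [23]

        // Split the sentence at the first punctuation index found.
        if (!indexes.isEmpty()) {
            int cut = indexes.get(0);
            System.out.println(inputString.substring(0, cut)); // It is a warm Summer day
            System.out.println(inputString.substring(cut));    // !
        }
    }
}
```

Note that start() and end() are only valid after a successful find() (or matches()); calling them on a fresh Matcher throws IllegalStateException.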