Using regex to match non-word characters BUT NOT smiley faces
Question
I have a Java program which is supposed to remove all non-letter characters from a string, except when they are a smiley face such as =) or =] or :P
It's very easy to match the opposite with [a-zA-Z ]|=\)|=\]|:P but I cannot figure out how to negate this expression. Since I am using the String.replaceAll() function, it must be in the negated form.
I believe part of the issue may come from the fact that smileys are generally 2 characters long, and I am only matching 1 character at a time?
Interestingly, replaceAll("(?![Tt])[Oo]","") removes every occurrence of the letter O, even in the word "to." Does this mean my replaceAll function does not understand regex lookahead? It doesn't throw any errors...
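Note: this behavior is what lookahead semantics predict, rather than a sign that replaceAll ignores lookarounds. (?![Tt]) inspects the character at the current position, which is the O about to be matched, so it can never be a T and the assertion always passes. Checking the preceding character takes a lookbehind, (?<![Tt]). A minimal Java sketch of the difference (the class and method names are hypothetical, chosen only for illustration):

```java
class LookaroundDemo {
    // Pattern from the question: the negative lookahead looks at the same
    // character that [Oo] is about to consume, so it never blocks a match.
    static String withLookahead(String s) {
        return s.replaceAll("(?![Tt])[Oo]", "");
    }

    // Lookbehind variant: only removes an O that is NOT preceded by T or t.
    static String withLookbehind(String s) {
        return s.replaceAll("(?<![Tt])[Oo]", "");
    }

    public static void main(String[] args) {
        String input = "to go on";
        System.out.println(withLookahead(input));  // prints "t g n"
        System.out.println(withLookbehind(input)); // prints "to g n"
    }
}
```

With the lookbehind, the O in "to" survives while the other two are removed.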
I ended up using
replaceAll("(?<![=:;])[\\]\\[\\(\\)\\/]","")
.replaceAll("[=:;](?![\\]\\[\\(\\)o0OpPxX\\/])","")
.replaceAll("[^a-zA-Z=:;\\(\\)\\[\\]\\/ ]","")
which is extremely messy but works perfectly. For example,
The... quick! (brown) fox jump's over the[] lazy dog. :] =O ;X
becomes
THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG :] =O ;X
Edit: Ignore that fix; see the accepted answer below.
Recommended answer
It should be pretty easy to do this using a negative lookahead. Basically, the match will fail at any position where the regex inside the (?!...) group matches. You should follow the negative lookahead with a single wildcard (.) to consume a character if the lookahead did not match (meaning that the next character is a non-letter character that is not part of a smiley face).
Edit: Clearly I hadn't tested my original regex very thoroughly. You also need a negative lookbehind following the . to make sure that the character you consumed was not the second character in a smiley:
(?![a-zA-Z ]|=\)|=\]|:P).(?<!=\)|=\]|:P)
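As a rough sketch of how this regex could be plugged into String.replaceAll() (the class name, method name, and sample input are assumptions made for illustration, not part of the original answer):

```java
class SmileyStripper {
    // The regex above, with backslashes doubled for a Java string literal.
    // Lookahead: refuse to match at a letter, a space, or the start of a smiley.
    // Lookbehind: refuse to match if the consumed character completed a smiley.
    static final String NON_LETTER_NON_SMILEY =
        "(?![a-zA-Z ]|=\\)|=\\]|:P).(?<!=\\)|=\\]|:P)";

    static String strip(String s) {
        return s.replaceAll(NON_LETTER_NON_SMILEY, "");
    }

    public static void main(String[] args) {
        System.out.println(strip("Hi!! =) ok?? :P (no)")); // prints "Hi =) ok :P no"
    }
}
```

Only the three smileys listed in the pattern are protected here. Any other punctuation, including the stray parentheses around "no", is removed.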
Note that you might be able to shorten the regex by using character classes for the eyes and the mouth, for example:
[:=][\(\)\[\]]
 ^      ^----- mouth
 |-- eyes
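One way the eyes/mouth class might be substituted into the full expression is sketched below. The way the two pieces are combined is an assumption rather than something spelled out in the answer, and the class and method names are hypothetical. Note that this version protects every eyes/mouth pair the classes can form (such as :( and =[), not only the smileys listed earlier.

```java
class SmileyClassStripper {
    // Eyes: ':' or '='. Mouth: one of ( ) [ ].
    // Parentheses need no escaping inside a character class.
    static final String SMILEY = "[:=][()\\[\\]]";

    // Same shape as the longer regex: negative lookahead, one wildcard,
    // then a negative lookbehind, with the smiley alternatives replaced
    // by the two character classes.
    static final String NON_LETTER_NON_SMILEY =
        "(?![a-zA-Z ]|" + SMILEY + ").(?<!" + SMILEY + ")";

    static String strip(String s) {
        return s.replaceAll(NON_LETTER_NON_SMILEY, "");
    }

    public static void main(String[] args) {
        System.out.println(strip("Wow!!! :( or =] (fine).")); // prints "Wow :( or =] fine"
    }
}
```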