How do you use the Java word boundary with apostrophes?

Question

I am trying to delete all the occurrences of a word in a list, but I am having trouble when there are apostrophes in the words.

String phrase="bob has a bike and bob's bike is red";
String word="bob";
phrase=phrase.replaceAll("\\b"+word+"\\b","");
System.out.println(phrase);

Output:
has a bike and 's bike is red

What I want is:
has a bike and bob's bike is red

I have a limited understanding of regex, so I'm guessing there is a solution, but I do not know enough to create the regex to handle apostrophes. I would also like it to work with dashes, so that in the phrase "the new mail is e-mail" only the first occurrence of "mail" is replaced.

Recommended answer

It all depends on what you understand to be a "word". Perhaps you'd better define what you understand to be a word delimiter (for example blanks, commas and so on) and write something like:

phrase=phrase.replaceAll("([ \\s,.;])" + Pattern.quote(word)+ "([ \\s,.;])","$1$2");

But you'll additionally have to check for occurrences at the start and the end of the string. For example:

  String phrase="bob has a bike bob, bob and boba bob's bike is red and \"bob\" stuff.";
  String word="bob";
  phrase=phrase.replaceAll("([\\s,.;])" + Pattern.quote(word) + "([\\s,.;])","$1$2");
  System.out.println(phrase);

This prints:

bob has a bike ,  and boba bob's bike is red and "bob" stuff.
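The start-of-string and end-of-string caveat can also be sidestepped with lookarounds instead of captured delimiters. The following is only a sketch of that alternative, not part of the answer itself: it assumes that a "word" may contain letters, digits, underscores, apostrophes and hyphens, and the class name `WordRemover` and the helper `removeWord` are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;

public class WordRemover {
    // Assumption: these characters count as part of a word
    // (ASCII letters, digits, underscore, apostrophe, hyphen).
    private static final String WORD_CHAR = "[\\w'-]";

    // Removes every occurrence of `word` that is not directly preceded
    // or followed by one of the word characters defined above.
    static String removeWord(String phrase, String word) {
        String regex = "(?<!" + WORD_CHAR + ")"   // nothing word-like just before
                     + Pattern.quote(word)        // the literal word
                     + "(?!" + WORD_CHAR + ")";   // nothing word-like just after
        return phrase.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        System.out.println(removeWord("bob has a bike and bob's bike is red", "bob"));
        System.out.println(removeWord("the new mail is e-mail", "mail"));
    }
}
```

Because lookarounds consume no characters, a match at the very start or end of the string needs no special handling. Note that, unlike the delimiter-based pattern, this sketch would also remove a quoted "bob", since a double quote is not in the assumed word-character set.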

Update: If you insist on using \b, and considering that the "word boundary" understands Unicode, you can also use this dirty trick: replace all occurrences of ' with some Unicode letter that you are sure will not appear in your text, and afterwards do the reverse replacement. Example:

  String phrase="bob has a bike bob, bob and boba bob's bike is red and \"bob\" stuff.";
  String word="bob";
  phrase= phrase.replace("'","ñ").replace('"','ö');
  phrase=phrase.replaceAll("\\b" + Pattern.quote(word) + "\\b","");
  phrase= phrase.replace('ö','"').replace("ñ","'");
  System.out.println(phrase);

UPDATE: To summarize some comments below: one would expect \w and \b to share the same notion of what a "word character" is, as they do in almost every regular-expression dialect. Well, in Java they do not: \w considers ASCII, \b considers Unicode. It's an ugly inconsistency, I agree.

Update 2: Since Java 7 (as pointed out in the comments), the UNICODE_CHARACTER_CLASS flag lets you specify consistent Unicode-only behaviour.
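As a rough illustration of that flag, the sketch below compiles patterns with `Pattern.UNICODE_CHARACTER_CLASS` so that \w and \b agree on what a word character is. The class name `UnicodeWordDemo` and its helper methods are hypothetical; the letter ñ is written as the escape `\u00f1` to avoid any dependence on source-file encoding.

```java
import java.util.regex.Pattern;

public class UnicodeWordDemo {
    // Tests whether the whole string `s` matches \w, with or without the Unicode flag.
    static boolean isWordChar(String s, boolean unicode) {
        Pattern p = unicode
                ? Pattern.compile("\\w", Pattern.UNICODE_CHARACTER_CLASS)
                : Pattern.compile("\\w");
        return p.matcher(s).matches();
    }

    // Removes `word` using \b, with \b and \w both following Unicode rules.
    static String removeWord(String phrase, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                                    Pattern.UNICODE_CHARACTER_CLASS);
        return p.matcher(phrase).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(isWordChar("\u00f1", false)); // \w is ASCII-only by default
        System.out.println(isWordChar("\u00f1", true));  // Unicode-aware with the flag
        System.out.println(removeWord("bob\u00f1 bob", "bob"));
    }
}
```

With the flag set, ñ counts as a word character for both \w and \b, so only the standalone "bob" is removed from the last example. The flag does not make the apostrophe a word character, so it does not by itself solve the "bob's" case from the question.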
