Replacing spaces within quotes

Question

I'm really struggling with regex here. Using Java, how would I go about replacing all spaces within double quotes with another character (or an escaped space, "\ "), but ONLY if the quoted phrase ends with a wildcard character? For example, the first line below should become the second:

word1 AND "word2 word3 word4*" OR "word5 word6" OR word7

word1 AND "word2\ word3\ word4*" OR "word5 word6" OR word7


Recommended answer

I think the best solution is to use a regular expression to find the quoted strings you want, and then to replace the spaces within the regex's match. Something like this:

import java.util.regex.*;

class SOReplaceSpacesInQuotes {
  public static void main(String[] args) {
    Pattern findQuotes = Pattern.compile("\"[^\"]+\\*\"");

    for (String arg : args) {
      Matcher m = findQuotes.matcher(arg);

      StringBuffer result = new StringBuffer();
      while (m.find())
        m.appendReplacement(result, m.group().replace(" ", "\\\\ "));
      m.appendTail(result);

      System.out.println(arg + " -> " + result.toString());
    }
  }
}

Running java SOReplaceSpacesInQuotes 'word1 AND "word2 word3 word4*" OR "word5 word6*" OR word7' then happily produced the output word1 AND "word2 word3 word4*" OR "word5 word6*" OR word7 -> word1 AND "word2\ word3\ word4*" OR "word5\ word6*" OR word7, which is exactly what you wanted.

The pattern is "[^"]+\*", but backslashes and quotes have to be escaped for Java. This matches a literal quote, one or more non-quotes, a *, and a quote, which is what you want. This assumes that (a) you aren't allowed to have embedded \" escape sequences, and (b) that * is the only wildcard. If you have embedded escape sequences, then use "([^\\"]|\\.)+\*" (which, escaped for Java, is \"([^\\\\\"]|\\\\.)+\\*\"); if you have multiple wildcards, use "[^"]+[*+]"; and if you have both, combine them in the obvious way. Dealing with multiple wildcards is a matter of just letting any of them match at the end of the quoted phrase; dealing with escape sequences is done by matching a quote followed by any number of items, each of which is either a non-backslash, non-quote character or a backslash followed by anything at all.
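The three pattern variants described above can be compared side by side. The following is only a sketch: the class name, constant names, and the firstMatch helper are made up for illustration, and the sample inputs are assumptions rather than part of the original question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WildcardQuotePatterns {
  // Plain pattern from the answer: quote, non-quotes, '*', quote.
  static final Pattern PLAIN = Pattern.compile("\"[^\"]+\\*\"");

  // Variant that tolerates backslash escapes such as \" inside the quotes.
  static final Pattern WITH_ESCAPES =
      Pattern.compile("\"([^\\\\\"]|\\\\.)+\\*\"");

  // Variant that accepts either '*' or '+' as the trailing wildcard.
  static final Pattern MULTI_WILDCARD = Pattern.compile("\"[^\"]+[*+]\"");

  // Returns the first match of the pattern in the input, or null if none.
  static String firstMatch(Pattern p, String input) {
    Matcher m = p.matcher(input);
    return m.find() ? m.group() : null;
  }

  public static void main(String[] args) {
    // The phrase in the second input contains an escaped quote: "a \" b*"
    System.out.println(firstMatch(PLAIN, "x \"a b*\" y"));
    System.out.println(firstMatch(WITH_ESCAPES, "x \"a \\\" b*\" y"));
    System.out.println(firstMatch(MULTI_WILDCARD, "x \"a b+\" y"));
  }
}
```

Note that the plain pattern, applied to the input with the embedded \" sequence, would start its match at the escaped quote instead of the opening one, which is why the escape-aware variant is needed for such input.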

Now, that pattern finds the quoted strings you want. For each argument to the program, we then find every match and, using m.group().replace(" ", "\\\\ "), replace each space in what was matched (the quoted string) with a backslash and a space. (At runtime this string is \\ followed by a space. Two real backslashes are required because appendReplacement treats a backslash in its replacement argument as an escape character, so a literal backslash has to be escaped itself.) If you haven't seen appendReplacement and appendTail before (I hadn't), here's what they do: in tandem, they iterate through the entire string, replacing whatever was matched with the second argument to appendReplacement, and appending it all to the given StringBuffer. The appendTail call is necessary to catch whatever didn't match at the end. The documentation for Matcher.appendReplacement(StringBuffer,String) contains a good example of their use.
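One way to avoid counting backslashes by hand is Matcher.quoteReplacement, which escapes every \ and $ in a string so that appendReplacement inserts it literally. The sketch below wraps the same loop in a method; the class name EscapeQuotedSpaces and the method name escapeSpaces are hypothetical, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeQuotedSpaces {
  // Same pattern as in the answer: a quoted phrase ending in '*'.
  private static final Pattern FIND_QUOTES = Pattern.compile("\"[^\"]+\\*\"");

  // Escapes spaces inside every quoted phrase that ends with '*'.
  static String escapeSpaces(String input) {
    Matcher m = FIND_QUOTES.matcher(input);
    StringBuffer result = new StringBuffer();
    while (m.find()) {
      // "\\ " is one real backslash plus a space. quoteReplacement then
      // escapes any '\' and '$' so the text is inserted literally.
      String escaped = m.group().replace(" ", "\\ ");
      m.appendReplacement(result, Matcher.quoteReplacement(escaped));
    }
    // Copy whatever follows the last match.
    m.appendTail(result);
    return result.toString();
  }

  public static void main(String[] args) {
    System.out.println(escapeSpaces(
        "word1 AND \"word2 word3 word4*\" OR \"word5 word6\" OR word7"));
  }
}
```

A side benefit of this approach is that a quoted phrase containing a dollar sign, such as "a$ b*", is handled as plain text instead of being read as a group reference.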

Edit: As Roland Illig pointed out, this is problematic if certain kinds of invalid input can appear, such as a AND "b" AND *"c", which would become a AND "b"\ AND\ *"c". If this is a danger (or if it could become one in the future, which it likely could), then you should make it more robust by always matching quotes, but only replacing when the quoted phrase ends in a wildcard character. This will work as long as your quotes are always appropriately paired, which is a much weaker assumption. The resulting code is very similar:

import java.util.regex.*;

class SOReplaceSpacesInQuotes {
  public static void main(String[] args) {
    Pattern findQuotes = Pattern.compile("\"[^\"]+?(\\*)?\"");

    for (String arg : args) {
      Matcher m = findQuotes.matcher(arg);

      StringBuffer result = new StringBuffer();
      while (m.find()) {
        if (m.group(1) == null)
          m.appendReplacement(result, m.group());
        else
          m.appendReplacement(result, m.group().replace(" ", "\\\\ "));
      }
      m.appendTail(result);

      System.out.println(arg + " -> " + result.toString());
    }
  }
}

We put the wildcard character in a group, make it optional, and make the body of the quotes reluctant with +?, so that it will match as little as possible and let the wildcard character get grouped. This way, we match each successive pair of quotes, and since the regex engine won't restart in the middle of a match, we'll only ever match the insides, not the outsides, of quotes. But now we don't always want to replace the spaces; we only want to do so if there was a wildcard character. This is easy: test to see if group 1 is null. If it is, then there wasn't a wildcard character, so replace the string with itself. Otherwise, replace the spaces. And indeed, java SOReplaceSpacesInQuotes 'a AND "b d" AND *"c d"' yields the desired a AND "b d" AND *"c d" -> a AND "b d" AND *"c d", while java SOReplaceSpacesInQuotes 'a AND "b d" AND "c d*"' performs a substitution to get a AND "b d" AND "c d*" -> a AND "b d" AND "c\ d*".
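The same "match every quoted phrase, rewrite only those ending in a wildcard" logic can also be written without an explicit loop. This is a sketch that assumes Java 9 or later, where Matcher.replaceAll accepts a function of the MatchResult; the class name QuotedWildcardEscaper and the method name escape are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedWildcardEscaper {
  // Every quoted phrase; group 1 is set only when the phrase ends in '*'.
  private static final Pattern QUOTED = Pattern.compile("\"[^\"]+?(\\*)?\"");

  static String escape(String input) {
    return QUOTED.matcher(input).replaceAll(match -> {
      String text = match.group();
      if (match.group(1) != null) {
        // Wildcard phrase: put a backslash in front of each space.
        text = text.replace(" ", "\\ ");
      }
      // The returned string is still parsed as a replacement string,
      // so '\' and '$' must be escaped to be inserted literally.
      return Matcher.quoteReplacement(text);
    });
  }

  public static void main(String[] args) {
    System.out.println(escape("a AND \"b d\" AND *\"c d\""));
    System.out.println(escape("a AND \"b d\" AND \"c d*\""));
  }
}
```

On older Java versions the appendReplacement/appendTail loop shown earlier remains the way to do this.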
