Regexp to remove nested parentheses


Question

I've been stuck trying to write a regular expression in Java that removes everything inside the parentheses below while preserving everything else. Note that the parentheses can be nested, and I think this is why my pattern fails. Can someone help me? Here is what I tried:

    String testData =
            "1. d4 Nf6 2. c4 g6 3. Nc3 Bg7 4. e4 d6 5. Nf3 O-O 6. Be2 e5 7. dxe5 dxe5 8. Qxd8 Rxd8 9. Bg5 Nbd7 10. O-O-O {Diagram [#]} " +
            "Rf8 (10... Re8 11. Nb5 (11. Nd5)) (10... h6 11. Bxf6 Bxf6 12. Nd5) 11. Nd5 c6 (11... Nxe4 12. Nxc7 Rb8 13. Be3 b6 ) 12. Ne7+ Kh8 13. " +
            "Nxc8 Raxc8 14. Bxf6 (14. Be3) 14... Nxf6 15. Nd2 (15. Bd3) 15... Bh6 16. f3 Nd7 17. Kc2 Bxd2 (17... Rcd8 18. b4) 18. Rxd2 Nc5 19. b4 Ne6 20. Rd7 b5 " +
            "(20... Rcd8 21. Rxb7 Nd4+ 22. Kd3) 21. Rxa7 Nd4+ 22. Kd3 Rcd8 23. Ke3 Nc2+ 24. Kf2 Rd2 25. Rd1 Rfd8 26. Rxd2 {Diagram [#]} (26. cxb5 cxb5 " +
            "27. Rc7 Rxd1 28. Bxd1 Rd2+ 29. Kg3 Ne1 30. Bb3 f6 31. Rf7 Nxg2 32. Rf8+ Kg7 33. Rf7+ Kh6 34. Rxf6 Nf4 35. Kh4 (35. Rxf4 exf4+ 36. Kxf4 Rxh2) 35... " +
            "Rxh2+ 36. Kg4 Rg2+ 37. Kh4 Nd3 38. a3 Rh2+ 39. Kg4 Rh1 40. Rc6 {Diagram [#]}) 26... Rxd2 27. Kf1 Nd4 28. cxb5 cxb5 29. a4 (29. Rd7 Rxa2 30. Bd3 Ra3 31. " +
            "Be2 Ra1+ 32. Kf2 Ra2 ) (29. Bxb5 Nxb5) 29... Rxe2 (29... bxa4 30. Bc4) 30. axb5 Rb2 31. b6 Rxb4 32. b7 Kg7  ";


    testData = testData.replaceAll(Pattern.quote("{") + ".*" + Pattern.quote("}"), "")
                    .replaceAll(Pattern.quote("(") + ".*" + Pattern.quote(")"), "")
                    .replaceAll(Pattern.quote("$") + "[0-9]+", "");

    System.out.println(testData);

But this prints:


  1. d4 Nf6 2. c4 g6 3. Nc3 Bg7 4. e4 d6 5. Nf3 O-O 6. Be2 e5 7. dxe5 dxe5 8. Qxd8 Rxd8 9. Bg5 Nbd7 10. O-O-O Rf8 ) 11. Nd5 c6 12. Ne7+ Kh8 13. Nxc8 Raxc8 14. Bxf6 14... Nxf6 15. Nd2 15... Bh6 16. f3 Nd7 17. Kc2 Bxd2 18. Rxd2 Nc5 19. b4 Ne6 20. Rd7 b5 21. Rxa7 Nd4+ 22. Kd3 Rcd8 23. Ke3 Nc2+ 24. Kf2 Rd2 25. Rd1 Rfd8 26. Rxd2 35... Rxh2+ 36. Kg4 Rg2+ 37. Kh4 Nd3 38. a3 Rh2+ 39. Kg4 Rh1 40. Rc6 ) 26... Rxd2 27. Kf1 Nd4 28. cxb5 cxb5 29. a4 29... Rxe2 30. axb5 Rb2 31. b6 Rxb4 32. b7 Kg7

which is obviously wrong because it still has parentheses in it.

The correct answer is:


  1. d4 Nf6 2. c4 g6 3. Nc3 Bg7 4. e4 d6 5. Nf3 O-O 6. Be2 e5 7. dxe5 dxe5 8. Qxd8 Rxd8 9. Bg5 Nbd7 10. O-O-O Rf8 11. Nd5 c6 12. Ne7+ Kh8 13. Nxc8 Raxc8 14. Bxf6 14... Nxf6 15. Nd2 15... Bh6 16. f3 Nd7 17. Kc2 Bxd2 18. Rxd2 Nc5 19. b4 Ne6 20. Rd7 b5 21. Rxa7 Nd4+ 22. Kd3 Rcd8 23. Ke3 Nc2+ 24. Kf2 Rd2 25. Rd1 Rfd8 26. Rxd2 26... Rxd2 27. Kf1 Nd4 28. cxb5 cxb5 29. a4 29... Rxe2 30. axb5 Rb2 31. b6 Rxb4 32. b7 Kg7


Recommended answer

Don't use regex here. As you can see from your example, something like \\(.*?\\) tries to find the minimal match between the first ( it finds and the next ), so for data like

a (b (c d) e) f 

the regex \(.*?\) will match

a (b (c d) e) f
  ^^^^^^^^

and will leave the e) part unmatched.
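To make the failure concrete, here is a minimal sketch that runs both the lazy and the greedy variant on small inputs. The class and method names (`LazyMatchDemo`, `removeLazy`, `removeGreedy`) are made up for illustration and do not come from the question or the answer.

```java
public class LazyMatchDemo {
    // Lazy pattern \(.*?\): each match ends at the first ')' after a '('.
    static String removeLazy(String s) {
        return s.replaceAll("\\(.*?\\)", "");
    }

    // Greedy pattern \(.*\): one match runs from the first '(' to the last ')'.
    static String removeGreedy(String s) {
        return s.replaceAll("\\(.*\\)", "");
    }

    public static void main(String[] args) {
        // Nested input: the lazy match stops too early and strands " e)".
        System.out.println(removeLazy("a (b (c d) e) f"));   // a  e) f
        // Two separate groups: the greedy match also swallows the "c" between them.
        System.out.println(removeGreedy("a (b) c (d) e"));   // a  e
    }
}
```

Neither quantifier can count how many brackets are still open, which is why both variants fail on this kind of text.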

You could probably write a regex for this task, because some regex flavors support recursion, but unfortunately the regex engine used in Java doesn't.
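If you still want to stay with java.util.regex, one workaround (not part of this answer's approach, shown only as a hedged sketch) is to remove innermost bracket pairs repeatedly until nothing changes. The class name `InnermostFirst` and the method `strip` are hypothetical, and the sketch assumes that ( ) and { } are the only bracket types that matter.

```java
import java.util.regex.Pattern;

public class InnermostFirst {
    // An innermost pair: '(' ... ')' or '{' ... '}' with no brackets in between.
    private static final Pattern INNERMOST =
            Pattern.compile("\\([^(){}]*\\)|\\{[^(){}]*\\}");

    // Peels off one nesting level per pass until the text stops changing.
    static String strip(String s) {
        String previous;
        do {
            previous = s;
            s = INNERMOST.matcher(s).replaceAll("");
        } while (!s.equals(previous));
        return s;
    }

    public static void main(String[] args) {
        System.out.println(strip("a (b (c d) e) f"));   // a  f
    }
}
```

This needs one pass per nesting level plus a final pass that finds nothing, so for deeply nested input a single-pass parser is likely the simpler choice.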

So to remove nested brackets you can write your own simple parser, like the one below. (I assume that the text is well formed, so there is nothing like ({)} or an unclosed bracket.)

String data = "1. d4 Nf6 2. c4 g6 3. Nc3 Bg7 4. e4 d6 5. Nf3 O-O 6. Be2 e5 7. dxe5 dxe5 8. Qxd8 Rxd8 9. Bg5 Nbd7 10. O-O-O {Diagram [#]} "
        + "Rf8 (10... Re8 11. Nb5 (11. Nd5)) (10... h6 11. Bxf6 Bxf6 12. Nd5) 11. Nd5 c6 (11... Nxe4 12. Nxc7 Rb8 13. Be3 b6 ) 12. Ne7+ Kh8 13. "
        + "Nxc8 Raxc8 14. Bxf6 (14. Be3) 14... Nxf6 15. Nd2 (15. Bd3) 15... Bh6 16. f3 Nd7 17. Kc2 Bxd2 (17... Rcd8 18. b4) 18. Rxd2 Nc5 19. b4 Ne6 20. Rd7 b5 "
        + "(20... Rcd8 21. Rxb7 Nd4+ 22. Kd3) 21. Rxa7 Nd4+ 22. Kd3 Rcd8 23. Ke3 Nc2+ 24. Kf2 Rd2 25. Rd1 Rfd8 26. Rxd2 {Diagram [#]} (26. cxb5 cxb5 "
        + "27. Rc7 Rxd1 28. Bxd1 Rd2+ 29. Kg3 Ne1 30. Bb3 f6 31. Rf7 Nxg2 32. Rf8+ Kg7 33. Rf7+ Kh6 34. Rxf6 Nf4 35. Kh4 (35. Rxf4 exf4+ 36. Kxf4 Rxh2) 35... "
        + "Rxh2+ 36. Kg4 Rg2+ 37. Kh4 Nd3 38. a3 Rh2+ 39. Kg4 Rh1 40. Rc6 {Diagram [#]}) 26... Rxd2 27. Kf1 Nd4 28. cxb5 cxb5 29. a4 (29. Rd7 Rxa2 30. Bd3 Ra3 31. "
        + "Be2 Ra1+ 32. Kf2 Ra2 ) (29. Bxb5 Nxb5) 29... Rxe2 (29... bxa4 30. Bc4) 30. axb5 Rb2 31. b6 Rxb4 32. b7 Kg7  ";

StringBuilder buffer = new StringBuilder();

// Current nesting depth: 0 means we are outside every (...) and {...} group.
int parenthesisCounter = 0;

for (char c : data.toCharArray()) {
    if (c == '(' || c == '{') {
        parenthesisCounter++;          // entering a group
    } else if (c == ')' || c == '}') {
        parenthesisCounter--;          // leaving a group
    } else if (parenthesisCounter == 0) {
        buffer.append(c);              // keep only text outside all groups
    }
}
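The same loop can be wrapped in a method so it is easy to try on small inputs. This is only a sketch: the names `BracketStripper` and `stripBracketed` are invented here, and the guard that ignores a stray closing bracket is an added assumption that goes beyond the well-formed input the answer assumes.

```java
public class BracketStripper {
    // Returns the text that lies outside every (...) and {...} group.
    static String stripBracketed(String data) {
        StringBuilder buffer = new StringBuilder(data.length());
        int depth = 0;                      // how many groups are currently open
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (c == '(' || c == '{') {
                depth++;
            } else if (c == ')' || c == '}') {
                if (depth > 0) {
                    depth--;                // assumption: a stray closer is dropped
                }
            } else if (depth == 0) {
                buffer.append(c);
            }
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripBracketed("Rf8 (10... Re8 11. Nb5 (11. Nd5)) 11. Nd5"));
        // Rf8  11. Nd5
    }
}
```

Without the guard, a single unmatched ) would push the counter below zero and suppress all remaining text, so the guard trades strictness for robustness.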

After that, you can focus on removing the rest of the unwanted data the way you did before:

.replaceAll(Pattern.quote("$") + "[0-9]+", "");

so the complete call is

System.out.println(buffer.toString().replaceAll(
        Pattern.quote("$") + "[0-9]+", ""));


which prints (runs of spaces collapsed here):

1. d4 Nf6 2. c4 g6 3. Nc3 Bg7 4. e4 d6 5. Nf3 O-O 6. Be2 e5 7. dxe5 dxe5 8. Qxd8 Rxd8 9. Bg5 Nbd7 10. O-O-O Rf8 11. Nd5 c6 12. Ne7+ Kh8 13. Nxc8 Raxc8 14. Bxf6 14... Nxf6 15. Nd2 15... Bh6 16. f3 Nd7 17. Kc2 Bxd2 18. Rxd2 Nc5 19. b4 Ne6 20. Rd7 b5 21. Rxa7 Nd4+ 22. Kd3 Rcd8 23. Ke3 Nc2+ 24. Kf2 Rd2 25. Rd1 Rfd8 26. Rxd2 26... Rxd2 27. Kf1 Nd4 28. cxb5 cxb5 29. a4 29... Rxe2 30. axb5 Rb2 31. b6 Rxb4 32. b7 Kg7
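Removing the bracketed groups leaves runs of spaces where they used to be. If that matters, a small post-processing step can drop the `$` tokens and collapse the whitespace in one go. This is a sketch under assumptions: the names `PgnCleanup` and `tidy` are hypothetical, and the whitespace collapsing is an extra step that the original code does not perform.

```java
public class PgnCleanup {
    // Expects text whose bracketed groups have already been removed.
    static String tidy(String stripped) {
        return stripped
                .replaceAll("\\$[0-9]+", "")   // same pattern as Pattern.quote("$") + "[0-9]+"
                .replaceAll("\\s+", " ")       // collapse the gaps left by removed groups
                .trim();
    }

    public static void main(String[] args) {
        System.out.println(tidy("10. O-O-O  Rf8   11. Nd5 $1 c6  "));
        // 10. O-O-O Rf8 11. Nd5 c6
    }
}
```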
