Do Java regular expressions offer any performance benefit?


Question

In Java, suppose we do pattern matching with a regular expression: for example, take an input string and use a regular expression to find out whether it is numeric, and throw an exception if it is not. In this case, I understand that using a regex makes the code less verbose than taking each character of the string, checking whether it is a digit, and throwing an exception if it is not.

But I was under the assumption that regex also makes the process more efficient. Is this true? I cannot find any evidence on this point. How does regex do the match behind the scenes? Is it not also iterating over the string and checking each character one by one?

Recommended answer

Just for fun, I ran the micro benchmark shown at the end of this answer. The results of the last run (i.e. after JVM warm-up / JIT), in milliseconds for 1,000,000 strings per test, are below (results are fairly consistent from one run to another anyway):

regex with numbers 123
chars with numbers 33
parseInt with numbers 33
regex with words 123
chars with words 34
parseInt with words 733

In other words, chars is very efficient; Integer.parseInt is as efficient as chars if the string is a number, but awfully slow if the string is not a number. Regex is in between.

Conclusion

If you parse a string into a number and you expect the string to be a number in general, using Integer.parseInt is the best solution (efficient and readable). The penalty you pay when the string is not a number should be low as long as that does not happen too often.
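As a minimal sketch of that recommendation applied to the scenario in the question (validate the input, throw if it is not numeric): the class name `NumericInput` and the helper `requireInt` are hypothetical names chosen for illustration, and wrapping the failure in an `IllegalArgumentException` is just one possible choice.

```java
public class NumericInput {

    // Hypothetical helper: returns the parsed value, or throws if the input is not an int.
    public static int requireInt(String input) {
        try {
            return Integer.parseInt(input);
        } catch (NumberFormatException e) {
            // Keep the original exception as the cause so the failure stays traceable.
            throw new IllegalArgumentException("Not a valid int: " + input, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(requireInt("12345"));
        try {
            requireInt("12345x");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that this is not strictly equivalent to matching `[0-9]+`: Integer.parseInt also accepts a leading sign such as "-7" and rejects digit strings that do not fit in an int.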

PS: my regex may not be optimal; feel free to comment.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class TestNumber {

    private static final List<String> numbers = new ArrayList<>();
    private static final List<String> words = new ArrayList<>();

    public static void main(String[] args) {
        long start;

        for (int i = 0; i < 1000000; i++) {
            numbers.add(String.valueOf(i));
            words.add(String.valueOf(i) + "x");
        }

        for (int i = 0; i < 5; i++) {
            start = System.nanoTime();
            regex(numbers);
            System.out.println("regex with numbers " + (System.nanoTime() - start) / 1000000);
            start = System.nanoTime();
            chars(numbers);
            System.out.println("chars with numbers " + (System.nanoTime() - start) / 1000000);
            start = System.nanoTime();
            exception(numbers);
            System.out.println("parseInt with numbers " + (System.nanoTime() - start) / 1000000);

            start = System.nanoTime();
            regex(words);
            System.out.println("regex with words " + (System.nanoTime() - start) / 1000000);
            start = System.nanoTime();
            chars(words);
            System.out.println("chars with words " + (System.nanoTime() - start) / 1000000);
            start = System.nanoTime();
            exception(words);
            System.out.println("parseInt with words " + (System.nanoTime() - start) / 1000000);
        }
    }

    // Counts the strings that consist only of digits, using a regex compiled once per call.
    private static int regex(List<String> list) {
        int sum = 0;
        Pattern p = Pattern.compile("[0-9]+");
        for (String s : list) {
            sum += (p.matcher(s).matches() ? 1 : 0);
        }
        return sum;
    }

    // Counts the strings that consist only of digits, checking each character by hand.
    private static int chars(List<String> list) {
        int sum = 0;

        for (String s : list) {
            boolean isNumber = true;
            for (char c : s.toCharArray()) {
                if (c < '0' || c > '9') {
                    isNumber = false;
                    break;
                }
            }
            if (isNumber) {
                sum++;
            }
        }
        return sum;
    }

    // Counts the strings that Integer.parseInt accepts, relying on the exception for rejects.
    private static int exception(List<String> list) {
        int sum = 0;

        for (String s : list) {
            try {
                Integer.parseInt(s);
                sum++;
            } catch (NumberFormatException e) {
                // Not a number: intentionally ignored, the string is simply not counted.
            }
        }
        return sum;
    }
}
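On the PS about the regex possibly not being optimal: one variation that could be tried is reusing a single Matcher through `reset(...)` instead of creating a new one per string. The sketch below only shows the two forms side by side. It was not part of the benchmark above, no timing is claimed for it, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexVariants {

    // Compiled once and shared; Pattern instances are immutable and safe to reuse.
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Same approach as regex() in the benchmark: a new Matcher for every string.
    public static int countWithNewMatcher(List<String> list) {
        int sum = 0;
        for (String s : list) {
            if (DIGITS.matcher(s).matches()) {
                sum++;
            }
        }
        return sum;
    }

    // Variation: a single Matcher re-pointed at each string with reset().
    // A Matcher is not thread-safe, so it stays local to this method.
    public static int countWithReusedMatcher(List<String> list) {
        int sum = 0;
        Matcher m = DIGITS.matcher("");
        for (String s : list) {
            if (m.reset(s).matches()) {
                sum++;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("123", "45x", "", "0");
        System.out.println(countWithNewMatcher(input));    // prints 2
        System.out.println(countWithReusedMatcher(input)); // prints 2
    }
}
```

Both methods return the same counts; whether the reused Matcher is measurably faster would have to be checked by adding it to the benchmark.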
