Why is my string manipulation slow when using a lambda expression?

Question

A method takes comma-separated words as a String and returns a String of comma-separated words in natural sort order, excluding any 4-letter words, with all words in UPPER case and no duplicates. The 1st approach is quite slow compared to the 2nd approach. Can you help me understand why, and how I can improve my approach?

Approach 1:

public String stringProcessing(String s){
    Stream<String> tokens = Arrays.stream(s.split(","));
    return tokens.filter(t -> t.length() != 4).distinct()
                 .sorted()
                 .collect(Collectors.joining(",")).toUpperCase();
}

Approach 2:

public String processing(String s) {
    String[] tokens = s.split(",");
    Set<String> resultSet = new TreeSet<>();
    for(String t:tokens){
        if(t.length() !=  4)
            resultSet.add(t.toUpperCase());
    }        
    StringBuilder result = new StringBuilder();
    resultSet.forEach(key -> {
        result.append(key).append(","); 
    });
    result.deleteCharAt(result.length()-1);
    return result.toString();
}


Answer

A performance comparison that does not document the JRE version used, the input data sets, or the benchmark methodology is not suitable for drawing any conclusions.

Further, there are fundamental differences between your variants. Your first variant applies distinct() to the original strings, potentially keeping many more elements than the second variant, joins all of them into a string, and only then transforms the complete result string to upper case. In contrast, your second variant transforms individual elements before adding them to the set, so only strings with a distinct upper-case representation are processed further. So the second variant may need significantly less memory and process fewer elements when joining.
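The difference is not only about speed: the two approaches can return different results. A minimal sketch, using copies of the question's two methods made static (with a guard added to approach2 so an empty result does not throw), and a hypothetical input "a,A,b" in which two tokens differ only in case:

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class ApproachDifference {
    // Question's Approach 1: distinct() compares the ORIGINAL strings,
    // so "a" and "A" both survive and upper-casing happens only after joining.
    static String approach1(String s) {
        return Arrays.stream(s.split(","))
                .filter(t -> t.length() != 4)
                .distinct()
                .sorted()
                .collect(Collectors.joining(","))
                .toUpperCase();
    }

    // Question's Approach 2: each token is upper-cased before insertion,
    // so case variants collapse into a single TreeSet entry.
    static String approach2(String s) {
        Set<String> resultSet = new TreeSet<>();
        for (String t : s.split(",")) {
            if (t.length() != 4) {
                resultSet.add(t.toUpperCase());
            }
        }
        StringBuilder result = new StringBuilder();
        resultSet.forEach(key -> result.append(key).append(","));
        if (result.length() > 0) { // guard added: avoids deleteCharAt(-1) on an empty set
            result.deleteCharAt(result.length() - 1);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(approach1("a,A,b")); // A,A,B
        System.out.println(approach2("a,A,b")); // A,B
    }
}
```

Approach 1 keeps the duplicate "A" because the case-insensitive collapse never happens before distinct().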

So when doing entirely different things, there is not much sense in comparing the performance of these operations. A better comparison would be between these two variants:

public String variant1(String s){
    Stream<String> tokens = Arrays.stream(s.split(","));
    return tokens.filter(t -> t.length() != 4)
                 .map(String::toUpperCase)
                 .sorted().distinct()
                 .collect(Collectors.joining(","));
}

public String variant2(String s) {
    String[] tokens = s.split(",");
    Set<String> resultSet = new TreeSet<>();
    for(String t:tokens){
        if(t.length() !=  4)
            resultSet.add(t.toUpperCase());
    }
    return String.join(",", resultSet);
}
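Unlike the original pair, these two variants do the same work, so their outputs should agree. A quick sanity check, assuming static copies of the two methods and a hypothetical mixed-case input:

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class VariantComparison {
    // Stream version: upper-case each token first, then sort and drop duplicates
    static String variant1(String s) {
        return Arrays.stream(s.split(","))
                .filter(t -> t.length() != 4)
                .map(String::toUpperCase)
                .sorted().distinct()
                .collect(Collectors.joining(","));
    }

    // Loop version: the TreeSet sorts and de-duplicates the upper-cased tokens
    static String variant2(String s) {
        Set<String> resultSet = new TreeSet<>();
        for (String t : s.split(",")) {
            if (t.length() != 4) {
                resultSet.add(t.toUpperCase());
            }
        }
        return String.join(",", resultSet);
    }

    public static void main(String[] args) {
        String input = "zeta,Go,alpha,go,a,A"; // "zeta" has 4 letters and is filtered out
        System.out.println(variant1(input)); // A,ALPHA,GO
        System.out.println(variant2(input)); // A,ALPHA,GO
    }
}
```

Only once both variants produce identical output does comparing their running time say something about the implementation style rather than about the amount of work done.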

Note that I changed the order of sorted() and distinct(); as discussed in this answer, applying distinct() directly after sorted() allows the distinct operation to exploit the sorted nature of the stream.
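Swapping the two operations does not change the result, only how distinct() can do its job. In the OpenJDK implementation, a sequential distinct() on a stream known to be sorted only needs to compare each element with the previous one, whereas on an unsorted stream it has to remember every element seen so far. A minimal check of the equivalence on a hypothetical input:

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class DistinctOrderDemo {
    // distinct() first: elements arrive in arbitrary order, so all seen elements must be tracked
    static String distinctThenSorted(String s) {
        return Arrays.stream(s.split(","))
                .distinct()
                .sorted()
                .collect(Collectors.joining(","));
    }

    // sorted() first: duplicates are adjacent, so distinct() can compare with the last element
    static String sortedThenDistinct(String s) {
        return Arrays.stream(s.split(","))
                .sorted()
                .distinct()
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(distinctThenSorted("b,a,b,c,a")); // a,b,c
        System.out.println(sortedThenDistinct("b,a,b,c,a")); // a,b,c
    }
}
```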

You may also consider not creating a temporary array holding all substrings before streaming over them:

public String variant1(String s){
    return Pattern.compile(",").splitAsStream(s)
            .filter(t -> t.length() != 4)
            .map(String::toUpperCase)
            .sorted().distinct()
            .collect(Collectors.joining(","));
}
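As written, that variant calls Pattern.compile(",") on every invocation. Since Pattern instances are immutable and safe for concurrent use, one possible refinement (a sketch, not part of the original answer; the class and constant names are hypothetical) is to compile the pattern once and keep it in a constant:

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SplitAsStreamVariant {
    // Compiled once and reused across calls; Pattern is immutable and thread-safe
    private static final Pattern COMMA = Pattern.compile(",");

    static String variant1(String s) {
        return COMMA.splitAsStream(s)            // substrings are produced on demand, no String[] is built
                .filter(t -> t.length() != 4)    // drop 4-letter words
                .map(String::toUpperCase)
                .sorted().distinct()
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(variant1("beta,go,alpha,zeta,go,delta")); // ALPHA,DELTA,GO
    }
}
```

Whether this makes a measurable difference depends on the input size and the JVM, so it would need to be confirmed with a proper benchmark.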

You may also add a third variant,

public String variant3(String s) {
    Set<String> resultSet = new TreeSet<>();
    int o = 0, p;
    for(p = s.indexOf(','); p>=0; p = s.indexOf(',', o=p+1)) {
        if(p-o == 4) continue;
        resultSet.add(s.substring(o, p).toUpperCase());
    }
    if(s.length()-o != 4) resultSet.add(s.substring(o).toUpperCase());
    return String.join(",", resultSet);
}

which doesn’t create an array of substrings and even skips substring creation for tokens that don’t match the filter criteria. This isn’t meant to suggest going that low-level in production code, but to show that there might always be a faster variant. So what matters is not whether the variant you’re using is the fastest, but whether it runs at reasonable speed while remaining maintainable.
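One caveat when swapping in the hand-written loop: String.split(",") discards trailing empty strings, while the indexOf loop in variant3 keeps the empty token after a trailing comma. A small check using static copies of variant2 and variant3 (the inputs are hypothetical) shows where they agree and where they diverge:

```java
import java.util.Set;
import java.util.TreeSet;

class TrailingCommaCheck {
    static String variant2(String s) {
        Set<String> resultSet = new TreeSet<>();
        for (String t : s.split(",")) {          // trailing empty strings are dropped by split
            if (t.length() != 4) {
                resultSet.add(t.toUpperCase());
            }
        }
        return String.join(",", resultSet);
    }

    static String variant3(String s) {
        Set<String> resultSet = new TreeSet<>();
        int o = 0, p;
        // o = start of the current token, p = position of the next comma
        for (p = s.indexOf(','); p >= 0; p = s.indexOf(',', o = p + 1)) {
            if (p - o == 4) continue;            // skip 4-letter tokens without creating a substring
            resultSet.add(s.substring(o, p).toUpperCase());
        }
        // the last token (possibly empty after a trailing comma) is handled separately
        if (s.length() - o != 4) resultSet.add(s.substring(o).toUpperCase());
        return String.join(",", resultSet);
    }

    public static void main(String[] args) {
        System.out.println(variant2("gamma,beta,go,alpha,go")); // ALPHA,GAMMA,GO
        System.out.println(variant3("gamma,beta,go,alpha,go")); // ALPHA,GAMMA,GO
        System.out.println(variant2("a,b,"));                   // A,B
        System.out.println(variant3("a,b,"));                   // ,A,B
    }
}
```

For well-formed input without a trailing comma the two produce the same result; if trailing commas can occur, variant3 would need an extra check for the empty last token.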
