Scala 惯用的编码风格只是编写低效代码的一个很酷的陷阱吗? [英] Is Scala idiomatic coding style just a cool trap for writing inefficient code?

查看:41
本文介绍了Scala 惯用的编码风格只是编写低效代码的一个很酷的陷阱吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我感觉 Scala 社区对编写简洁"、酷"、scala 惯用语"、单行"(如果可能)的代码有点痴迷.紧接着是与 Java/命令式/丑陋代码的比较.

I sense that the Scala community has a little big obsession with writing "concise", "cool", "scala idiomatic", "one-liner" -if possible- code. This is immediately followed by a comparison to Java/imperative/ugly code.

虽然这(有时)会导致代码易于理解,但它也会导致 99% 的开发人员的代码效率低下.这就是 Java/C++ 不容易被击败的地方.

While this (sometimes) leads to easy to understand code, it also leads to inefficient code for 99% of developers. And this is where Java/C++ is not easy to beat.

考虑这个简单的问题:给定一个整数列表,删除最大的元素.不需要保留排序.

Consider this simple problem: Given a list of integers, remove the greatest element. Ordering does not need to be preserved.

这是我的解决方案版本(它可能不是最好的,但这是普通非摇滚明星开发者会做的).

Here is my version of the solution (It may not be the greatest, but it's what the average non-rockstar developer would do).

def removeMaxCool(xs: List[Int]) = {
  val maxIndex = xs.indexOf(xs.max);
  xs.take(maxIndex) ::: xs.drop(maxIndex+1)
}

它是 Scala 惯用的、简洁的,并使用了一些不错的列表函数.它也非常低效.它至少遍历列表 3 或 4 次.

It's Scala idiomatic, concise, and uses a few nice list functions. It's also very inefficient. It traverses the list at least 3 or 4 times.

这是我完全不酷的类似 Java 的解决方案.这也是一个合理的 Java 开发人员(或 Scala 新手)会写的东西.

Here is my totally uncool, Java-like solution. It's also what a reasonable Java developer (or Scala novice) would write.

def removeMaxFast(xs: List[Int]) = {
    var res = ArrayBuffer[Int]()
    var max = xs.head
    var first = true;   
    for (x <- xs) {
        if (first) {
            first = false;
        } else {
            if (x > max) {
                res.append(max)
                max = x
            } else {
                res.append(x)
            }
        }
    }
    res.toList
}

完全不是 Scala 惯用的、非功能性的、非简洁的,但它非常有效.它只遍历列表一次!

Totally non-Scala idiomatic, non-functional, non-concise, but it's very efficient. It traverses the list only once!

因此,如果 99% 的 Java 开发人员编写的代码比 99% 的 Scala 开发人员更高效,那么这是一个巨大的为更广泛地采用 Scala 跨越的障碍.有没有办法摆脱这个陷阱?

So, if 99% of Java developers write more efficient code than 99% of Scala developers, this is a huge obstacle to cross for greater Scala adoption. Is there a way out of this trap?

我正在寻找实用的建议,以避免此类低效率陷阱",同时保持实施清晰简洁.

I am looking for practical advice to avoid such "inefficiency traps" while keeping implementation clear ans concise.

澄清:这个问题来自一个真实的场景:我必须编写一个复杂的算法.首先我用 Scala 编写它,然后我不得不"用 Java 重写它.Java 实现的时间是原来的两倍,而且不是那么清楚,但同时它的速度却是原来的两倍.重写 Scala 代码以提高效率可能需要一些时间以及对 Scala 内部效率的更深入理解(对于 vs. map vs. fold 等)

Clarification: This question comes from a real-life scenario: I had to write a complex algorithm. First I wrote it in Scala, then I "had to" rewrite it in Java. The Java implementation was twice as long, and not that clear, but at the same time it was twice as fast. Rewriting the Scala code to be efficient would probably take some time and a somewhat deeper understanding of scala internal efficiencies (for vs. map vs. fold, etc)

推荐答案

让我们讨论问题中的一个谬误:

Let's discuss a fallacy in the question:

因此,如果 99% 的 Java 开发人员编写的代码比 99%Scala 开发人员,这是跨越更大 Scala 的巨大障碍采用.有没有办法摆脱这个陷阱?

So, if 99% of Java developers write more efficient code than 99% of Scala developers, this is a huge obstacle to cross for greater Scala adoption. Is there a way out of this trap?

这是假设,绝对没有证据支持.如果为假,则问题没有实际意义.

This is presumed, with absolutely no evidence backing it up. If false, the question is moot.

有相反的证据吗?好吧,让我们考虑一下问题本身——它不能证明任何事情,但表明事情并不那么清楚.

Is there evidence to the contrary? Well, let's consider the question itself -- it doesn't prove anything, but shows things are not that clear.

完全不是 Scala 惯用的、非功能性的、非简洁的,但它是非常有效率.它只遍历列表一次!

Totally non-Scala idiomatic, non-functional, non-concise, but it's very efficient. It traverses the list only once!

在第一句的四个声明中,前三个是正确的,第四个是,如用户未知,是假的!为什么它是假的?因为,与第二句所说的相反,它不止一次地遍历了列表.

Of the four claims in the first sentence, the first three are true, and the fourth, as shown by user unknown, is false! And why it is false? Because, contrary to what the second sentence states, it traverses the list more than once.

代码对其调用以下方法:

The code calls the following methods on it:

res.append(max)
res.append(x)

res.toList

让我们首先考虑append.

  1. append 接受一个可变参数.这意味着 maxx 首先被封装成某种类型的序列(实际上是一个 WrappedArray),然后作为参数传递.更好的方法应该是 +=.

  1. append takes a vararg parameter. That means max and x are first encapsulated into a sequence of some type (a WrappedArray, in fact), and then passed as parameter. A better method would have been +=.

好的,append 调用 ++=,它委托给 +=.但是,首先,它调用了 ensureSize,这是第二个错误(+= 也调用了——++= 只是针对多个元素).因为 Array 是一个固定大小的集合,这意味着在每次调整大小时,必须复制整个 Array

Ok, append calls ++=, which delegates to +=. But, first, it calls ensureSize, which is the second mistake (+= calls that too -- ++= just optimizes that for multiple elements). Because an Array is a fixed size collection, which means that, at each resize, the whole Array must be copied!

所以让我们考虑一下.当您调整大小时,Java 首先通过在每个元素中存储 0 来清除内存,然后 Scala 将前一个数组的每个元素复制到新数组中.由于大小每次都翻倍,这种情况会发生 log(n) 次,每次发生时复制的元素数量都会增加.

So let's consider this. When you resize, Java first clears the memory by storing 0 in each element, then Scala copies each element of the previous array over to the new array. Since size doubles each time, this happens log(n) times, with the number of elements being copied increasing each time it happens.

以 n = 16 为例.它这样做了四次,分别复制了 1、2、4 和 8 个元素.由于 Java 必须清除这些数组中的每一个,并且每个元素都必须被读取和写入,因此复制的每个元素代表对元素的 4 次遍历.将我们所有的 (n - 1) * 4 相加,或者粗略地说,完整列表的 4 次遍历.如果你把读写算作一次遍历,就像人们经常错误地做的那样,那么它仍然是三遍遍历.

Take for example n = 16. It does this four times, copying 1, 2, 4 and 8 elements respectively. Since Java has to clear each of these arrays, and each element must be read and written, each element copied represents 4 traversals of an element. Adding all we have (n - 1) * 4, or, roughly, 4 traversals of the complete list. If you count read and write as a single pass, as people often erroneously do, then it's still three traversals.

可以通过初始化 ArrayBuffer 来改进这一点,其初始大小等于将要读取的列表减去 1,因为我们将丢弃一个元素.不过,为了获得这个大小,我们需要遍历列表一次.

One can improve on this by initializing the ArrayBuffer with an initial size equal to the list that will be read, minus one, since we'll be discarding one element. To get this size, we need to traverse the list once, though.

现在让我们考虑 toList.简单来说,就是遍历整个列表,创建一个新的列表.

Now let's consider toList. To put it simply, it traverses the whole list to create a new list.

因此,我们对算法进行了 1 次遍历,对调整大小进行了 3 或 4 次遍历,对 toList 进行了 1 次额外的遍历.这是 4 或 5 次遍历.

So, we have 1 traversal for the algorithm, 3 or 4 traversals for resize, and 1 additional traversal for toList. That's 4 or 5 traversals.

原始算法有点难以分析,因为takedrop:::遍历了可变数量的元素.然而,将所有这些加在一起,它相当于 3 次遍历.如果使用 splitAt,它将减少到 2 次遍历.再进行 2 次遍历以获得最大值,我们得到 5 次遍历——与非功能性、非简洁算法的数量相同!

The original algorithm is a bit difficult to analyse, because take, drop and ::: traverse a variable number of elements. Adding all together, however, it does the equivalent of 3 traversals. If splitAt was used, it would be reduced to 2 traversals. With 2 more traversals to get the maximum, we get 5 traversals -- the same number as the non-functional, non-concise algorithm!

那么,让我们考虑改进.

So, let's consider improvements.

在命令式算法上,如果使用ListBuffer+=,那么所有的方法都是constant-time的,这就减少了一次遍历.

On the imperative algorithm, if one uses ListBuffer and +=, then all methods are constant-time, which reduces it to a single traversal.

在函数算法上,可以改写为:

On the functional algorithm, it could be rewritten as:

val max = xs.max
val (before, _ :: after) = xs span (max !=)
before ::: after

这将它减少到三个遍历的最坏情况.当然,还有基于递归或折叠的其他替代方案,可以在一次遍历中解决它.

That reduces it to a worst case of three traversals. Of course, there are other alternatives presented, based on recursion or fold, that solve it in one traversal.

而且,最有趣的是,所有这些算法都是 O(n),唯一一个几乎(偶然)导致最差复杂度的算法是命令式算法(因为数组复制).另一方面,命令式的缓存特性可能会使其更快,因为数据在内存中是连续的.然而,这与 big-Oh 或函数式 vs 命令式无关,这只是选择的数据结构的问题.

And, most interesting of all, all of these algorithms are O(n), and the only one which almost incurred (accidentally) in worst complexity was the imperative one (because of array copying). On the other hand, the cache characteristics of the imperative one might well make it faster, because the data is contiguous in memory. That, however, is unrelated to either big-Oh or functional vs imperative, and it is just a matter of the data structures that were chosen.

因此,如果我们真的去进行基准测试、分析结果、考虑方法的性能并寻找优化方法的麻烦,那么我们可以找到比函数式方式更快的方法来执行此操作,以命令方式执行此操作.

So, if we actually go to the trouble of benchmarking, analyzing the results, considering performance of methods, and looking into ways of optimizing it, then we can find faster ways to do this in an imperative manner than in a functional manner.

但所有这些努力与说普通 Java 程序员代码将比普通 Scala 程序员代码更快的说法大不相同——如果问题是一个例子,那完全是错误的.即使不考虑这个问题,我们也没有看到任何证据表明这个问题的基本前提是正确的.

But all this effort is very different from saying the average Java programmer code will be faster than the average Scala programmer code -- if the question is an example, that is simply false. And even discounting the question, we have seen no evidence that the fundamental premise of the question is true.

编辑

首先,让我重申我的观点,因为我似乎没有说清楚.我的观点是,普通 Java 程序员编写的代码似乎更有效率,但实际上并非如此.或者,换句话说,传统的 Java 风格不会让您获得性能——只有努力工作才能做到,无论是 Java 还是 Scala.

First, let me restate my point, because it seems I wasn't clear. My point is that the code the average Java programmer writes may seem to be more efficient, but actually isn't. Or, put another way, traditional Java style doesn't gain you performance -- only hard work does, be it Java or Scala.

接下来,我也有一个基准和结果,包括几乎所有建议的解决方案.关于它的两个有趣的点:

Next, I have a benchmark and results too, including almost all solutions suggested. Two interesting points about it:

  1. 根据列表大小,对象的创建可能比列表的多次遍历产生更大的影响.Adrian 的原始功能代码利用了列表是持久性数据结构这一事实,完全不复制最大元素右侧的元素.如果改用Vector,左右两边基本不会改变,这可能会带来更好的性能.

  1. Depending on list size, the creation of objects can have a bigger impact than multiple traversals of the list. The original functional code by Adrian takes advantage of the fact that lists are persistent data structures by not copying the elements right of the maximum element at all. If a Vector was used instead, both left and right sides would be mostly unchanged, which might lead to even better performance.

即使用户未知和范式具有相似的递归解决方案,范式的方法要快得多.原因是他避免了模式匹配.模式匹配可能真的很慢.

Even though user unknown and paradigmatic have similar recursive solutions, paradigmatic's is way faster. The reason for that is that he avoids pattern matching. Pattern matching can be really slow.

基准代码在这里,结果为这里.

这篇关于Scala 惯用的编码风格只是编写低效代码的一个很酷的陷阱吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆