`UnitRange` 和 `Array` 有什么区别? [英] What is the difference between `UnitRange` and `Array`?

查看:17
本文介绍了`UnitRange` 和 `Array` 有什么区别?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个版本的代码似乎做同样的事情:

I have two versions of code that seem to do the same thing:

sum = 0
for x in 1:100
    sum += x
end

sum = 0
for x in collect(1:100)
    sum += x
end

这两种方法有实际区别吗?

Is there a practical difference between the two approaches?

推荐答案

在 Julia 中,1:100 返回一个名为 UnitRange 的特定结构,如下所示:

In Julia, 1:100 returns a particular struct called UnitRange that looks like this:

julia> dump(1:100)
UnitRange{Int64}
  start: Int64 1
  stop: Int64 100

这是一个非常紧凑的结构,用于表示具有步骤 1 和任意(有限)大小的范围.UnitRangeAbstractRange 的子类型,表示任意步长范围的类型,AbstractVector 的子类型.

This is a very compact struct to represent ranges with step 1 and arbitrary (finite) size. UnitRange is subtype of AbstractRange, a type to represent ranges with arbitrary step, subtype of AbstractVector.

每当您使用 getindex(或语法糖 vector[index])时,UnitRange 的实例都会动态计算它们的元素.例如,使用 @less (1:100)[3] 你可以看到这个方法:

The instances of UnitRange dynamically compute their elements whenever the you use getindex (or the syntactic sugar vector[index]). For example, with @less (1:100)[3] you can see this method:

function getindex(v::UnitRange{T}, i::Integer) where {T<:OverflowSafe}
    @_inline_meta
    val = v.start + (i - 1)
    @boundscheck _in_unit_range(v, val, i) || throw_boundserror(v, i)
    val % T
end

这是通过将 i - 1 添加到范围的第一个元素 (start) 来返回向量的第 i 元素.一些函数使用 UnitRange 优化方法,或者更普遍地使用 AbstractRange.例如,使用 @less sum(1:100) 您可以看到以下内容

This is returning the i-th element of the vector by adding i - 1 to the first element (start) of the range. Some functions have optimised methods with UnitRange, or more generally with AbstractRange. For instance, with @less sum(1:100) you can see the following

function sum(r::AbstractRange{<:Real})
    l = length(r)
    # note that a little care is required to avoid overflow in l*(l-1)/2
    return l * first(r) + (iseven(l) ? (step(r) * (l-1)) * (l>>1)
                                     : (step(r) * l) * ((l-1)>>1))
end

此方法使用 公式计算算术级数,这是非常有效的,因为它是在与向量大小无关的时间内进行评估的.

This method uses the formula for the sum of an arithmetic progression, which is extremely efficient as it's evaluated in a time independent of the size of the vector.

另一方面,collect(1:100) 返回一个普通的 Vector,其中包含一百个元素 1、2、3、...、100.与 UnitRange (或其他类型的 AbstractRange )的区别在于 getindex(vector::Vector, i) (或 vector[i],与 vector::Vector) 不做任何计算,而只是访问向量的第 i 元素.Vector 优于 UnitRange 的缺点是,一般来说,使用它们时没有有效的方法,因为这个容器的元素是完全任意的,而 UnitRange 表示一组具有特殊属性(排序、恒定步长等)的数字.

On the other hand, collect(1:100) returns a plain Vector with one hundred elements 1, 2, 3, ..., 100. The main difference with UnitRange (or other types of AbstractRange) is that getindex(vector::Vector, i) (or vector[i], with vector::Vector) doesn't do any computation but simply accesses the i-th element of the vector. The downside of a Vector over a UnitRange is that generally speaking there aren't efficient methods when working with them as the elements of this container are completely arbitrary, while UnitRange represents a set of numbers with peculiar properties (sorted, constant step, etc...).

如果您比较 UnitRange 具有超高效实现的方法的性能,这种类型将胜出(注意使用带有 $(...) 的变量插值 使用 BenchmarkTools 中的宏时:

If you compare the performance of methods for which UnitRange has super-efficient implementations, this type will win hands down (note the use of interpolation of variables with $(...) when using macros from BenchmarkTools):

julia> using BenchmarkTools

julia> @btime sum($(1:1000_000))
  0.012 ns (0 allocations: 0 bytes)
500000500000

julia> @btime sum($(collect(1:1000_000)))
  229.979 μs (0 allocations: 0 bytes)
500000500000

请记住,每次使用 getindex 访问元素时,UnitRange 都会产生动态计算元素的成本.例如考虑这个函数:

Remember that UnitRange comes with the cost of dynamically computing the elements every time you access them with getindex. Consider for example this function:

function test(vec)
    sum = zero(eltype(vec))
    for idx in eachindex(vec)
        sum += vec[idx]
    end
    return sum
end

让我们用一个 UnitRange 和一个普通的 Vector 对其进行基准测试:

Let's benchmark it with a UnitRange and a plain Vector:

julia> @btime test($(1:1000_000))
  812.673 μs (0 allocations: 0 bytes)
500000500000

julia> @btime test($(collect(1:1000_000)))
  522.828 μs (0 allocations: 0 bytes)
500000500000

在这种情况下,调用普通数组的函数比使用 UnitRange 的函数更快,因为它不必动态计算 100 万个元素.

In this case the function calling the plain array is faster than the one with a UnitRange because it doesn't have to dynamically compute 1 million elements.

当然,在这些玩具示例中,迭代 vec 的所有元素而不是其索引会更明智,但在现实世界的情况下,这样的情况可能更明智.然而,最后一个例子表明 UnitRange 不一定比普通数组更有效,尤其是当您需要动态计算其所有元素时.当您可以利用可以在恒定时间内执行操作的专用方法(如 sum)时,UnitRange 会更有效.

Of course, in these toy examples it'd be more sensible to iterate over all elements of vec rather than its indices, but in real world cases a situation like these may be more sensible. This last example, however, shows that a UnitRange is not necessarily more efficient than a plain array, especially if you need to dynamically compute all of its elements. UnitRanges are more efficient when you can take advantage of specialised methods (like sum) for which the operation can be performed in constant time.

作为文件说明,请注意,如果您最初有一个 UnitRange,则将其转换为普通 Vector 以获得良好性能不一定是一个好主意,特别是如果您将只使用一次或很少使用它,因为转换为 Vector 涉及范围内所有元素的动态计算和必要内存的分配:

As a file remark, note that if you originally have a UnitRange it's not necessarily a good idea to convert it to a plain Vector to get good performance, especially if you're going to use it only once or very few times, as the conversion to Vector involves itself the dynamic computation of all elements of the range and the allocation of the necessary memory:

julia> @btime collect($(1:1000_000));
  422.435 μs (2 allocations: 7.63 MiB)

julia> @btime test(collect($(1:1000_000)))
  882.866 μs (2 allocations: 7.63 MiB)
500000500000

这篇关于`UnitRange` 和 `Array` 有什么区别?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆