UnitRange和Array有什么区别? [英] What is the difference between `UnitRange` and `Array`?

查看:233
本文介绍了UnitRange和Array有什么区别?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个似乎执行相同操作的代码版本:

I have two versions of code that seem to do the same thing:

sum = 0
for x in 1:100
    sum += x
end

sum = 0
for x in collect(1:100)
    sum += x
end

两种方法之间是否有实际区别?

Is there a practical difference between the two approaches?

推荐答案

在Julia中,1:100返回一个名为UnitRange的特殊结构,如下所示:

In Julia, 1:100 returns a particular struct called UnitRange that looks like this:

julia> dump(1:100)
UnitRange{Int64}
  start: Int64 1
  stop: Int64 100

这是一个非常紧凑的结构,用于表示具有步骤1和任意(有限)大小的范围. UnitRangeAbstractRange的子类型,它是表示具有任意步长的范围的类型,是AbstractVector的子类型.

This is a very compact struct to represent ranges with step 1 and arbitrary (finite) size. UnitRange is subtype of AbstractRange, a type to represent ranges with arbitrary step, subtype of AbstractVector.

每当您使用getindex(或语法糖vector[index])时,UnitRange的实例都会动态计算其元素.例如,使用@less (1:100)[3]您可以看到此方法:

The instances of UnitRange dynamically compute their elements whenever the you use getindex (or the syntactic sugar vector[index]). For example, with @less (1:100)[3] you can see this method:

function getindex(v::UnitRange{T}, i::Integer) where {T<:OverflowSafe}
    @_inline_meta
    val = v.start + (i - 1)
    @boundscheck _in_unit_range(v, val, i) || throw_boundserror(v, i)
    val % T
end

这通过将i - 1添加到范围的第一个元素(start)返回向量的第i个元素.某些功能使用UnitRange,或更普遍地使用AbstractRange,对方法进行了优化.例如,使用@less sum(1:100)您可以看到以下内容

This is returning the i-th element of the vector by adding i - 1 to the first element (start) of the range. Some functions have optimised methods with UnitRange, or more generally with AbstractRange. For instance, with @less sum(1:100) you can see the following

function sum(r::AbstractRange{<:Real})
    l = length(r)
    # note that a little care is required to avoid overflow in l*(l-1)/2
    return l * first(r) + (iseven(l) ? (step(r) * (l-1)) * (l>>1)
                                     : (step(r) * l) * ((l-1)>>1))
end

此方法使用公式求和算术级数,因为它在与向量大小无关的时间内进行评估,因此非常高效.

This method uses the formula for the sum of an arithmetic progression, which is extremely efficient as it's evaluated in a time independent of the size of the vector.

另一方面,collect(1:100)返回具有一百个元素1,2,3,...,100的普通Vector.与UnitRange(或其他类型的AbstractRange)的主要区别是getindex(vector::Vector, i)(或vector[i],带有vector::Vector)不执行任何计算,而只是访问向量的第i个元素. Vector相对于UnitRange的不利之处在于,通常来说,使用它们时没有有效的方法,因为此容器的元素是完全任意的,而UnitRange代表一组具有特殊属性的数字(已排序,恒定步长等).

On the other hand, collect(1:100) returns a plain Vector with one hundred elements 1, 2, 3, ..., 100. The main difference with UnitRange (or other types of AbstractRange) is that getindex(vector::Vector, i) (or vector[i], with vector::Vector) doesn't do any computation but simply accesses the i-th element of the vector. The downside of a Vector over a UnitRange is that generally speaking there aren't efficient methods when working with them as the elements of this container are completely arbitrary, while UnitRange represents a set of numbers with peculiar properties (sorted, constant step, etc...).

如果比较使用UnitRange具有超高效实现的方法的性能,则这种类型将毫无用处(请注意,使用BenchmarkTools中的宏时,使用$(...)进行变量插值):

If you compare the performance of methods for which UnitRange has super-efficient implementations, this type will win hands down (note the use of interpolation of variables with $(...) when using macros from BenchmarkTools):

julia> using BenchmarkTools

julia> @btime sum($(1:1000_000))
  0.012 ns (0 allocations: 0 bytes)
500000500000

julia> @btime sum($(collect(1:1000_000)))
  229.979 μs (0 allocations: 0 bytes)
500000500000

请记住,每次使用getindex访问元素时,UnitRange都会带来动态计算元素的成本.例如考虑以下功能:

Remember that UnitRange comes with the cost of dynamically computing the elements every time you access them with getindex. Consider for example this function:

function test(vec)
    sum = zero(eltype(vec))
    for idx in eachindex(vec)
        sum += vec[idx]
    end
    return sum
end

让我们用UnitRange和普通的Vector对其进行基准测试:

Let's benchmark it with a UnitRange and a plain Vector:

julia> @btime test($(1:1000_000))
  812.673 μs (0 allocations: 0 bytes)
500000500000

julia> @btime test($(collect(1:1000_000)))
  522.828 μs (0 allocations: 0 bytes)
500000500000

在这种情况下,调用普通数组的函数比使用UnitRange的函数要快,因为它不必动态计算100万个元素.

In this case the function calling the plain array is faster than the one with a UnitRange because it doesn't have to dynamically compute 1 million elements.

当然,在这些玩具示例中,遍历vec的所有元素而不是索引要更明智,但是在现实世界中,这种情况可能更明智.但是,最后一个示例表明,UnitRange不一定比普通数组更有效,尤其是在需要动态计算其所有元素的情况下.如果您可以利用可以在恒定时间内执行操作的专用方法(如sum),则UnitRange效率更高.

Of course, in these toy examples it'd be more sensible to iterate over all elements of vec rather than its indices, but in real world cases a situation like these may be more sensible. This last example, however, shows that a UnitRange is not necessarily more efficient than a plain array, especially if you need to dynamically compute all of its elements. UnitRanges are more efficient when you can take advantage of specialised methods (like sum) for which the operation can be performed in constant time.

作为文件备注,请注意,如果您最初有一个UnitRange,将其转换为普通的Vector以获得良好的性能不一定是一个好主意,特别是如果您只使用一次或转换为Vector的过程很少,涉及到范围所有元素的动态计算以及必要内存的分配:

As a file remark, note that if you originally have a UnitRange it's not necessarily a good idea to convert it to a plain Vector to get good performance, especially if you're going to use it only once or very few times, as the conversion to Vector involves itself the dynamic computation of all elements of the range and the allocation of the necessary memory:

julia> @btime collect($(1:1000_000));
  422.435 μs (2 allocations: 7.63 MiB)

julia> @btime test(collect($(1:1000_000)))
  882.866 μs (2 allocations: 7.63 MiB)
500000500000

这篇关于UnitRange和Array有什么区别?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆