`UnitRange` 和 `Array` 有什么区别? [英] What is the difference between `UnitRange` and `Array`?
问题描述
我有两个版本的代码似乎做同样的事情:
I have two versions of code that seem to do the same thing:
sum = 0
for x in 1:100
sum += x
end
sum = 0
for x in collect(1:100)
sum += x
end
这两种方法有实际区别吗?
Is there a practical difference between the two approaches?
推荐答案
在 Julia 中,1:100
返回一个名为 UnitRange
的特定结构,如下所示:
In Julia, 1:100
returns a particular struct called UnitRange
that looks like this:
julia> dump(1:100)
UnitRange{Int64}
start: Int64 1
stop: Int64 100
这是一个非常紧凑的结构,用于表示具有步骤 1 和任意(有限)大小的范围.UnitRange
是 AbstractRange
的子类型,表示任意步长范围的类型,AbstractVector
的子类型.
This is a very compact struct to represent ranges with step 1 and arbitrary (finite) size. UnitRange
is subtype of AbstractRange
, a type to represent ranges with arbitrary step, subtype of AbstractVector
.
每当您使用 getindex
(或语法糖 vector[index]
)时,UnitRange
的实例都会动态计算它们的元素.例如,使用 @less (1:100)[3]
你可以看到这个方法:
The instances of UnitRange
dynamically compute their elements whenever the you use getindex
(or the syntactic sugar vector[index]
). For example, with @less (1:100)[3]
you can see this method:
function getindex(v::UnitRange{T}, i::Integer) where {T<:OverflowSafe}
@_inline_meta
val = v.start + (i - 1)
@boundscheck _in_unit_range(v, val, i) || throw_boundserror(v, i)
val % T
end
这是通过将 i - 1
添加到范围的第一个元素 (start
) 来返回向量的第 i
元素.一些函数使用 UnitRange
优化方法,或者更普遍地使用 AbstractRange
.例如,使用 @less sum(1:100)
您可以看到以下内容
This is returning the i
-th element of the vector by adding i - 1
to the first element (start
) of the range. Some functions have optimised methods with UnitRange
, or more generally with AbstractRange
. For instance, with @less sum(1:100)
you can see the following
function sum(r::AbstractRange{<:Real})
l = length(r)
# note that a little care is required to avoid overflow in l*(l-1)/2
return l * first(r) + (iseven(l) ? (step(r) * (l-1)) * (l>>1)
: (step(r) * l) * ((l-1)>>1))
end
此方法使用 公式计算算术级数,这是非常有效的,因为它是在与向量大小无关的时间内进行评估的.
This method uses the formula for the sum of an arithmetic progression, which is extremely efficient as it's evaluated in a time independent of the size of the vector.
另一方面,collect(1:100)
返回一个普通的 Vector
,其中包含一百个元素 1、2、3、...、100.与 UnitRange
(或其他类型的 AbstractRange
)的区别在于 getindex(vector::Vector, i)
(或 vector[i]
,与 vector::Vector
) 不做任何计算,而只是访问向量的第 i
元素.Vector
优于 UnitRange
的缺点是,一般来说,使用它们时没有有效的方法,因为这个容器的元素是完全任意的,而 UnitRange
表示一组具有特殊属性(排序、恒定步长等)的数字.
On the other hand, collect(1:100)
returns a plain Vector
with one hundred elements 1, 2, 3, ..., 100. The main difference with UnitRange
(or other types of AbstractRange
) is that getindex(vector::Vector, i)
(or vector[i]
, with vector::Vector
) doesn't do any computation but simply accesses the i
-th element of the vector. The downside of a Vector
over a UnitRange
is that generally speaking there aren't efficient methods when working with them as the elements of this container are completely arbitrary, while UnitRange
represents a set of numbers with peculiar properties (sorted, constant step, etc...).
如果您比较 UnitRange
具有超高效实现的方法的性能,这种类型将胜出(注意使用带有 $(...) 的变量插值
使用 BenchmarkTools
中的宏时:
If you compare the performance of methods for which UnitRange
has super-efficient implementations, this type will win hands down (note the use of interpolation of variables with $(...)
when using macros from BenchmarkTools
):
julia> using BenchmarkTools
julia> @btime sum($(1:1000_000))
0.012 ns (0 allocations: 0 bytes)
500000500000
julia> @btime sum($(collect(1:1000_000)))
229.979 μs (0 allocations: 0 bytes)
500000500000
请记住,每次使用 getindex
访问元素时,UnitRange
都会产生动态计算元素的成本.例如考虑这个函数:
Remember that UnitRange
comes with the cost of dynamically computing the elements every time you access them with getindex
. Consider for example this function:
function test(vec)
sum = zero(eltype(vec))
for idx in eachindex(vec)
sum += vec[idx]
end
return sum
end
让我们用一个 UnitRange
和一个普通的 Vector
对其进行基准测试:
Let's benchmark it with a UnitRange
and a plain Vector
:
julia> @btime test($(1:1000_000))
812.673 μs (0 allocations: 0 bytes)
500000500000
julia> @btime test($(collect(1:1000_000)))
522.828 μs (0 allocations: 0 bytes)
500000500000
在这种情况下,调用普通数组的函数比使用 UnitRange
的函数更快,因为它不必动态计算 100 万个元素.
In this case the function calling the plain array is faster than the one with a UnitRange
because it doesn't have to dynamically compute 1 million elements.
当然,在这些玩具示例中,迭代 vec
的所有元素而不是其索引会更明智,但在现实世界的情况下,这样的情况可能更明智.然而,最后一个例子表明 UnitRange
不一定比普通数组更有效,尤其是当您需要动态计算其所有元素时.当您可以利用可以在恒定时间内执行操作的专用方法(如 sum
)时,UnitRange
会更有效.
Of course, in these toy examples it'd be more sensible to iterate over all elements of vec
rather than its indices, but in real world cases a situation like these may be more sensible. This last example, however, shows that a UnitRange
is not necessarily more efficient than a plain array, especially if you need to dynamically compute all of its elements. UnitRange
s are more efficient when you can take advantage of specialised methods (like sum
) for which the operation can be performed in constant time.
作为文件说明,请注意,如果您最初有一个 UnitRange
,则将其转换为普通 Vector
以获得良好性能不一定是一个好主意,特别是如果您将只使用一次或很少使用它,因为转换为 Vector
涉及范围内所有元素的动态计算和必要内存的分配:
As a file remark, note that if you originally have a UnitRange
it's not necessarily a good idea to convert it to a plain Vector
to get good performance, especially if you're going to use it only once or very few times, as the conversion to Vector
involves itself the dynamic computation of all elements of the range and the allocation of the necessary memory:
julia> @btime collect($(1:1000_000));
422.435 μs (2 allocations: 7.63 MiB)
julia> @btime test(collect($(1:1000_000)))
882.866 μs (2 allocations: 7.63 MiB)
500000500000
这篇关于`UnitRange` 和 `Array` 有什么区别?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!