How to write "good" Julia code when dealing with multiple types and arrays (multiple dispatch)

Question

OP UPDATE: Note that in Julia v0.5 and later, the idiomatic answer to this question is simply to define mysquare(x::Number) = x^2. The vectorised case is then covered by automatic broadcasting with dot syntax, i.e. x = randn(5); mysquare.(x). See also the new answer explaining dot syntax in more detail.
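As a minimal sketch of that idiom, assuming Julia 1.x (the variable names `x`, `n`, `y` and `m` are only illustrative), the scalar method is written once and the dot call applies it elementwise:

```julia
# Define the operation once, for scalars only.
mysquare(x::Number) = x^2

x = randn(5)        # a Vector{Float64}
y = mysquare.(x)    # dot call: applies mysquare to every element of x

n = [1, 2, 3]       # a Vector{Int}
m = mysquare.(n)    # the element type follows the results: Vector{Int}
```

Because broadcasting also accepts scalars and arrays of any dimension, no separate array method is needed for this pattern.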

I am new to Julia, and given my Matlab origins, I am having some difficulty determining how to write "good" Julia code that takes advantage of multiple dispatch and Julia's type system.

Consider the case where I have a function that provides the square of a Float64. I might write this as:

function mysquare(x::Float64)
    return(x^2);
end

Sometimes I want to square all the Float64s in a one-dimensional array, but I don't want to write out a loop over mysquare every time, so I use multiple dispatch and add the following:

function mysquare(x::Array{Float64, 1})
    y = Array(Float64, length(x));
    for k = 1:length(x)
        y[k] = x[k]^2;
    end
    return(y);
end

But now I am sometimes working with Int64, so I write out two more functions that take advantage of multiple dispatch:

function mysquare(x::Int64)
    return(x^2);
end
function mysquare(x::Array{Int64, 1})
    y = Array(Float64, length(x));
    for k = 1:length(x)
        y[k] = x[k]^2;
    end
    return(y);
end

Is this right? Or is there a more idiomatic way to deal with this situation? Should I use type parameters like this?

function mysquare{T<:Number}(x::T)
    return(x^2);
end
function mysquare{T<:Number}(x::Array{T, 1})
    y = Array(Float64, length(x));
    for k = 1:length(x)
        y[k] = x[k]^2;
    end
    return(y);
end

This feels sensible, but will my code run as quickly as the case where I avoid parametric types?

In summary, there are two parts to my question:

  1. If fast code is important to me, should I use parametric types as described above, or should I write out multiple versions for different concrete types? Or should I do something else entirely?

  2. When I want a function that operates on arrays as well as scalars, is it good practice to write two versions of the function, one for the scalar and one for the array? Or should I be doing something else entirely?

Finally, please point out any other issues you can think of in the code above, as my ultimate goal here is to write good Julia code.

Answer

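Julia compiles a specialized version of your function for each combination of concrete argument types it is called with. Thus, to answer part 1, there is no performance difference between the parametric method and separate methods written out for each concrete type. The parametric way is the way to go.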
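A sketch of the part 1 point, assuming Julia 1.x, where the parametric syntax from the question is spelled with a `where` clause. One generic method serves every numeric type, and each concrete argument type gets its own compiled specialization:

```julia
# Julia 1.x spelling of the parametric method from the question:
# `where {T<:Number}` replaces the older `mysquare{T<:Number}(x::T)` syntax.
function mysquare(x::T) where {T<:Number}
    return x^2
end

# A single method definition; Julia compiles a separate specialization
# the first time it is called with each concrete argument type.
a = mysquare(3)           # specialized for Int
b = mysquare(1.5)         # specialized for Float64
c = mysquare(big"1.5")    # specialized for BigFloat
```

In the REPL, `@code_native mysquare(3)` shows the machine code compiled for the Int case, which is the same code a hand-written `mysquare(x::Int64)` method with this body would get.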

As for part 2, it might be a good idea in some cases to write a separate version (sometimes for performance reasons, e.g., to avoid a copy). In your case, however, you can use the built-in macro @vectorize_1arg to generate the array version automatically, e.g.:

function mysquare{T<:Number}(x::T)
    return(x^2)
end
@vectorize_1arg Number mysquare
println(mysquare([1,2,3]))
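`@vectorize_1arg` has since been removed from Julia; in Julia 1.x the usual replacement is the dot call `mysquare.([1,2,3])`. If you still want `mysquare([1,2,3])` itself to work, a minimal sketch of a rough hand-written equivalent (our own one-line forwarding method, not literally what the macro expanded to) is:

```julia
# Scalar method: the only place the actual computation lives.
mysquare(x::Number) = x^2

# Hand-written stand-in for the array method @vectorize_1arg used to provide:
# it applies the scalar method to each element via broadcasting.
mysquare(A::AbstractArray{<:Number}) = mysquare.(A)

println(mysquare([1, 2, 3]))    # prints [1, 4, 9]
```

Since the array method only forwards to the scalar one, it also covers matrices and other `AbstractArray`s of numbers.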

As for general style: don't end lines with semicolons, and prefer the one-line definition mysquare(x::Number) = x^2, which is a lot shorter.

As for your vectorized mysquare, consider the case where T is BigFloat: your output array is still Float64, so each squared value is rounded to Float64 precision when it is stored. One way to handle this would be to change it to

function mysquare{T<:Number}(x::Array{T,1})
    n = length(x)
    y = Array(T, n)
    for k = 1:n
        @inbounds y[k] = x[k]^2
    end
    return y
end

where I've added the @inbounds macro to boost speed, because we don't need to check for bounds violations on every iteration: k only runs over 1:n, and both x and y have length n. This function could still throw an error, or silently convert values, if the type of x[k]^2 isn't T. An even more defensive version would perhaps be

function mysquare{T<:Number}(x::Array{T,1})
    n = length(x)
    y = Array(typeof(one(T)^2), n)
    for k = 1:n
        @inbounds y[k] = x[k]^2
    end
    return y
end

where one(T) gives 1 if T is an Int, 1.0 if T is a Float64, and so on, so typeof(one(T)^2) is the result type of squaring a T (as long as that type does not depend on the value). These considerations only matter if you want to write hyper-robust library code. If you will really only be dealing with Float64s, or things that can be promoted to Float64s, then it isn't an issue. It may seem like hard work, but the power is amazing. You can always just settle for Python-like performance and disregard all type information.
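A minimal Julia 1.x sketch of the more defensive version, assuming current syntax: a `where` clause instead of `mysquare{T<:Number}`, and `Vector{S}(undef, n)` instead of the removed `Array(T, n)` constructor. The sample inputs are only illustrative:

```julia
# Julia 1.x sketch of the "defensive" vectorized mysquare from the answer.
function mysquare(x::Vector{T}) where {T<:Number}
    S = typeof(one(T)^2)                # result type of squaring a T, e.g. Int -> Int
    y = Vector{S}(undef, length(x))     # uninitialized output with that element type
    @inbounds for k in eachindex(x)     # k stays within the indices shared by x and y
        y[k] = x[k]^2
    end
    return y
end

ints   = mysquare([1, 2, 3])
floats = mysquare([0.5, 1.5])
bigs   = mysquare(BigFloat[1, 2])
```

In current Julia, `map(v -> v^2, x)` and `x .^ 2` also derive the output element type from the computed values, so explicit allocation like this is mainly worthwhile when you need the hand-written loop.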
