How to write "good" Julia code when dealing with multiple types and arrays (multiple dispatch)

Question

OP UPDATE: Note that in the latest version of Julia (v0.5), the idiomatic answer to this question is to define only the scalar method mysquare(x::Number) = x^2. The vectorised case is then covered by automatic broadcasting with the dot syntax, e.g. x = randn(5); mysquare.(x). See also the new answer explaining dot syntax in more detail.
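
For reference, a minimal sketch of that idiom, assuming Julia 1.x; the sample inputs are illustrative only. Only the scalar method is defined, and the trailing dot in mysquare.(x) applies it elementwise:

```julia
# Define only the scalar method; broadcasting handles arrays.
mysquare(x::Number) = x^2

x = randn(5)
y = mysquare.(x)      # dot call: applies mysquare to each element, returns a new Vector{Float64}

ints = [1, 2, 3]
z = mysquare.(ints)   # works for any Number element type; here the result is a Vector{Int}
```

No separate array method is needed, and the same definition covers Float64, Int, BigFloat and so on.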

I am new to Julia, and given my Matlab origins, I am having some difficulty determining how to write "good" Julia code that takes advantage of multiple dispatch and Julia's type system.

Consider the case where I have a function that provides the square of a Float64. I might write this as:

function mysquare(x::Float64)
    return(x^2);
end

Sometimes, I want to square all the Float64s in a one-dimensional array, but don't want to write out a loop over mysquare every time, so I use multiple dispatch and add the following:

function mysquare(x::Array{Float64, 1})
    y = Array(Float64, length(x));
    for k = 1:length(x)
        y[k] = x[k]^2;
    end
    return(y);
end

But now I am sometimes working with Int64, so I write out two more functions that take advantage of multiple dispatch:

function mysquare(x::Int64)
    return(x^2);
end
function mysquare(x::Array{Int64, 1})
    y = Array(Float64, length(x));
    for k = 1:length(x)
        y[k] = x[k]^2;
    end
    return(y);
end

Is this right? Or is there a more idiomatic way to deal with this situation? Should I use type parameters like this?

function mysquare{T<:Number}(x::T)
    return(x^2);
end
function mysquare{T<:Number}(x::Array{T, 1})
    y = Array(Float64, length(x));
    for k = 1:length(x)
        y[k] = x[k]^2;
    end
    return(y);
end

This feels sensible, but will my code run as quickly as the case where I avoid parametric types?

In summary, there are two parts to my question:

  1. If fast code is important to me, should I use parametric types as described above, or should I write out multiple versions for different concrete types? Or should I do something else entirely?

  2. When I want a function that operates on arrays as well as scalars, is it good practice to write two versions of the function, one for the scalar and one for the array? Or should I be doing something else entirely?

Finally, please point out any other issues you can think of in the code above as my ultimate goal here is to write good Julia code.

Answer

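Julia compiles a specialized version of your function for each combination of concrete argument types it is called with. Thus, to answer part 1, there is no performance difference between the parametric method and separate hand-written methods for each concrete type. The parametric way is the way to go.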
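
As a rough illustration, assuming Julia 1.x (where the parametric form is written with where rather than the older mysquare{T<:Number} syntax used in this thread), the sketch below defines one generic method and uses @inferred from the Test standard library to check that each call still gets an exactly inferred, concrete return type. The name mysquare_param is hypothetical and only shows the parametric spelling next to the abstract annotation:

```julia
using Test  # provides @inferred

# Abstract annotation: a single method for every Number subtype.
mysquare(x::Number) = x^2

# Julia 1.x spelling of the parametric version (hypothetical name).
mysquare_param(x::T) where {T<:Number} = x^2

# Julia compiles a separate specialization of each method for every concrete
# argument type it sees, so these calls are type-stable; @inferred throws an
# error if the inferred return type does not match the actual one.
a = @inferred mysquare(3)         # Int
b = @inferred mysquare(1.5)       # Float64
c = @inferred mysquare_param(3)   # Int, same as the abstract version

println(length(methods(mysquare)))   # 1: one method serves all these types
```

Both spellings end up compiled to the same specialized code per argument type; the type parameter only matters when you need to refer to T in the body or in other argument types.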

As for part 2, it might be a good idea in some cases to write a separate version (sometimes for performance reasons, e.g., to avoid a copy). In your case, however, you can use the built-in macro @vectorize_1arg to generate the array version automatically (this macro was later deprecated in favour of the dot syntax described in the update above), e.g.:

function mysquare{T<:Number}(x::T)
    return(x^2)
end
@vectorize_1arg Number mysquare
println(mysquare([1,2,3]))
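
The @vectorize_1arg macro is not available in Julia 1.x, so the snippet above only runs on the old versions this answer targets. A rough modern counterpart, sketched under the assumption that you still want mysquare([1,2,3]) to work without writing the dot, is an array method that forwards to broadcasting (this is not literally what the macro generated):

```julia
# Scalar method: the only place the actual computation lives.
mysquare(x::Number) = x^2

# Array method: forward to elementwise broadcasting over the scalar method.
# AbstractArray{<:Number} accepts vectors, matrices, ranges, etc.
mysquare(x::AbstractArray{<:Number}) = mysquare.(x)

println(mysquare([1, 2, 3]))   # [1, 4, 9]
```

Since callers can equally write mysquare.(x), many people skip the array method entirely and keep only the scalar definition.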

As for general style, drop the trailing semicolons (Julia does not need them), and the one-line form mysquare(x::Number) = x^2 is a lot shorter.

As for your vectorized mysquare, consider the case where T is a BigFloat. Your output array, however, is always Float64, so every result is rounded to Float64 and the extra precision is lost. One way to handle this would be to change it to

function mysquare{T<:Number}(x::Array{T,1})
    n = length(x)
    y = Array(T, n)
    for k = 1:n
        @inbounds y[k] = x[k]^2
    end
    return y
end
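
A sketch of the same function in Julia 1.x syntax (assumed here, since Array(T, n) and the mysquare{T<:Number} form no longer exist there), placed next to the question's Float64-output loop under the hypothetical name mysquare_f64, shows the difference for BigFloat input:

```julia
# Julia 1.x spelling of the version above: the output element type follows T.
function mysquare(x::Vector{T}) where {T<:Number}
    n = length(x)
    y = Vector{T}(undef, n)      # replaces the old Array(T, n) constructor
    for k = 1:n
        @inbounds y[k] = x[k]^2
    end
    return y
end

# For contrast: the question's version, which always allocates Float64 output.
function mysquare_f64(x::Vector{T}) where {T<:Number}   # hypothetical name
    y = Vector{Float64}(undef, length(x))
    for k = 1:length(x)
        y[k] = x[k]^2            # a BigFloat result is rounded to Float64 here
    end
    return y
end

x = [big"0.1", big"0.2"]
println(eltype(mysquare(x)))       # BigFloat
println(eltype(mysquare_f64(x)))   # Float64
```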

where I've added the @inbounds macro to boost speed: k only runs over 1:n, and both x and y have length n, so there is no need to check for out-of-bounds access on every iteration. This function could still have issues if the type of x[k]^2 isn't T. An even more defensive version would perhaps be

function mysquare{T<:Number}(x::Array{T,1})
    n = length(x)
    y = Array(typeof(one(T)^2), n)
    for k = 1:n
        @inbounds y[k] = x[k]^2
    end
    return y
end
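
The defensive version translated to Julia 1.x syntax, as a sketch (the Rational and Float32 inputs are only illustrative). An alternative worth knowing is map, which picks the output element type from the results it actually computes; mysquare_map is a hypothetical name:

```julia
function mysquare(x::Vector{T}) where {T<:Number}
    n = length(x)
    S = typeof(one(T)^2)          # type produced by squaring a value of type T
    y = Vector{S}(undef, n)
    for k = 1:n
        @inbounds y[k] = x[k]^2   # k stays within 1:n, the length of both x and y
    end
    return y
end

# Alternative: let map determine the element type of the result.
mysquare_map(x::AbstractVector{<:Number}) = map(xi -> xi^2, x)

println(eltype(mysquare([1//2, 3//4])))    # Rational{Int64} on a 64-bit system
println(eltype(mysquare([1.5f0, 2.0f0])))  # Float32
```

The map form avoids choosing the output type by hand, at the cost of giving up the explicit @inbounds loop.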

where one(T) gives 1 if T is an Int, 1.0 if T is a Float64, and so on, so typeof(one(T)^2) stands in for the type of x[k]^2. These considerations only matter if you want to write hyper-robust library code. If you will really only be dealing with Float64s, or with things that can be promoted to Float64, then it isn't an issue. It may seem like hard work, but the power is amazing. You can always just settle for Python-like performance and disregard all type information.
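
A quick way to check the one(T) behaviour described above in a Julia session (Int32 and BigFloat are extra illustrative types, not from the original answer):

```julia
println(one(Int))                # 1
println(one(Float64))            # 1.0
println(typeof(one(BigFloat)))   # BigFloat
println(typeof(one(Int32)^2))    # Int32: squaring keeps the element type here
```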
