Fortran中的高效z阶转换 [英] Efficient z-order transformation in Fortran

查看:132
本文介绍了Fortran中的高效z阶转换的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

对于我目前在网格生成算法上的工作,我需要一种有效的方法将三维坐标转换为z阶(更精确地说:将三个4字节整数转换为一个8字节整数),然后进行另一种转换.这篇维基百科文章对此进行了很好的描述: Z阶曲线. 由于我不是程序员,因此我想出的解决方案可以实现预期的功能,但使用固有的mvbits显式地进行位交织可能会很幼稚:

SUBROUTINE pos_to_z(i, j, k, zval)

use types

INTEGER(I4B), INTENT(IN)  :: i, j, k
INTEGER(I8B), INTENT(OUT) :: zval
INTEGER(I8B) :: i8, j8, k8
INTEGER(I4B) :: b

zval = 0
i8 = i-1
j8 = j-1
k8 = k-1

do b=0, 19
    call mvbits(i8,b,1,zval,3*b+2)
    call mvbits(j8,b,1,zval,3*b+1)
    call mvbits(k8,b,1,zval,3*b  )
end do

zval = zval+1

END SUBROUTINE pos_to_z


SUBROUTINE z_to_pos(zval, i, j, k)

use types

INTEGER(I8B), INTENT(IN)  :: zval
INTEGER(I4B), INTENT(OUT) :: i, j, k
INTEGER(I8B) :: i8, j8, k8, z_order
INTEGER(I4B) :: b

z_order = zval-1
i8 = 0
j8 = 0
k8 = 0

do b=0, 19
    call mvbits(z_order,3*b+2,1,i8,b)
    call mvbits(z_order,3*b+1,1,j8,b)
    call mvbits(z_order,3*b  ,1,k8,b)
end do

i = int(i8,kind=I4B) + 1
j = int(j8,kind=I4B) + 1
k = int(k8,kind=I4B) + 1

END SUBROUTINE z_to_pos

请注意,我更喜欢输入和输出范围以1而不是0开头,这导致了一些额外的计算. 事实证明,此实现相当慢.我测量了转换和重新转换10 ^ 7个位置所需的时间:
gfortran -O0:6.2340秒
gfortran -O3:5.1564秒
ifort -O0:4.2058秒
ifort -O3:0.9793秒

我还尝试了gfortran的不同优化选项,但均未成功.尽管使用ifort进行优化的代码已经快了很多,但这仍然是我程序的瓶颈. 如果有人可以指出正确的方向,如何在Fortran中更有效地进行位交织,那将真的很有帮助.

解决方案

可以使用类似于上述

subroutine pos_to_z(i, j, k, zval, morton_table)
    integer, intent(in) :: i, j, k
    integer(kind=8), dimension (0:1023), intent (in) :: morton_table
    integer(kind=8), intent (out) :: zval
    integer(kind=8) :: z, i8, j8, k8

    i8 = i-1
    j8 = j-1
    k8 = k-1

    z = morton_table(iand(k8, 1023))
    z = z + ishft(morton_table(iand(j8, 1023)),1)
    z = z + ishft(morton_table(iand(i8, 1023)),2)
    z = z + ishft(morton_table(iand(ishft(k8,-10), 1023)),30)
    z = z + ishft(morton_table(iand(ishft(j8,-10), 1023)),31)
    zval = z + ishft(morton_table(iand(ishft(i8,-10), 1023)),32) + 1

end subroutine pos_to_z

您可以使用类似的方法来替代,但是我认为效率不高.创建一个包含32768个值(15位)的查找表,其中存储了5位重构的输入值.您将必须执行12次查找,一次为三个20位值中的每一个获取5位.屏蔽掉底部的15位,然后右移0、1和2位以获取k,j和i的查找索引.然后移位并屏蔽以获取15-29、30-44和45-59位,并且每次都执行相同操作,然后进行移位和加法运算以重建k,j和i.

For my current work on a grid generation algorithm I need an efficient way to transform three-dimensional coordinates to z-order (more precisely: three 4-Byte integers into one 8-Byte integer) and the other way round. This Wikipedia article describes it fairly well: Z-order curve. Since I am not a programmer, the solution I came up with does what it is supposed to do but might be quite naive using the mvbits intrinsic to do the bit interleaving explicitly:

SUBROUTINE pos_to_z(i, j, k, zval)

use types

INTEGER(I4B), INTENT(IN)  :: i, j, k
INTEGER(I8B), INTENT(OUT) :: zval
INTEGER(I8B) :: i8, j8, k8
INTEGER(I4B) :: b

zval = 0
i8 = i-1
j8 = j-1
k8 = k-1

do b=0, 19
    call mvbits(i8,b,1,zval,3*b+2)
    call mvbits(j8,b,1,zval,3*b+1)
    call mvbits(k8,b,1,zval,3*b  )
end do

zval = zval+1

END SUBROUTINE pos_to_z


SUBROUTINE z_to_pos(zval, i, j, k)

use types

INTEGER(I8B), INTENT(IN)  :: zval
INTEGER(I4B), INTENT(OUT) :: i, j, k
INTEGER(I8B) :: i8, j8, k8, z_order
INTEGER(I4B) :: b

z_order = zval-1
i8 = 0
j8 = 0
k8 = 0

do b=0, 19
    call mvbits(z_order,3*b+2,1,i8,b)
    call mvbits(z_order,3*b+1,1,j8,b)
    call mvbits(z_order,3*b  ,1,k8,b)
end do

i = int(i8,kind=I4B) + 1
j = int(j8,kind=I4B) + 1
k = int(k8,kind=I4B) + 1

END SUBROUTINE z_to_pos

Note that I prefer the input and output ranges to start with 1 instead of 0, leading to some additional calculations. As it turns out, this implementation is rather slow. I measured the time it takes to transform and re-transform 10^7 positions:
gfortran -O0: 6.2340 seconds
gfortran -O3: 5.1564 seconds
ifort -O0: 4.2058 seconds
ifort -O3: 0.9793 seconds

I also tried different optimization options for gfortran without success. While the optimized code with ifort is already a lot faster, it is still the bottleneck of my program. It would be really helpful if someone could point me into the right direction how to do the bit interleaving more efficiently in Fortran.

解决方案

Transforming from 3 co-ords to a z-order can be optimized using a look up table similar to the one described here. Since you only use 20 bits of your input values, it would be more efficient to use a look-up table with 1024 entries rather than 256, enough to index 10 bits, so that you only have to do 2 look-ups for each of your 3 input values, and modified for the case of interleaving 3 values instead of 2.

Entry n of the array stores the integer n, with its bits spread out so that bit 0 is in bit 0, bit 1 is moved to bit 3, bit 2 is moved to bit 6 and so on, with all the remaining bits set to zero. The lookup table array can be initialized like this:

subroutine init_morton_table(morton_table)
    integer(kind=8), dimension (0:1023), intent (out) :: morton_table
    integer :: b, v, z
    do v=0, 1023
        z = 0
        do b=0, 9
            call mvbits(v,b,1,z,3*b)
        end do
        morton_table(v) = z
    end do
end subroutine init_morton_table

To actually interleave the values, separate your 3 input values into their low 10 bits and their high 10 bits, then use these 6 values as indices into the array and combine the looked-up values using shifts and adds to interleave the values together. Adds are equivalent to bitwise OR operations in this case because there won't be any carries given that at most one bit will be set in each bit position. Because only every 3rd bit can be set in the values in the tables, offsetting one of the values by 1 bit and another by 2 means there won't be any collisions.

subroutine pos_to_z(i, j, k, zval, morton_table)
    integer, intent(in) :: i, j, k
    integer(kind=8), dimension (0:1023), intent (in) :: morton_table
    integer(kind=8), intent (out) :: zval
    integer(kind=8) :: z, i8, j8, k8

    i8 = i-1
    j8 = j-1
    k8 = k-1

    z = morton_table(iand(k8, 1023))
    z = z + ishft(morton_table(iand(j8, 1023)),1)
    z = z + ishft(morton_table(iand(i8, 1023)),2)
    z = z + ishft(morton_table(iand(ishft(k8,-10), 1023)),30)
    z = z + ishft(morton_table(iand(ishft(j8,-10), 1023)),31)
    zval = z + ishft(morton_table(iand(ishft(i8,-10), 1023)),32) + 1

end subroutine pos_to_z

You can use a similar technique to go the other way, but I don't think it will be quite as efficient. Create a lookup table of 32768 values (15 bits) that store 5 bits of the reconstructed input value. You will have to do 12 look-ups, getting 5 bits at a time for each of your three 20-bit values. Mask off the bottom 15 bits, then shift right 0, 1 and 2 bits to get your look-up indexes for k, j and i. Then shift and mask to get bits 15-29, 30-44 and 45-59 and do the same each time, shifting and adding to reconstruct k, j and i.

这篇关于Fortran中的高效z阶转换的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆