How to write inline assembly code for a loop in Xcode LLVM?


Problem description


I'm studying inline assembly. I want to write a simple routine for iPhone with Xcode 4 and the LLVM 3.0 compiler. I have managed to write basic inline assembly code.

For example:

int sub(int a, int b)
{
    int c;
    asm ("sub %0, %1, %2" : "=r" (c) : "r" (a), "r" (b));
    return c;
}

I found it on stackoverflow.com and it works very well. But I don't know how to write a loop.

I need assembly code for something like:

void brighten(unsigned char* src, unsigned char* dst, int numPixels, int intensity)
{
    for(int i=0; i<numPixels; i++)
    {
        dst[i] = src[i] + intensity;
    }
}

Solution

Take a look here at the loop section - http://en.wikipedia.org/wiki/ARM_architecture

Basically you'll want something like:

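void brighten(unsigned char* src, unsigned char* dst, int numPixels, int intensity) {
    asm volatile (
                  "\t mov r3, #0\n"            // r3 = i = 0
                  "1:\n"                       // loop head; numeric labels stay unique if the asm is inlined
                  "\t cmp r3, %2\n"            // compare i with numPixels
                  "\t bge 2f\n"                // leave the loop once i >= numPixels
                  "\t ldrb r4, [%0, r3]\n"     // r4 = src[i]
                  "\t add r4, r4, %3\n"        // r4 += intensity
                  "\t strb r4, [%1, r3]\n"     // dst[i] = low byte of r4 (wraps, like the C code)
                  "\t add r3, r3, #1\n"        // i++
                  "\t b 1b\n"                  // back to the loop head
                  "2:\n"
                 :                             // no outputs: all operands are only read
                 : "r"(src), "r"(dst), "r"(numPixels), "r"(intensity)
                 : "cc", "memory", "r3", "r4"); // "memory": the asm writes through dst
}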
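
Note that this loop, like the C original, keeps only the low byte of each sum, so a bright pixel wraps around instead of staying white. As a minimal sketch of the difference, here is a plain C comparison of a wrapping add and a clamped add. The helper names add_wrap and add_saturate are made up for illustration and are not part of the answer above.

```c
/* Hypothetical helper: what ldrb/add/strb computes, keeping only the low 8 bits. */
static unsigned char add_wrap(unsigned char pixel, int intensity)
{
    return (unsigned char)(pixel + intensity);
}

/* Hypothetical helper: clamp the sum to the 0..255 pixel range instead of wrapping. */
static unsigned char add_saturate(unsigned char pixel, int intensity)
{
    int sum = pixel + intensity;
    if (sum > 255) sum = 255;
    if (sum < 0) sum = 0;
    return (unsigned char)sum;
}
```

For a pixel value of 250 and an intensity of 10, add_wrap gives 4 (260 modulo 256) while add_saturate gives 255.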

Update:

And here's a NEON version:

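void brighten_neon(unsigned char* src, unsigned char* dst, int numPixels, int intensity) {
    asm volatile (
                  "\t mov r4, #0\n"               // r4 = number of pixels processed so far
                  "\t vdup.8 d1, %3\n"            // copy the low byte of intensity into all 8 lanes of d1
                  "1:\n"                          // loop head; numeric labels stay unique if the asm is inlined
                  "\t cmp r4, %2\n"               // compare the count with numPixels
                  "\t bge 2f\n"                   // leave the loop once count >= numPixels
                  "\t vld1.8 {d0}, [%0]!\n"       // load 8 pixels and advance src by 8
                  "\t vqadd.u8 d0, d0, d1\n"      // unsigned saturating add (pixels are unsigned char; assumes 0 <= intensity <= 255)
                  "\t vst1.8 {d0}, [%1]!\n"       // store 8 pixels and advance dst by 8
                  "\t add r4, r4, #8\n"           // count += 8
                  "\t b 1b\n"                     // back to the loop head
                  "2:\n"
                  : "+r"(src), "+r"(dst)          // read and written: the post-increment updates both pointers
                  : "r"(numPixels), "r"(intensity)
                  : "cc", "memory", "r4", "d0", "d1"); // "memory": the asm writes through dst
}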

This NEON version processes 8 pixels per iteration. It does not check that numPixels is a multiple of 8, however, so you must handle that yourself; otherwise the last iteration reads and writes past the end of the buffers. It is only a starting point to show what can be done. Notice that the loop body has the same number of instructions as the scalar one but operates on eight pixels at once. It also uses a saturating add, which I assume you would want.
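
One way to deal with a numPixels that is not a multiple of 8 is to hand the 8-wide routine only the largest multiple of 8 and finish the remaining 0 to 7 pixels with a scalar loop. The sketch below assumes numPixels is non-negative. All three names (saturate_u8, brighten_blocks8, brighten_any) are hypothetical, and brighten_blocks8 is a portable C stand-in so the example runs anywhere; on a device you would call brighten_neon at that point instead.

```c
/* Hypothetical helper: clamp a value to the 0..255 range of a pixel. */
static unsigned char saturate_u8(int value)
{
    if (value > 255) return 255;
    if (value < 0) return 0;
    return (unsigned char)value;
}

/* Portable stand-in for the 8-pixels-per-iteration kernel.
   count must be a multiple of 8, just as the NEON loop requires. */
static void brighten_blocks8(const unsigned char *src, unsigned char *dst,
                             int count, int intensity)
{
    for (int i = 0; i < count; i += 8)
        for (int lane = 0; lane < 8; lane++)
            dst[i + lane] = saturate_u8(src[i + lane] + intensity);
}

/* Hypothetical wrapper: 8-wide part first, then the leftover 0..7 pixels. */
void brighten_any(const unsigned char *src, unsigned char *dst,
                  int numPixels, int intensity)
{
    int vectorCount = numPixels & ~7;   /* round down to a multiple of 8 */
    brighten_blocks8(src, dst, vectorCount, intensity);
    for (int i = vectorCount; i < numPixels; i++)
        dst[i] = saturate_u8(src[i] + intensity);
}
```

Keeping the tail in plain C keeps the assembly loop simple. The scalar tail must saturate the same way as the vector part, otherwise the last few pixels of a row would look different from the rest.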
