How to write inline assembly code for a loop in Xcode LLVM?


Question

I'm studying inline assembly. I want to write a simple routine for the iPhone using the LLVM 3.0 compiler in Xcode 4. I have succeeded in writing basic inline assembly code.

Example:

int sub(int a, int b)
{
    int c;
    asm ("sub %0, %1, %2" : "=r" (c) : "r" (a), "r" (b));
    return c;
}

I found it on stackoverflow.com and it works very well. However, I don't know how to write a loop in inline assembly.

I need assembly code equivalent to this:

void brighten(unsigned char* src, unsigned char* dst, int numPixels, int intensity)
{
    for(int i=0; i<numPixels; i++)
    {
        dst[i] = src[i] + intensity;
    }
}

Recommended answer

Take a look at the loop section here: http://en.wikipedia.org/wiki/ARM_architecture

Basically, you'll want something like:

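void brighten(unsigned char* src, unsigned char* dst, int numPixels, int intensity) {
    // Numeric local labels (1:, 2:) avoid duplicate-symbol errors if this is inlined more than once.
    asm volatile (
                  "\t mov r3, #0\n"            // r3 = i = 0
                  "1:\n"
                  "\t cmp r3, %2\n"            // done when i >= numPixels
                  "\t bge 2f\n"
                  "\t ldrb r4, [%0, r3]\n"     // r4 = src[i]
                  "\t add r4, r4, %3\n"        // r4 += intensity
                  "\t strb r4, [%1, r3]\n"     // dst[i] = low byte of r4 (wraps, like the C loop)
                  "\t add r3, r3, #1\n"        // i++
                  "\t b 1b\n"
                  "2:\n"
                 : /* no register outputs; dst is written through memory */
                 : "r"(src), "r"(dst), "r"(numPixels), "r"(intensity)
                 : "cc", "memory", "r3", "r4");
}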

Update:

And here's a NEON version:

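void brighten_neon(unsigned char* src, unsigned char* dst, int numPixels, int intensity) {
    // Assumes numPixels is a multiple of 8 and intensity is in 0..255.
    asm volatile (
                  "\t mov r4, #0\n"                // r4 = pixels processed so far
                  "\t vdup.8 d1, %3\n"             // d1 = intensity copied into 8 byte lanes
                  "1:\n"
                  "\t cmp r4, %2\n"
                  "\t bge 2f\n"
                  "\t vld1.8 {d0}, [%0]!\n"        // load 8 pixels, advance src
                  "\t vqadd.u8 d0, d0, d1\n"       // unsigned saturating add (clamps at 255)
                  "\t vst1.8 {d0}, [%1]!\n"        // store 8 pixels, advance dst
                  "\t add r4, r4, #8\n"
                  "\t b 1b\n"
                  "2:\n"
                  : "+r"(src), "+r"(dst)           // both pointers are post-incremented
                  : "r"(numPixels), "r"(intensity)
                  : "cc", "memory", "r4", "d0", "d1");
}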

This NEON version processes 8 pixels at a time. However, it does not check that numPixels is divisible by 8, so you'd definitely want to handle that yourself; otherwise the last iteration will read and write past the end of the buffers. Anyway, it's just a start at showing what can be done. Notice that the loop has the same number of instructions but operates on eight pixels of data at once. It also uses a saturating add, which I assume you would want.
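One way to deal with a numPixels that is not a multiple of 8 is to run the 8-at-a-time routine on the largest multiple of 8 and finish the remaining 0 to 7 pixels in plain C. The following is only a minimal sketch of that idea, not part of the original answer. The names brighten8_fn, brighten_tail, brighten_any_length, and brighten8_portable are hypothetical, and it assumes pixels should clamp at 255 with intensity in the range 0..255.

```c
#include <assert.h>

/* Hypothetical signature shared by the 8-at-a-time routine and its stand-ins. */
typedef void (*brighten8_fn)(unsigned char *src, unsigned char *dst,
                             int numPixels, int intensity);

/* Scalar saturating add, used for the leftover pixels.
   Assumes intensity is in 0..255, so only the upper bound needs clamping. */
static void brighten_tail(const unsigned char *src, unsigned char *dst,
                          int numPixels, int intensity)
{
    for (int i = 0; i < numPixels; i++) {
        int v = src[i] + intensity;
        dst[i] = (unsigned char)(v > 255 ? 255 : v);
    }
}

/* Hypothetical wrapper: run `bulk` on the largest multiple of 8,
   then finish the remaining 0..7 pixels in plain C. */
static void brighten_any_length(brighten8_fn bulk,
                                unsigned char *src, unsigned char *dst,
                                int numPixels, int intensity)
{
    assert(numPixels >= 0);
    int bulkCount = numPixels & ~7;   /* round down to a multiple of 8 */
    if (bulkCount > 0)
        bulk(src, dst, bulkCount, intensity);
    brighten_tail(src + bulkCount, dst + bulkCount,
                  numPixels - bulkCount, intensity);
}

/* Portable stand-in for the NEON routine, so the sketch runs on any machine. */
static void brighten8_portable(unsigned char *src, unsigned char *dst,
                               int numPixels, int intensity)
{
    brighten_tail(src, dst, numPixels, intensity);
}
```

On an ARM device you would pass the NEON routine as the bulk argument, for example brighten_any_length(brighten_neon, src, dst, numPixels, intensity); on other machines brighten8_portable can take its place for testing.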
