Turn a large chunk of memory backwards, fast


Question

I need to rewrite about 4 KB of data in reverse order at the bit level (the last bit of the last byte becomes the first bit of the first byte), as fast as possible. Are there any clever snippets for doing this?

Rationale: The data is the display contents of an LCD screen in an embedded device that is usually positioned so that the screen is at shoulder level. The screen has a "6 o'clock" orientation, meaning it is meant to be viewed from below, as when it lies flat or hangs above eye level. This can be fixed by rotating the screen 180 degrees, but then I need to reverse the screen data (generated by a library), which is 1 bit = 1 pixel, starting at the upper left of the screen. The CPU isn't very powerful and the device already has enough work to do, and several frames per second would be desirable, so performance is an issue; RAM is less of a concern.

Edit: Single core, ARM 9 series. 64 MB of RAM (to be scaled down to 32 MB later), Linux. The data is pushed from system memory to the LCD driver over an 8-bit I/O port.

The CPU is 32-bit and performs much better at this word size than at the byte level.

Recommended answer

There's a classic way to do this. Let's say unsigned int is your 32-bit word. I'm using C99 because the restrict keyword lets the compiler perform extra optimizations in this speed-critical code that would otherwise be unavailable. The restrict qualifiers tell the compiler that "src" and "dest" do not overlap. This also assumes you are copying an integral number of words; if you're not, then this is just a start.

I also don't know which bit shifting / rotation primitives are fast on the ARM and which are slow. This is something to consider. If you need more speed, consider disassembling the output from the C compiler and going from there. If using GCC, try -O2, -O3, and -Os to see which one is fastest. You might reduce pipeline stalls by processing two words at the same time.

The following code uses 23 operations per word (3 for the half-word swap plus 5 for each of the four masked swaps), not counting the load and store. However, these 23 operations are all very fast and none of them access memory. I don't know whether a lookup table would be faster or not.

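/* Copy n 32-bit words from src to dest with the overall bit order reversed. */
void
copy_rev(unsigned int *restrict dest,
         unsigned int const *restrict src,
         unsigned int n)
{
    unsigned int i, x;
    for (i = 0; i < n; ++i) {
        x = src[i];
        /* Swap progressively smaller groups: halves, bytes, nibbles, pairs, bits. */
        x = (x >> 16) | (x << 16);
        x = ((x >> 8) & 0x00ff00ffU) | ((x & 0x00ff00ffU) << 8);
        x = ((x >> 4) & 0x0f0f0f0fU) | ((x & 0x0f0f0f0fU) << 4);
        x = ((x >> 2) & 0x33333333U) | ((x & 0x33333333U) << 2);
        x = ((x >> 1) & 0x55555555U) | ((x & 0x55555555U) << 1);
        /* The first source word becomes the last destination word. */
        dest[n-1-i] = x;
    }
}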
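
The suggestion about processing two words at the same time could look roughly like the sketch below. It is only an illustration: the names rev32 and copy_rev2 are made up here, and uint32_t is used in place of unsigned int on the assumption that <stdint.h> is available. Whether it actually reduces stalls on a given ARM core would have to be measured.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the bits of one 32-bit word using the same five swap steps. */
static inline uint32_t rev32(uint32_t x)
{
    x = (x >> 16) | (x << 16);
    x = ((x >> 8) & 0x00ff00ffU) | ((x & 0x00ff00ffU) << 8);
    x = ((x >> 4) & 0x0f0f0f0fU) | ((x & 0x0f0f0f0fU) << 4);
    x = ((x >> 2) & 0x33333333U) | ((x & 0x33333333U) << 2);
    x = ((x >> 1) & 0x55555555U) | ((x & 0x55555555U) << 1);
    return x;
}

/* Hypothetical variant of copy_rev: two independent words per iteration,
 * so the compiler is free to interleave the two dependency chains. */
void copy_rev2(uint32_t *restrict dest, const uint32_t *restrict src, size_t n)
{
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        uint32_t a = rev32(src[i]);
        uint32_t b = rev32(src[i + 1]);
        dest[n - 1 - i] = a;
        dest[n - 2 - i] = b;
    }
    if (i < n)  /* odd word count: one word left over */
        dest[n - 1 - i] = rev32(src[i]);
}
```

For the 4 KB frame in the question, n would be 1024 words.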

This page is a great reference: http://graphics.stanford.edu/~seander/bithacks.html#BitReverseObvious
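
For comparison, the lookup-table alternative mentioned earlier might be sketched as follows. This is not the answer's method; the names rev8_table, init_rev8_table and copy_rev_bytes are hypothetical. It works one byte at a time, so it does not require a whole number of words, but every byte costs a table load, and only a measurement on the target would show which approach is faster.

```c
#include <stddef.h>
#include <stdint.h>

/* rev8_table[b] holds b with its 8 bits reversed (256 bytes of RAM). */
static uint8_t rev8_table[256];

/* Fill the table once at startup. */
void init_rev8_table(void)
{
    for (unsigned b = 0; b < 256; ++b) {
        unsigned r = 0;
        for (unsigned bit = 0; bit < 8; ++bit)
            if (b & (1u << bit))
                r |= 0x80u >> bit;   /* bit 0 maps to bit 7, and so on */
        rev8_table[b] = (uint8_t)r;
    }
}

/* Copy n bytes from src to dest with the overall bit order reversed:
 * byte order is reversed and each byte is bit-reversed via the table. */
void copy_rev_bytes(uint8_t *restrict dest, const uint8_t *restrict src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dest[n - 1 - i] = rev8_table[src[i]];
}
```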

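Final note: Looking at the ARM assembly reference, there is a "REV" opcode which reverses the byte order in a word. This would shave 7 operations per loop off the above code, since one instruction replaces the first two swap steps (8 operations). Note that REV was introduced with the ARMv6 architecture, so check whether your particular core supports it.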
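
A minimal sketch of that idea, assuming GCC or Clang: the __builtin_bswap32 builtin reverses the byte order of a 32-bit value, and the compiler may emit REV for it when the target architecture has that instruction (otherwise it falls back to shifts and masks). The function name copy_rev_bswap is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant of copy_rev that leaves the byte swap to the compiler. */
void copy_rev_bswap(uint32_t *restrict dest, const uint32_t *restrict src, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        /* Replaces the 16-bit and 8-bit swap steps with one byte-order reversal. */
        uint32_t x = __builtin_bswap32(src[i]);
        /* The remaining steps reverse the bits inside each byte. */
        x = ((x >> 4) & 0x0f0f0f0fU) | ((x & 0x0f0f0f0fU) << 4);
        x = ((x >> 2) & 0x33333333U) | ((x & 0x33333333U) << 2);
        x = ((x >> 1) & 0x55555555U) | ((x & 0x55555555U) << 1);
        dest[n - 1 - i] = x;
    }
}
```

Disassembling the output for your target, as suggested above, shows whether REV was actually generated.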
