Unaligned memory accesses

Views: 154 Published: 2015/11/30 23:38:59 Tags: video assembly embedded alignment decoding

Problem description

I'm working on an embedded device that does not support unaligned memory accesses.

For a video decoder I have to process pixels (one byte per pixel) in 8x8 pixel blocks. The device has some SIMD processing capabilities that allow me to work on 4 bytes in parallel.

The problem is that the 8x8 pixel blocks aren't guaranteed to start on an aligned address, and the functions need to read/write up to three of these 8x8 blocks.

How would you approach this if you want very good performance? After a bit of thinking I came up with the following three ideas:

1. Do all memory accesses as bytes. This is the easiest way to do it, but it is slow and does not work well with the SIMD capabilities (it's what I currently do in my reference C code).

2. Write four copy functions (one for each alignment case) that load the pixel data via two 32-bit reads, shift the bits into the correct position and write the data to some aligned chunk of scratch memory. The video processing functions can then use 32-bit accesses and SIMD. Drawback: the CPU will have no chance to hide the memory latency behind the processing.

3. Same idea as above, but instead of writing the pixels to scratch memory, do the video processing in place. This may be the fastest way, but the number of functions that I have to write for this approach is high (around 60, I guess).
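To make the first two ideas concrete, here is a minimal sketch in portable C rather than the assembler the final code would use. The function names `copy_block_bytes` and `copy_block_words` and the `stride` parameter are hypothetical, not from the post. The word-based version assumes little-endian byte order, an image buffer that is 4-byte aligned with a size and row stride that are multiples of 4, and that the buffer may legally be read as `uint32_t`. It also uses one function that branches on the misalignment instead of four specialized routines, so it is an illustration of the shift-and-merge step, not a tuned implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-wise reference copy of one 8x8 block (the "all accesses as bytes" idea).
   stride is the distance in bytes between two image rows (hypothetical parameter). */
static void copy_block_bytes(uint8_t dst[64], const uint8_t *src, size_t stride)
{
    for (size_t y = 0; y < 8; y++)
        for (size_t x = 0; x < 8; x++)
            dst[y * 8 + x] = src[y * stride + x];
}

/* Copy one 8x8 block into aligned scratch memory using only aligned
   32-bit loads (the "copy to scratch memory" idea).
   Assumptions, not stated in the post: little-endian byte order; the image
   buffer is 4-byte aligned, its size and stride are multiples of 4, and it
   may be read as uint32_t, so every word touching the block is readable. */
static void copy_block_words(uint32_t dst[16], const uint8_t *src, size_t stride)
{
    unsigned mis = (unsigned)((uintptr_t)src & 3u);        /* misalignment, 0..3 */
    const uint32_t *base = (const uint32_t *)(const void *)(src - mis);
    size_t words_per_row = stride / 4;

    assert(stride % 4 == 0);

    for (size_t y = 0; y < 8; y++) {
        const uint32_t *row = base + y * words_per_row;    /* aligned row start */
        if (mis == 0) {
            /* Already aligned: two plain word loads per row. */
            dst[y * 2]     = row[0];
            dst[y * 2 + 1] = row[1];
        } else {
            /* Misaligned: three aligned loads, then shift and merge. */
            unsigned lo = 8u * mis;      /* bits to drop from the lower word */
            unsigned hi = 32u - lo;      /* bits to take from the upper word */
            uint32_t w0 = row[0], w1 = row[1], w2 = row[2];
            dst[y * 2]     = (w0 >> lo) | (w1 << hi);
            dst[y * 2 + 1] = (w1 >> lo) | (w2 << hi);
        }
    }
}
```

Each output word is built from two adjacent aligned words, as the second idea describes; a misaligned 8-byte row therefore costs three loads instead of two. The `mis == 0` case is kept separate because shifting a 32-bit value by 32 is undefined in C.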

Btw: I will have to write all functions in assembler because the compiler generates horrible code when it comes to the SIMD extension.

Which road would you take, or do you have another idea how to approach this?
