Unaligned access store on Intel processor

Question

Consider the sample below. It segfaults at the marked line with gcc 5.4 when I compile it with g++ -O3 -std=c++11. It fails at a movaps instruction, and I suspect it performs an unaligned memory access. Could it be that gcc generates illegal code for such a simple sample, or am I missing something? I am running it on an Intel i5-5200U.

#include <vector>
#include <memory>
#include <cstdint>

using namespace std;

__attribute__ ((noinline))
void SerializeTo(const vector<uint64_t>& v, uint8_t* dest) {
  for (size_t i = 0; i < v.size(); ++i) {
    *reinterpret_cast<uint64_t*>(dest) = v[i];  // Segfaults here.
    dest += sizeof(uint64_t);
  }
}

int main() {
  std::vector<uint64_t> d(64);

  unique_ptr<uint8_t[]> tmp(new uint8_t[1024]);

  SerializeTo(d, tmp.get() + 6);

  return 0;
}

Recommended answer

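There are very few ways to perform type punning legally in C++. Casting dest to uint64_t* and storing through it is not one of them: dest here is tmp.get() + 6, which is not suitably aligned for uint64_t, so the store is undefined behaviour. gcc is entitled to assume the pointer is 8-byte aligned, and at -O3 its vectorized loop can then use an aligned store such as movaps, which faults on this address.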
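To check the suspicion raised in the question, the following sketch tests whether an address is suitably aligned for uint64_t. The helper names IsAlignedForU64 and OffsetIsAligned are hypothetical (they are not part of the original post), and the buffer is declared with alignas so that the result for offset 6 does not depend on what the allocator happens to return.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when p is suitably aligned for a uint64_t.
inline bool IsAlignedForU64(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % alignof(std::uint64_t) == 0;
}

// Hypothetical helper mirroring the question's "tmp.get() + 6":
// reports whether buf + offset is aligned for uint64_t, where buf
// itself is forced to uint64_t alignment.
inline bool OffsetIsAligned(std::size_t offset) {
  alignas(std::uint64_t) static unsigned char buf[1024];
  return IsAlignedForU64(buf + offset);
}
```

Under these assumptions OffsetIsAligned(6) is false, so a uint64_t* formed from such an address must not be dereferenced directly.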

The magic function std::memcpy is the tool of choice here:

#include <cstring>  // std::memcpy

__attribute__ ((noinline))
void SerializeTo(const vector<uint64_t>& v, uint8_t* dest) {
  for (size_t i = 0; i < v.size(); ++i) {
    // memcpy imposes no alignment requirement on dest.
    std::memcpy(dest, std::addressof(v[i]), sizeof(v[i]));
    dest += sizeof(uint64_t);
  }
}

With -std=c++11 -O3 -march=native -Wall -pedantic, this compiles to:

SerializeTo(std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned char*):   # @SerializeTo(std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned char*)
        mov     rax, qword ptr [rdi]
        cmp     qword ptr [rdi + 8], rax
        je      .LBB0_3
        xor     ecx, ecx
.LBB0_2:                                # =>This Inner Loop Header: Depth=1
        mov     rax, qword ptr [rax + 8*rcx]
        mov     qword ptr [rsi + 8*rcx], rax
        add     rcx, 1
        mov     rax, qword ptr [rdi]
        mov     rdx, qword ptr [rdi + 8]
        sub     rdx, rax
        sar     rdx, 3
        cmp     rcx, rdx
        jb      .LBB0_2
.LBB0_3:
        ret

https://godbolt.org/g/ReGA9N
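
As a rough illustration of the same idea in both directions, the sketch below writes values to a deliberately misaligned offset and reads them back, using std::memcpy for every access. DeserializeFrom and RoundTripsAtOffset6 are hypothetical names introduced here for illustration; only SerializeTo comes from the original post, and it is restated with explicit std:: qualification so the block stands alone.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Writes each element of v to dest; dest needs no particular alignment.
inline void SerializeTo(const std::vector<std::uint64_t>& v,
                        std::uint8_t* dest) {
  for (std::size_t i = 0; i < v.size(); ++i) {
    std::memcpy(dest + i * sizeof(std::uint64_t), &v[i],
                sizeof(std::uint64_t));
  }
}

// Hypothetical counterpart: reads `count` values back from src,
// again without assuming any alignment of src.
inline std::vector<std::uint64_t> DeserializeFrom(const std::uint8_t* src,
                                                  std::size_t count) {
  std::vector<std::uint64_t> out(count);
  for (std::size_t i = 0; i < count; ++i) {
    std::memcpy(&out[i], src + i * sizeof(std::uint64_t),
                sizeof(std::uint64_t));
  }
  return out;
}

// Hypothetical check: round-trips v through a byte buffer starting at
// offset 6, the same offset used in the question.
inline bool RoundTripsAtOffset6(const std::vector<std::uint64_t>& v) {
  std::vector<std::uint8_t> buf(6 + v.size() * sizeof(std::uint64_t));
  SerializeTo(v, buf.data() + 6);
  return DeserializeFrom(buf.data() + 6, v.size()) == v;
}
```

Note that std::memcpy copies the bytes in host order, so a buffer produced this way is only portable between machines with the same endianness.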
