Speed of AES Implementation


Problem description


I have written a C implementation of AES and have tried to make it as fast as possible (I'm just starting out in programming and have training in IT). I have achieved a speed increase of around 600% so far, but it's still awfully slow. To compare my AES implementation with something, I used the "openssl speed" command in the Linux terminal. In 3 seconds that implementation encrypts around 36 977 043 blocks (16 bytes each). I am ~25 times slower than that (72 seconds for the 36... bytes), which kinda sucks. I'm curious about two things.

  1. What would be a good goal to achieve? How fast is a realistic target to aim for?
  2. Why is my code so slow, and how can I change that?

About my code: I tried leaving out some of my functions to see how much faster the code gets without them. The full code took 72 seconds.

  • Without MixColumns: 14 seconds (this is the big problem)
  • Without ShiftRows: 67 seconds
  • Without SubBytes: 61 seconds

My encryption function:

uint32_t * encrypt(uint32_t * expkey,uint32_t state[4]){

    uint32_t temp[4];

    state[0] = state[0] ^ expkey[0];
    state[1] = state[1] ^ expkey[1];
    state[2] = state[2] ^ expkey[2];
    state[3] = state[3] ^ expkey[3];
    
    for(int round = 1; round < Nr; round++){
        
        // Subbytes
        for (int c = 0; c < 4;c++){
            temp[c] = ((sbox[state[c] >> 24 & 0xFF]) << 24 ) + ((sbox[state[c] >> 16 & 0xFF]) << 16 ) + ((sbox[state[c] >> 8 & 0xFF]) << 8 ) + (sbox[state[c] & 0xFF]);
        }
        // Shiftrows
        state[0] = (((temp[0] >> 24) & 0xFF) << 24) + (((temp[1] >> 16) & 0xFF) << 16) + (((temp[2] >> 8) & 0xFF) << 8) + (temp[3] & 0xFF);
        state[1] = (((temp[1] >> 24) & 0xFF) << 24) + (((temp[2] >> 16) & 0xFF) << 16) + (((temp[3] >> 8) & 0xFF) << 8) + (temp[0] & 0xFF);
        state[2] = (((temp[2] >> 24) & 0xFF) << 24) + (((temp[3] >> 16) & 0xFF) << 16) + (((temp[0] >> 8) & 0xFF) << 8) + (temp[1] & 0xFF);
        state[3] = (((temp[3] >> 24) & 0xFF) << 24) + (((temp[0] >> 16) & 0xFF) << 16) + (((temp[1] >> 8) & 0xFF) << 8) + (temp[2] & 0xFF);
        
        // MixColumns
        for (int c = 0; c < 4;c++){
            state[c] = 
                ((xtime((state[c] >> 24) & 0xFF) ^ xtime3((state[c] >> 16) & 0xFF) ^ ((state[c] >> 8) & 0xFF) ^ (state[c] & 0xFF)) << 24) +
                ((((state[c] >> 24) & 0xFF) ^ xtime((state[c] >> 16) & 0xFF) ^ xtime3((state[c] >> 8) & 0xFF) ^ (state[c] & 0xFF)) << 16) + 
                ((((state[c] >> 24) & 0xFF) ^ ((state[c] >> 16) & 0xFF) ^ xtime((state[c] >> 8) & 0xFF) ^ xtime3(state[c] & 0xFF)) << 8 ) +
                (xtime3((state[c] >> 24) & 0xFF) ^ ((state[c] >> 16) & 0xFF) ^ ((state[c] >> 8) & 0xFF) ^ xtime(state[c] & 0xFF));       
        
        }
        // Add Key
        state[0] = state[0] ^ expkey[round * 4];
        state[1] = state[1] ^ expkey[round * 4 + 1];
        state[2] = state[2] ^ expkey[round * 4 + 2];
        state[3] = state[3] ^ expkey[round * 4 + 3];
        
    }

    // Last SubBytes
    for (int c = 0; c < 4; c++){
        temp[c] = ((sbox[state[c] >> 24 & 0xFF]) << 24 ) + ((sbox[state[c] >> 16 & 0xFF]) << 16 ) + ((sbox[state[c] >> 8 & 0xFF]) << 8 ) + (sbox[state[c] & 0xFF]);
    }

    // Last ShiftRows
    state[0] = (((temp[0] >> 24) & 0xFF) << 24) + (((temp[1] >> 16) & 0xFF) << 16) + (((temp[2] >> 8) & 0xFF) << 8) + (temp[3] & 0xFF);
    state[1] = (((temp[1] >> 24) & 0xFF) << 24) + (((temp[2] >> 16) & 0xFF) << 16) + (((temp[3] >> 8) & 0xFF) << 8) + (temp[0] & 0xFF);
    state[2] = (((temp[2] >> 24) & 0xFF) << 24) + (((temp[3] >> 16) & 0xFF) << 16) + (((temp[0] >> 8) & 0xFF) << 8) + (temp[1] & 0xFF);
    state[3] = (((temp[3] >> 24) & 0xFF) << 24) + (((temp[0] >> 16) & 0xFF) << 16) + (((temp[1] >> 8) & 0xFF) << 8) + (temp[2] & 0xFF);

    // Last AddRoundKey (the final round has no MixColumns)
    state[0] = state[0] ^ expkey[Nr * 4];
    state[1] = state[1] ^ expkey[Nr * 4 + 1];
    state[2] = state[2] ^ expkey[Nr * 4 + 2];
    state[3] = state[3] ^ expkey[Nr * 4 + 3];

    return state;
}

And the xtime function:

uint8_t xtime(uint8_t x){
    return (x << 1) ^ (0x11b & -(x >> 7));
}

I am looking forward to all tips, tricks, and improvements.

Solution

OpenSSL uses AES-NI where available. For example, running

openssl speed -evp aes-128-cbc

outputs


The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc    531549.19k   969335.21k  1045437.10k  1066826.75k  1054665.39k  1052120.41k

Since you are not using AES-NI, you need to compare your code against OpenSSL's software implementation:

 OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-cbc


The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc    143802.75k   161369.51k   165049.17k   166054.57k   166262.10k   166461.44k

If we compare the last column, AES-NI is ~6.3 times faster than OpenSSL's software implementation. This means that you are around 4 times slower than the software version (25 / 6.3 ≈ 4).
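To set the question's block counts next to these tables, it helps to convert them into the unit that openssl speed prints (thousands of bytes per second). The helper below is only a sketch; the name kbytes_per_second is made up for illustration and is not part of OpenSSL.

```c
#include <stdint.h>

/* Hypothetical helper: convert "N blocks in T seconds" into the unit
   printed by `openssl speed`, i.e. thousands of bytes per second. */
double kbytes_per_second(uint64_t blocks, double seconds) {
    const double block_bytes = 16.0;  /* AES always works on 16-byte blocks */
    return (double)blocks * block_bytes / seconds / 1000.0;
}
```

With the figures from the question, 36 977 043 blocks in 3 seconds is roughly 197 211k, and the same number of blocks in 72 seconds is roughly 8 217k. Keep in mind that the tables above were measured on a different machine, so only the ratios are meaningful, not the absolute numbers.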

Compiler optimization flags can also affect the speed considerably. Look into your compiler's manual; if you are using GCC, these are the -O0 through -O3 flags.


About the code:

If you look at the AES code of OpenSSL, you will see that it uses pre-computed tables, which is a very common technique.

SubBytes, ShiftRows, and MixColumns are merged into table lookups. That is where the speed difference comes from. Note, however, that table lookups are vulnerable to cache-timing attacks.
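As a rough illustration of the table technique, here is a minimal sketch of one middle round done with four 256-entry tables. It assumes the same state layout as the question (four column words, first byte of each column in the top 8 bits). The names init_tables, round_ttable, gmul, rotl8, and ror32 are made up for this example, and the tables are generated at start-up instead of being hard-coded as OpenSSL does. It is not constant-time, so the cache-timing caveat applies to it as well.

```c
#include <stdint.h>

uint8_t  sbox[256];
uint32_t Te0[256], Te1[256], Te2[256], Te3[256];

/* Multiply by 2 in GF(2^8), reducing by the AES polynomial 0x11b. */
static uint8_t xtime(uint8_t x) {
    return (uint8_t)((x << 1) ^ ((x >> 7) * 0x1b));
}

/* General GF(2^8) multiply; only used while building the tables. */
static uint8_t gmul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    for (; b; b >>= 1) {
        if (b & 1) p ^= a;
        a = xtime(a);
    }
    return p;
}

static uint8_t rotl8(uint8_t v, unsigned n) {
    return (uint8_t)((v << n) | (v >> (8 - n)));
}

static uint32_t ror32(uint32_t w, unsigned n) {
    return (w >> n) | (w << (32 - n));
}

/* Build the S-box, then fold SubBytes and MixColumns into four tables. */
void init_tables(void) {
    for (int x = 0; x < 256; x++) {
        uint8_t inv = 0;  /* the inverse of 0 is defined as 0 */
        for (int y = 1; y < 256; y++) {
            if (gmul((uint8_t)x, (uint8_t)y) == 1) { inv = (uint8_t)y; break; }
        }
        /* Affine transformation of the S-box definition. */
        uint8_t s = (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
                              ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
        uint32_t s1 = s, s2 = xtime(s), s3 = s2 ^ s1;
        sbox[x] = s;
        Te0[x] = (s2 << 24) | (s1 << 16) | (s1 << 8) | s3; /* (2,1,1,3) * s */
        Te1[x] = ror32(Te0[x], 8);                         /* (3,2,1,1) * s */
        Te2[x] = ror32(Te0[x], 16);                        /* (1,3,2,1) * s */
        Te3[x] = ror32(Te0[x], 24);                        /* (1,1,3,2) * s */
    }
}

/* One middle round: SubBytes + ShiftRows + MixColumns + AddRoundKey.
   ShiftRows is expressed by which column each byte is read from. */
void round_ttable(uint32_t s[4], const uint32_t rk[4]) {
    uint32_t t[4];
    for (int c = 0; c < 4; c++) {
        t[c] = Te0[s[c] >> 24]
             ^ Te1[(s[(c + 1) & 3] >> 16) & 0xFF]
             ^ Te2[(s[(c + 2) & 3] >> 8) & 0xFF]
             ^ Te3[s[(c + 3) & 3] & 0xFF]
             ^ rk[c];
    }
    for (int c = 0; c < 4; c++) s[c] = t[c];
}
```

Call init_tables() once, then round_ttable(state, expkey + round * 4) for each middle round. Every round then costs 16 table lookups and 16 XORs, with no xtime calls in the hot loop. The final round has no MixColumns, so it still needs the plain sbox. A round of this sketch can be checked against the intermediate values in FIPS-197 Appendix B.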
