Speed of AES Implementation
Problem description
I have written a C implementation of AES and have tried to make it as fast as possible (I'm just starting out in programming and have training in IT). I have achieved a speed increase of around 600% so far, but it's still awfully slow. To compare my AES implementation with something, I used the "openssl speed" command in the Linux terminal. In 3 seconds, OpenSSL encrypts around 36 977 043 blocks (16 bytes each). I am ~25 times slower than that (72 seconds for the 36... bytes), which kinda sucks. I'm curious about 2 things.
- What would be a good goal to aim for? How fast is realistic?
- Why is my code so slow, and how can I change that?
About my code: I tried leaving out some of my functions to see how much faster the code gets without them. The full code took 72 seconds.
- Without MixColumns: 14 seconds (this is the big problem)
- Without ShiftRows: 67 seconds
- Without SubBytes: 61 seconds
My encryption function:
uint32_t * encrypt(uint32_t * expkey, uint32_t state[4]){
    uint32_t temp[4];
    state[0] = state[0] ^ expkey[0];
    state[1] = state[1] ^ expkey[1];
    state[2] = state[2] ^ expkey[2];
    state[3] = state[3] ^ expkey[3];
    for (int round = 1; round < Nr; round++){
        // SubBytes
        for (int c = 0; c < 4; c++){
            temp[c] = ((sbox[state[c] >> 24 & 0xFF]) << 24 ) + ((sbox[state[c] >> 16 & 0xFF]) << 16 ) + ((sbox[state[c] >> 8 & 0xFF]) << 8 ) + (sbox[state[c] & 0xFF]);
        }
        // ShiftRows
        state[0] = (((temp[0] >> 24) & 0xFF) << 24) + (((temp[1] >> 16) & 0xFF) << 16) + (((temp[2] >> 8) & 0xFF) << 8) + (temp[3] & 0xFF);
        state[1] = (((temp[1] >> 24) & 0xFF) << 24) + (((temp[2] >> 16) & 0xFF) << 16) + (((temp[3] >> 8) & 0xFF) << 8) + (temp[0] & 0xFF);
        state[2] = (((temp[2] >> 24) & 0xFF) << 24) + (((temp[3] >> 16) & 0xFF) << 16) + (((temp[0] >> 8) & 0xFF) << 8) + (temp[1] & 0xFF);
        state[3] = (((temp[3] >> 24) & 0xFF) << 24) + (((temp[0] >> 16) & 0xFF) << 16) + (((temp[1] >> 8) & 0xFF) << 8) + (temp[2] & 0xFF);
        // MixColumns
        for (int c = 0; c < 4; c++){
            state[c] =
                ((xtime((state[c] >> 24) & 0xFF) ^ xtime3((state[c] >> 16) & 0xFF) ^ ((state[c] >> 8) & 0xFF) ^ (state[c] & 0xFF)) << 24) +
                ((((state[c] >> 24) & 0xFF) ^ xtime((state[c] >> 16) & 0xFF) ^ xtime3((state[c] >> 8) & 0xFF) ^ (state[c] & 0xFF)) << 16) +
                ((((state[c] >> 24) & 0xFF) ^ ((state[c] >> 16) & 0xFF) ^ xtime((state[c] >> 8) & 0xFF) ^ xtime3(state[c] & 0xFF)) << 8 ) +
                (xtime3((state[c] >> 24) & 0xFF) ^ ((state[c] >> 16) & 0xFF) ^ ((state[c] >> 8) & 0xFF) ^ xtime(state[c] & 0xFF));
        }
        // AddRoundKey
        state[0] = state[0] ^ expkey[round * 4];
        state[1] = state[1] ^ expkey[round * 4 + 1];
        state[2] = state[2] ^ expkey[round * 4 + 2];
        state[3] = state[3] ^ expkey[round * 4 + 3];
    }
    // Last SubBytes
    for (int c = 0; c < 4; c++){
        temp[c] = ((sbox[state[c] >> 24 & 0xFF]) << 24 ) + ((sbox[state[c] >> 16 & 0xFF]) << 16 ) + ((sbox[state[c] >> 8 & 0xFF]) << 8 ) + (sbox[state[c] & 0xFF]);
    }
    // Last ShiftRows
    state[0] = (((temp[0] >> 24) & 0xFF) << 24) + (((temp[1] >> 16) & 0xFF) << 16) + (((temp[2] >> 8) & 0xFF) << 8) + (temp[3] & 0xFF);
    state[1] = (((temp[1] >> 24) & 0xFF) << 24) + (((temp[2] >> 16) & 0xFF) << 16) + (((temp[3] >> 8) & 0xFF) << 8) + (temp[0] & 0xFF);
    state[2] = (((temp[2] >> 24) & 0xFF) << 24) + (((temp[3] >> 16) & 0xFF) << 16) + (((temp[0] >> 8) & 0xFF) << 8) + (temp[1] & 0xFF);
    state[3] = (((temp[3] >> 24) & 0xFF) << 24) + (((temp[0] >> 16) & 0xFF) << 16) + (((temp[1] >> 8) & 0xFF) << 8) + (temp[2] & 0xFF);
    // Last AddRoundKey
    state[0] = state[0] ^ expkey[Nr * 4];
    state[1] = state[1] ^ expkey[Nr * 4 + 1];
    state[2] = state[2] ^ expkey[Nr * 4 + 2];
    state[3] = state[3] ^ expkey[Nr * 4 + 3];
    return state;
}
And the xtime function:
uint8_t xtime(uint8_t x){
    return (x << 1) ^ (0x11b & -(x >> 7));
}
I am looking forward to all tips, tricks, and improvements.
OpenSSL uses AES-NI where available. For example,
openssl speed -evp aes-128-cbc
outputs
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc    531549.19k   969335.21k  1045437.10k  1066826.75k  1054665.39k  1052120.41k
Since you are not using AES-NI, you need to compare against OpenSSL's software implementation:
OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-cbc
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc    143802.75k   161369.51k   165049.17k   166054.57k   166262.10k   166461.44k
If we compare the last column (1052120.41k versus 166461.44k), AES-NI is ~6.3 times faster than OpenSSL's software implementation. Since you measured ~25 times slower than the AES-NI run, you are around 25 / 6.3 ≈ 4 times slower than the software version.
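To put a home-grown measurement into the same unit as the tables above, the block count and elapsed time from the question can be converted to thousands of bytes per second. This is only a small sketch; the function name `kbytes_per_second` is made up for illustration and is not part of OpenSSL.

```c
#include <stdint.h>

/* Throughput in the unit that `openssl speed` prints:
 * thousands of bytes processed per second.
 * The name and signature are illustrative assumptions. */
double kbytes_per_second(uint64_t blocks, double seconds) {
    const double block_bytes = 16.0;  /* AES block size in bytes */
    return (double)blocks * block_bytes / seconds / 1000.0;
}
```

With the figures from the question, `kbytes_per_second(36977043, 72.0)` comes out at roughly 8217k, while the same number of blocks in 3 seconds is roughly 197211k. Numbers measured on different machines are not directly comparable, so both runs should come from the same hardware.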
Compiler optimization flags can also affect the speed considerably. Check your compiler's manual; for GCC, these are -O0 through -O3.
About the code:
If you look at OpenSSL's AES code, you will see that it uses pre-computed tables, which is a very common technique. SubBytes, ShiftRows, and MixColumns are turned into table lookups. That is where the speed difference comes from. Note that table lookups are vulnerable to cache-timing attacks.
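As a rough illustration of the table technique, here is a minimal sketch of one full round done with 16 table lookups. It assumes the state layout from the question (one column per `uint32_t`, row 0 in the top byte). The names `init_tables`, `round_ttable`, `te0` and `gmul` are invented for this sketch and are not OpenSSL's API. To keep it short, the S-box is computed once at startup and a single table is rotated for rows 1 to 3; real implementations typically hard-code the tables and often store several pre-rotated copies.

```c
#include <stdint.h>

static uint8_t  sbox[256];  /* filled by init_tables() */
static uint32_t te0[256];   /* SubBytes and MixColumns combined, for a row-0 byte */

/* Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (used only for setup). */
static uint8_t gmul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0));
        b >>= 1;
    }
    return p;
}

static uint8_t rotl8(uint8_t x, int n) {
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

static uint32_t ror32(uint32_t x, int n) {
    return (x >> n) | (x << (32 - n));
}

/* Build the S-box (brute-force inverse, run once) and the derived T-table. */
void init_tables(void) {
    for (int x = 0; x < 256; x++) {
        uint8_t inv = 0;  /* 0 has no inverse and maps to 0 */
        for (int y = 1; x != 0 && y < 256; y++) {
            if (gmul((uint8_t)x, (uint8_t)y) == 1) { inv = (uint8_t)y; break; }
        }
        /* Affine transformation of the inverse gives the S-box entry. */
        uint8_t s = (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
                              ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
        sbox[x] = s;
        /* A row-0 byte contributes (2s, s, s, 3s) to its output column. */
        te0[x] = ((uint32_t)gmul(s, 2) << 24) | ((uint32_t)s << 16)
               | ((uint32_t)s << 8) | (uint32_t)gmul(s, 3);
    }
}

/* One round: SubBytes, ShiftRows, MixColumns and AddRoundKey.
 * ShiftRows is folded into the column index (c + r) mod 4, and the
 * contribution of row r is the row-0 table entry rotated by 8 * r bits. */
void round_ttable(const uint32_t in[4], const uint32_t rk[4], uint32_t out[4]) {
    for (int c = 0; c < 4; c++) {
        out[c] = te0[in[c] >> 24]
               ^ ror32(te0[(in[(c + 1) & 3] >> 16) & 0xFF], 8)
               ^ ror32(te0[(in[(c + 2) & 3] >> 8) & 0xFF], 16)
               ^ ror32(te0[in[(c + 3) & 3] & 0xFF], 24)
               ^ rk[c];
    }
}
```

Call `init_tables()` once, then call `round_ttable` for rounds 1 to Nr - 1, alternating between two state buffers. The last round has no MixColumns, so it still needs plain `sbox` lookups. The lookups are indexed by secret data, so the cache-timing caveat applies to this sketch as well.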