Python PyCryptodome AES-GCM encryption code performance improvement
Question
I have around 19 GB of data that I tar and then encrypt. I use the code below to do the job.
```python
from subprocess import call
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes
import sys

cmd = ["tar","--acls","--selinux","-czPf","./out.tar.gz","./src"]
proc = call(cmd)
data = open("./out.tar.gz", "rb").read()
key = get_random_bytes(32)
cipher = AES.new(key, AES.MODE_GCM)
ciphertext, tag = cipher.encrypt_and_digest(data)
out = open("./out.bin", "wb")
for x in (cipher.nonce, tag, ciphertext):
    out.write(x)
out.close()
```
I am using HP Gen10 hardware with 48 CPU cores, 128 GB of memory and 1800.3 GB of HDD space. Only one core is being used (at almost 100%), and memory usage is around 43%. The whole process takes more than a day. I am looking for ways to improve the performance of the code above.
Following SquareRootOfTwentyThree's comments, I have made significant changes to the code:
```python
from subprocess import call
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes
import StringIO

key = get_random_bytes(32)

def readLargeFile(filename):
    with open(filename, "rb") as f:
        while True:
            data = f.read(1024)
            if not data:
                break
            yield data

cmd = ["tar","--acls","--selinux","-czPf","./out.tar.gz","./src"]
call(cmd)

cipher = AES.new(key, AES.MODE_GCM)
ciphertext = []
for data in readLargeFile("./out.tar.gz"):
    ciphertext.append(cipher.encrypt(data))
out = open("./out.bin", "wb")
for x in (cipher.nonce, cipher.digest(), b"".join(ciphertext)):
    out.write(x)
out.close()

file_in = open("./out.bin", "rb")
nonce, tag, ciphertext = [file_in.read(x) for x in (16, 16, -1)]
cipher = AES.new(key, AES.MODE_GCM, nonce)
#data = cipher.decrypt_and_verify(ciphertext, tag)
data = []
for buf in StringIO.StringIO(ciphertext).read(1024):
    data.append(cipher.decrypt(buf))
cipher.verify(tag)
with open("./dst/out.tar.gz", "wb") as f:
    f.write(b''.join(data))
cmd = ["tar","-xzPf","./dst/out.tar.gz","-C","./dst"]
proc = call(cmd)
```
Encryption succeeds, but decryption's verify() raises ValueError: MAC check failed.
Note: I am using PyCryptodome v3.6.6.
I somehow got decryption working. Below is my latest code:
```python
#! /usr/bin/python
from subprocess import Popen,PIPE,call
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes
import StringIO,io,tarfile
import os,sys
import datetime

print "*** Encryption Starts *** " + str(datetime.datetime.now())
key = get_random_bytes(32)

def readLargeFile(filename):
    with open(filename, "rb") as f:
        while True:
            data = f.read(1024)
            if not data:
                break
            yield data

cmd = ["tar --acls --selinux -czPf /nfs/out.tar.gz ./encrypt_disk/src/*"]
call(cmd, shell=True)
cipher = AES.new(key, AES.MODE_GCM)
ciphertext = []
for data in readLargeFile("/nfs/out.tar.gz"):
    ciphertext.append(cipher.encrypt(data))
out = open("/nfs/out.bin", "wb")
for x in (cipher.nonce, cipher.digest(), b"".join(ciphertext)):
    out.write(x)
out.close()
print "*** Encryption Ends *** " + str(datetime.datetime.now())

print "*** Decryption Starts *** " + str(datetime.datetime.now())
file_in = open("/nfs/out.bin", "rb")
nonce, tag, ciphertext = [file_in.read(x) for x in (16, 16, -1)]
cipher = AES.new(key, AES.MODE_GCM, nonce)
tar = tarfile.open(fileobj=StringIO.StringIO(cipher.decrypt_and_verify(ciphertext, tag)), mode='r|*')
os.chdir("/nfs/dst")
tar.extractall(path='.')
print "*** Decryption Ends *** " + str(datetime.datetime.now())
```
Recommended answer
GCM is hard (though not impossible) to parallelize. Still, on my 3-year-old x86 laptop (with AES-NI and CLMUL accelerated instructions) I get 150 MB/s with PyCryptodome's GCM. That is only about 2 minutes for 19 GB, not a day! I used the following toy code:
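```python
import os
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes

key = get_random_bytes(32)
data = os.urandom(1024*1024)       # 1 MiB of random data
cipher = AES.new(key, AES.MODE_GCM)
for _ in range(1024):              # 1024 x 1 MiB = 1 GiB in total
    cipher.encrypt(data)
tag = cipher.digest()
```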
The code is not directly usable for your use case, but it suggests that the problem may be that you encrypt the full 19 GB at once. Perhaps you should process the data in chunks instead.
Some other remarks:
- Use a profiler to identify where your program spends most of its time. It might not be where you think it is (for example, what about the tar step?).
- Make sure you are using the latest version of PyCryptodome (3.6.6), since CLMUL acceleration was added only recently.
- GCM can encrypt at most about 64 GiB (2^39 - 256 bits) under a single key and nonce, so at 19 GB you are not that far from the limit.
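Following the profiler suggestion, here is a minimal sketch that uses only the Python 3 standard library. It times one step by wall clock and runs another under cProfile; `timed` and `profile_report` are hypothetical helpers.

For the tar step, cProfile would only report time spent waiting for the child process, so a plain wall-clock timer is the simpler measurement there:

```python
import cProfile
import io
import pstats
import time


def timed(label, func, *args, **kwargs):
    """Call func once, print its wall-clock time, return (result, seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print("%-12s %8.2f s" % (label, elapsed))
    return result, elapsed


def profile_report(func, *args, top=10, **kwargs):
    """Call func under cProfile; return (result, top entries by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()
```

For example, `timed("tar", call, cmd)` with `call` and `cmd` from the question isolates the tar step. Wrapping the encryption loop in `profile_report` shows whether the time goes into `encrypt()` itself or into the surrounding 1024-byte reads.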