Python PyCryptodome AES-GCM encryption code performance improvement

Question

I have around 19 GB of data that I tar and then encrypt. I use the code below to do the job.

from subprocess import call
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes
import sys

cmd = ["tar","--acls","--selinux","-czPf","./out.tar.gz","./src"]
proc = call(cmd)
data = open("./out.tar.gz", "rb").read()
key = get_random_bytes(32)
cipher = AES.new(key, AES.MODE_GCM)
ciphertext, tag = cipher.encrypt_and_digest(data)

out = open("./out.bin", "wb")
[out.write(x) for x in (cipher.nonce, tag, ciphertext)]
out.close()

I am using HP Gen10 hardware with 48 CPU cores, 128 GB of memory, and 1800.3 GB of HDD space. Only one core is utilized, at almost 100%, and memory usage is around 43%. The overall process takes more than a day. I am looking for ways to improve the performance of the code above.

I have made significant improvements to the code after SquareRootOfTwentyThree's comments:

from subprocess import call
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes
import StringIO

key = get_random_bytes(32)

def readLargeFile(filename):
    with open(filename, "rb") as f:
        while True:
            data = f.read(1024)
            if not data:
                break
            yield data

cmd = ["tar","--acls","--selinux","-czPf","./out.tar.gz","./src"]
call(cmd)

cipher = AES.new(key, AES.MODE_GCM)
ciphertext = []

for data in readLargeFile("./out.tar.gz"):
    ciphertext.append(cipher.encrypt(data))

out = open("./out.bin", "wb")
[out.write(x) for x in (cipher.nonce, cipher.digest(), b"".join(ciphertext))]
out.close()

file_in = open("./out.bin", "rb")
nonce, tag, ciphertext = [file_in.read(x) for x in (16, 16, -1)]
cipher = AES.new(key, AES.MODE_GCM, nonce)
#data = cipher.decrypt_and_verify(ciphertext, tag)
data = []
for buf in StringIO.StringIO(ciphertext).read(1024):
    data.append(cipher.decrypt(buf))
cipher.verify(tag)
with open("./dst/out.tar.gz", "wb") as f:
    f.write(b''.join(data))
cmd = ["tar","-xzPf","./dst/out.tar.gz","-C","./dst"]
proc = call(cmd)

Encryption succeeds, but verify() during decryption raises ValueError: MAC check failed.

Note: I am using PyCryptodome v3.6.6.
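
A likely cause of the failed MAC check is the decryption loop: `StringIO.StringIO(ciphertext).read(1024)` returns a single 1024-byte buffer, and the `for` statement then iterates over that buffer item by item rather than over successive chunks, so the cipher never sees the rest of the ciphertext. The sketch below reproduces the effect in Python 3, with `io.BytesIO` standing in for `StringIO.StringIO`; the `chunks` helper is a hypothetical name introduced here to show a loop that does cover the whole input.

```python
import io

ciphertext = bytes(5000)  # stand-in for the real ciphertext

# Iterating over the result of one read(1024) walks through the items of the
# first 1024-byte buffer only (ints in Python 3, 1-character strs in Python 2).
seen = sum(1 for _ in io.BytesIO(ciphertext).read(1024))


def chunks(stream, size=1024):
    """Yield successive buffers until read() returns an empty result."""
    while True:
        buf = stream.read(size)
        if not buf:
            break
        yield buf


# Calling read() repeatedly covers every byte of the input.
total = sum(len(buf) for buf in chunks(io.BytesIO(ciphertext)))

print(seen, total)  # 1024 5000
```

With only part of the ciphertext passed to `decrypt()`, the tag computed by the cipher cannot match the stored one, which is consistent with the `ValueError` reported above.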

I eventually got decryption working; below is my latest code:

#! /usr/bin/python
from subprocess import Popen,PIPE,call
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes
import StringIO,io,tarfile
import os,sys
import datetime

print "*** Encryption Starts *** " + str(datetime.datetime.now())
key = get_random_bytes(32)

def readLargeFile(filename):
    with open(filename, "rb") as f:
        while True:
            data = f.read(1024)
            if not data:
                break
            yield data

cmd = ["tar --acls --selinux -czPf /nfs/out.tar.gz ./encrypt_disk/src/*"]
call(cmd, shell=True)

cipher = AES.new(key, AES.MODE_GCM)
ciphertext = []

for data in readLargeFile("/nfs/out.tar.gz"):
    ciphertext.append(cipher.encrypt(data))

out = open("/nfs/out.bin", "wb")
[out.write(x) for x in (cipher.nonce, cipher.digest(), b"".join(ciphertext))]
out.close()
print "*** Encryption Ends *** " + str(datetime.datetime.now())


print "*** Decryption Starts *** " + str(datetime.datetime.now())
file_in = open("/nfs/out.bin", "rb")
nonce, tag, ciphertext = [file_in.read(x) for x in (16, 16, -1)]
cipher = AES.new(key, AES.MODE_GCM, nonce)
tar = tarfile.open(fileobj=StringIO.StringIO(cipher.decrypt_and_verify(ciphertext, tag)), mode='r|*')
os.chdir("/nfs/dst")
tar.extractall(path='.')
print "*** Decryption Ends *** " + str(datetime.datetime.now())


Recommended answer

GCM is hard (though not impossible) to parallelize. Still, on my 3-year-old x86 laptop (with AESNI and CLMUL accelerated instructions) I get 150 MB/s with PyCryptodome's GCM. At that rate 19 GB takes only about 2 minutes, not a day! I used the following toy code:

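import os
from Crypto.Cipher import AES

key = os.urandom(32)               # assumed here: any 256-bit key works for the benchmark
data = os.urandom(1024*1024)       # 1 MiB of random data
cipher = AES.new(key, AES.MODE_GCM)
for _ in range(1024):              # 1024 x 1 MiB = 1 GiB encrypted in total
    cipher.encrypt(data)
tag = cipher.digest()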

The code is not directly usable for your use case, but it indicates that encrypting the full 19 GB in one call may be the problem. You should probably break the processing up into chunks instead.
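
Chunked processing could look roughly like the following Python 3 sketch. It is not the answerer's code: the names `encrypt_stream`, `decrypt_stream`, `encrypt_file`, `decrypt_file`, and the 1 MiB chunk size are assumptions made for illustration. It also writes the file as nonce, ciphertext, tag (the question stores the tag before the ciphertext), because the tag is only known once all data has been processed and this order avoids holding the ciphertext in memory.

```python
import os

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice, not a tuned value
NONCE_SIZE = 16           # length of the nonce PyCryptodome generates for AES-GCM
TAG_SIZE = 16             # PyCryptodome's default GCM tag length


def encrypt_stream(cipher, src, dst, chunk_size=CHUNK_SIZE):
    """Encrypt src into dst chunk by chunk; layout is nonce | ciphertext | tag."""
    dst.write(cipher.nonce)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(cipher.encrypt(chunk))
    # The tag covers everything encrypted so far, so it is written last.
    dst.write(cipher.digest())


def decrypt_stream(cipher, src, dst, ciphertext_len, chunk_size=CHUNK_SIZE):
    """Decrypt ciphertext_len bytes from src into dst, then check the trailing tag."""
    remaining = ciphertext_len
    while remaining > 0:
        chunk = src.read(min(chunk_size, remaining))
        if not chunk:
            raise ValueError("input is shorter than expected")
        dst.write(cipher.decrypt(chunk))
        remaining -= len(chunk)
    # Raises ValueError if the ciphertext or the tag was modified.
    cipher.verify(src.read(TAG_SIZE))


def encrypt_file(key, src_path, dst_path):
    # Imported here so the stream helpers above work with any cipher-like object.
    from Crypto.Cipher import AES
    cipher = AES.new(key, AES.MODE_GCM)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        encrypt_stream(cipher, src, dst)


def decrypt_file(key, src_path, dst_path):
    from Crypto.Cipher import AES
    ciphertext_len = os.path.getsize(src_path) - NONCE_SIZE - TAG_SIZE
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        cipher = AES.new(key, AES.MODE_GCM, nonce=src.read(NONCE_SIZE))
        # Plaintext is written before verification; discard it if this raises.
        decrypt_stream(cipher, src, dst, ciphertext_len)
```

With these helpers, `encrypt_file(key, "./out.tar.gz", "./out.bin")` never holds more than one chunk in memory. The trade-off is that `decrypt_file` writes plaintext before the tag has been verified, so the output should be treated as untrusted (and deleted) if `ValueError` is raised.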

Some other comments:

  • Use a profiler to identify where your program spends the most time. It might not be where you think it is (e.g. what about the tar step?).
  • Ensure you are using the latest version of PyCryptodome (3.6.6), since CLMUL acceleration was added only recently.
  • GCM can encrypt only a limited amount of data under one key and nonce: 256 GB at most by the answerer's figure (NIST SP 800-38D states the limit as 2^39 - 256 bits, roughly 64 GiB). With 19 GB you are not that far from it.
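
The profiling advice can be tried with the standard library's `cProfile` module. The sketch below is only an illustration: `tar_step`, `encrypt_step`, and `pipeline` are hypothetical placeholders that sleep instead of doing real work, and would be replaced by the actual tar call and encryption loop.

```python
import cProfile
import io
import pstats
import time


def tar_step():
    # Placeholder for the tar call; sleeps to simulate slow work.
    time.sleep(0.05)


def encrypt_step():
    # Placeholder for the encryption loop.
    time.sleep(0.01)


def pipeline():
    tar_step()
    encrypt_step()


profiler = cProfile.Profile()
profiler.enable()
pipeline()
profiler.disable()

# Sort by cumulative time so the slowest step appears near the top.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

An existing script can also be profiled without modification by running it as `python -m cProfile -s cumulative script.py`.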
