Code profiling to improve performance: see CPU cycles inside mscorlib.dll?


Question



I made a small test benchmark comparing .NET's System.Security.Cryptography AES implementation vs BouncyCastle.Org's AES.

Link to GitHub code: https://github.com/sidshetye/BouncyBench

I'm particularly interested in AES-GCM since it's a 'better' crypto algorithm and .NET is missing it. What I noticed was that while the AES implementations are very comparable between .NET and BouncyCastle, the GCM performance is quite poor (see the extra background below for more). I suspect it's due to many buffer copies or something similar. To look deeper, I tried profiling the code (VS2012 => Analyze menu bar option => Launch performance wizard) and noticed that there was a LOT of CPU burn inside mscorlib.dll.

Question: How can I figure out what's eating most of the CPU in such a case? Right now all I know is "some lines/calls in Init() burn 47% of CPU inside mscorlib.ni.dll" - but without knowing what specific lines, I don't know where to (try and) optimize. Any clues?

Extra background:

In the paper "The Galois/Counter Mode of Operation (GCM)" by David A. McGrew, I read: "Multiplication in a binary field can use a variety of time-memory tradeoffs. It can be implemented with no key-dependent memory, in which case it will generally run several times slower than AES. Implementations that are willing to sacrifice modest amounts of memory can easily realize speeds greater than that of AES."

If you look at the results, the basic AES-CBC engine performances are very comparable. AES-GCM adds the GCM and reuses the AES engine beneath it in CTR mode (faster than CBC). However, GCM also adds multiplication in the GF(2^128) field in addition to the CTR mode, so there could be other areas of slowdown. Anyway, that's why I tried profiling the code.
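As a rough illustration of the memory-free end of that tradeoff, here is a minimal Python sketch of the GF(2^128) multiplication, following the bit-by-bit shift-and-XOR method described in the GCM specification (NIST SP 800-38D). It is not BouncyCastle's code; the function name `gf128_mul` and the choice to hold each 128-bit block as a Python integer (big-endian, so the leftmost bit of the block is the most significant bit) are assumptions made for this example.

```python
# Reduction constant from the GCM specification: 11100001 followed by 120 zero bits.
R = 0xE1 << 120


def gf128_mul(x: int, y: int) -> int:
    """Multiply two GCM field elements without any precomputed tables.

    Blocks are 128-bit integers in big-endian order (hypothetical
    representation chosen for this sketch).
    """
    z = 0
    v = y
    for i in range(128):
        # Bit i of x, counted from the leftmost (most significant) bit.
        if (x >> (127 - i)) & 1:
            z ^= v
        # Multiply v by the field element "x": shift right and
        # fold in R whenever a 1 bit falls off the right-hand end.
        if v & 1:
            v = (v >> 1) ^ R
        else:
            v >>= 1
    return z


# The block 1000...0 is the multiplicative identity in GCM's bit ordering.
ONE = 1 << 127
```

Every multiplication walks all 128 bits, which is the "no key-dependent memory" variant the paper describes as several times slower than AES. Table-driven variants trade memory for fewer iterations per block.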

For those interested, here is my quick performance benchmark. It runs inside a Windows 8 VM, so YMMV. The test is configurable, but currently it simulates the crypto overhead of encrypting many cells of a database (=> many but small plaintext inputs).

Creating initial random bytes ...
Benchmark test is : Encrypt=>Decrypt 10 bytes 100 times

Name               time (ms)    plain(bytes) encypted(bytes)   byte overhead

.NET ciphers
AES128                1.5969              10              32      220 %
AES256                1.4131              10              32      220 %
AES128-HMACSHA256     2.5834              10              64      540 %
AES256-HMACSHA256     2.6029              10              64      540 %

BouncyCastle Ciphers
AES128/CBC            1.3691              10              32      220 %
AES256/CBC            1.5798              10              32      220 %
AES128-GCM           26.5225              10              42      320 %
AES256-GCM           26.3741              10              42      320 %

R - Rerun tests
C - Change size(10) and iterations(100)
Q - Quit

Answer

This is a rather lame move from Microsoft, as they obviously broke a feature that worked well before Windows 8 but no longer does, as explained in this MSDN blog post:

On Windows 8 the profiler uses a different underlying technology than what it does on previous versions of Windows, which is why the behavior is different on Windows 8. With the new technology, the profiler needs the symbol file (PDB) to know what function is currently executing inside NGEN’d images.

(...)

It is however on our backlog to implement in the next version of Visual Studio.

The post gives directions to generate the PDB files yourself (thanks!).
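As a hedged sketch of what generating such a PDB can look like, the Python snippet below only assembles an `ngen createpdb` command line for a native image; it does not run it. The paths are hypothetical placeholders (the `<hash>` directory name differs per machine and has to be looked up locally), and the helper name `build_createpdb_command` is invented for this example. The exact steps in the blog post may differ.

```python
def build_createpdb_command(ngen_exe: str, native_image: str, symbol_dir: str) -> list:
    """Build the argument list for `ngen createpdb <native image> <output dir>`."""
    return [ngen_exe, "createpdb", native_image, symbol_dir]


# Hypothetical locations; adjust to the framework version and bitness in use.
NGEN = r"C:\Windows\Microsoft.NET\Framework\v4.0.30319\ngen.exe"
# "<hash>" is a placeholder for the machine-specific directory name.
MSCORLIB_NI = r"C:\Windows\assembly\NativeImages_v4.0.30319_32\mscorlib\<hash>\mscorlib.ni.dll"
SYMBOL_DIR = r"C:\Symbols"

cmd = build_createpdb_command(NGEN, MSCORLIB_NI, SYMBOL_DIR)
print(" ".join(cmd))

# On a Windows machine with the real paths filled in, the command could be run with:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

The output directory then has to be on the profiler's symbol path so that samples inside mscorlib.ni.dll can be resolved to function names.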
