Fastest PNG decoder for .NET

Problem description

Our web server needs to composite many large images together before sending the results to web clients. This process is performance critical because the server can receive several thousand requests per hour.

Right now our solution loads PNG files (around 1MB each) from the hard disk and sends them to the video card so that the composition is done on the GPU. We first tried loading our images using the PNG decoder exposed by the XNA API. We found that the performance was not good.

To find out whether the bottleneck was loading from the hard disk or decoding the PNG, we changed the code to load each file into a memory stream first and then pass that memory stream to the .NET PNG decoder. The difference in performance between XNA and the System.Windows.Media.Imaging.PngBitmapDecoder class is not significant; we get roughly the same level of performance with both.
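The measurement approach described above (read the whole file into memory first, then time the decoder on the in-memory stream) can be sketched as follows. This is a minimal illustration in Python rather than .NET; `time_load_and_decode` and its `decode` callback are hypothetical names, not part of XNA or WPF, and any decoder that accepts an in-memory stream could be plugged in.

```python
import io
import time
from typing import Callable, Tuple


def time_load_and_decode(
    path: str, decode: Callable[[io.BytesIO], object]
) -> Tuple[float, float]:
    """Return (disk_ms, decode_ms) for one file.

    `decode` is a hypothetical callback standing in for the real PNG
    decoder; it receives the file contents as an in-memory stream.
    """
    t0 = time.perf_counter()
    with open(path, "rb") as f:
        buffer = io.BytesIO(f.read())  # the whole file now lives in memory
    t1 = time.perf_counter()

    decode(buffer)  # the decoder never touches the disk
    t2 = time.perf_counter()

    return (t1 - t0) * 1000.0, (t2 - t1) * 1000.0
```

Because the decoder only ever sees the memory stream, the second number excludes disk latency, which is what lets the two costs be compared separately.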

Our benchmarks show the following performance results:


  • Load images from disk: 37.76ms 1%
  • Decode PNGs: 2816.97ms 77%
  • Load images on Video Hardware: 196.67ms 5%
  • Composition: 87.80ms 2%
  • Get composition result from Video Hardware: 166.21ms 5%
  • Encode to PNG: 318.13ms 9%
  • Store to disk: 3.96ms 0%
  • Clean up: 53.00ms 1%

Total: 3680.50ms 100%
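As a cross-check of the figures above, the per-stage shares can be recomputed from the raw timings. The sketch below also estimates the total if a single stage were made faster; the helper names are illustrative, and the speed-up factor passed in is a what-if input, not a measured result.

```python
# Stage timings in milliseconds, copied from the benchmark list above.
STAGES_MS = {
    "Load images from disk": 37.76,
    "Decode PNGs": 2816.97,
    "Load images on Video Hardware": 196.67,
    "Composition": 87.80,
    "Get composition result from Video Hardware": 166.21,
    "Encode to PNG": 318.13,
    "Store to disk": 3.96,
    "Clean up": 53.00,
}


def shares(stages):
    """Return each stage's share of the total time, in percent."""
    total = sum(stages.values())
    return {name: ms / total * 100.0 for name, ms in stages.items()}


def total_if_faster(stages, name, factor):
    """Total time if stage `name` ran `factor` times faster (what-if only)."""
    return sum(stages.values()) - stages[name] + stages[name] / factor


if __name__ == "__main__":
    for name, pct in shares(STAGES_MS).items():
        print(f"{name}: {pct:.1f}%")
    print(f"Total with a 2x faster decoder: "
          f"{total_if_faster(STAGES_MS, 'Decode PNGs', 2.0):.2f}ms")
```

Since decoding accounts for roughly three quarters of the total, any speed-up there carries over to the end-to-end time far more than the same speed-up in another stage would.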

From these results we see that the slowest part by far is decoding the PNGs.

So we are wondering whether there is a PNG decoder we could use that would reduce the PNG decoding time. We also considered keeping the images uncompressed on the hard disk, but then each image would be 10MB in size instead of 1MB, and since there are several tens of thousands of these images stored on the hard disk, it is not possible to store them all without compression.

Edit: more useful information:


  • The benchmark simulates loading 20 PNG images and compositing them together. This will roughly correspond to the kind of requests we will get in the production environment.
  • Each image used in the composition is 1600x1600 in size.
  • The solution will involve as many as 10 load balanced servers like the one we are discussing here. So extra software development effort could be worth the savings on the hardware costs.
  • Caching the decoded source images is something we are considering, but each composition will most likely be done with completely different source images, so the cache miss rate would be high and the performance gain low.
  • The benchmarks were done with a crappy video card, so we can expect the PNG decoding to be even more of a performance bottleneck using a decent video card.
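The 10MB figure for an uncompressed image is consistent with the 1600x1600 dimensions if each pixel takes 4 bytes. The sketch below works through that arithmetic. The 4 bytes per pixel (8-bit RGBA) and the image count of 30,000 are assumptions chosen for illustration; the post only says "several tens of thousands".

```python
WIDTH = 1600
HEIGHT = 1600
BYTES_PER_PIXEL = 4      # assumption: 8-bit RGBA, not stated in the post
IMAGE_COUNT = 30_000     # hypothetical stand-in for "several tens of thousands"
IMAGES_PER_REQUEST = 20  # from the benchmark description


def raw_size_bytes(width, height, bytes_per_pixel):
    """Size of one uncompressed image in bytes."""
    return width * height * bytes_per_pixel


per_image = raw_size_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL)
library_total = per_image * IMAGE_COUNT
per_request = per_image * IMAGES_PER_REQUEST

if __name__ == "__main__":
    print(f"One raw image:     {per_image / 1e6:.2f} MB")
    print(f"Whole library:     {library_total / 1e9:.1f} GB")
    print(f"Read per request:  {per_request / 1e6:.1f} MB")
```

Under these assumptions, storing everything uncompressed would also mean reading about ten times more data from disk per request, so the disk stage of the benchmark would no longer be negligible.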

Recommended answer

There is another option: write your own GPU-based PNG decoder. You could use OpenCL to perform this operation fairly efficiently (and perform your composition using OpenGL, which can share resources with OpenCL). It is also possible to interleave transfer and decoding for maximum throughput. If this is a route you can and want to pursue, I can provide more information.
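The idea of interleaving transfer and decoding can be illustrated with a small producer/consumer pipeline. This is only a CPU-side analogy in Python, not OpenCL code: `run_pipeline`, `decode`, and `upload` are hypothetical names, and in a real GPU implementation the overlap would come from asynchronous transfers and kernels rather than from threads.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def run_pipeline(compressed_items, decode, upload, depth=2):
    """Decode items on a worker thread while the caller uploads earlier ones.

    `decode` and `upload` are placeholders for the real stages; `depth`
    bounds how many decoded images may wait in memory at once.
    """
    decoded = queue.Queue(maxsize=depth)

    def decoder():
        try:
            for item in compressed_items:
                decoded.put(decode(item))  # blocks when the queue is full
        finally:
            decoded.put(_DONE)  # always unblock the consumer

    worker = threading.Thread(target=decoder)
    worker.start()

    results = []
    while True:
        image = decoded.get()
        if image is _DONE:
            break
        results.append(upload(image))  # overlaps with the next decode
    worker.join()
    return results
```

The bounded queue is the design choice that matters here: it keeps the decoder at most `depth` images ahead of the upload stage, so memory use stays predictable even with 10MB decoded images.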

Here are some resources related to GPU-based DEFLATE (and INFLATE).


  1. Accelerating Lossless compression with GPUs
  2. gpu-block-compression using CUDA on Google code.
  3. Floating point data-compression at 75 Gb/s on a GPU - note that this doesn't use INFLATE/DEFLATE but a novel parallel compression/decompression scheme that is more GPU-friendly.
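To make clear what any PNG decoder, GPU-based or not, has to do, here is a minimal sketch of the two stages involved: INFLATE on the concatenated IDAT data, then reversing the per-scanline filters. It is written in plain Python for readability, handles only 8-bit non-interlaced images without a palette, skips CRC checks, and is far too slow for production use. The function names are illustrative.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_chunks(data):
    """Yield (type, payload) for each chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        yield ctype, data[pos + 8:pos + 8 + length]
        pos += 12 + length  # length + type + payload + CRC (CRC not verified)


def paeth(a, b, c):
    """Paeth predictor: pick whichever of left/up/upper-left is closest."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c


def unfilter(raw, width, height, bpp):
    """Reverse the scanline filters; each byte depends on earlier bytes."""
    stride = width * bpp
    out = bytearray()
    prev = bytearray(stride)  # the row above the first row is all zeros
    pos = 0
    for _ in range(height):
        ftype = raw[pos]
        line = bytearray(raw[pos + 1:pos + 1 + stride])
        pos += 1 + stride
        for i in range(stride):
            a = line[i - bpp] if i >= bpp else 0  # left (already rebuilt)
            b = prev[i]                           # up
            c = prev[i - bpp] if i >= bpp else 0  # upper left
            if ftype == 0:
                pred = 0
            elif ftype == 1:
                pred = a
            elif ftype == 2:
                pred = b
            elif ftype == 3:
                pred = (a + b) // 2
            elif ftype == 4:
                pred = paeth(a, b, c)
            else:
                raise ValueError("unknown filter type")
            line[i] = (line[i] + pred) & 0xFF
        out += line
        prev = line
    return bytes(out)


def decode_png(data):
    """Return (width, height, pixel_bytes) for a simple 8-bit PNG."""
    width = height = bpp = None
    idat = bytearray()
    for ctype, payload in read_chunks(data):
        if ctype == b"IHDR":
            width, height, depth, color, _, _, interlace = struct.unpack(
                ">IIBBBBB", payload)
            channels = {0: 1, 2: 3, 4: 2, 6: 4}  # palette images unsupported
            if depth != 8 or color not in channels or interlace != 0:
                raise ValueError("unsupported PNG variant in this sketch")
            bpp = channels[color]
        elif ctype == b"IDAT":
            idat += payload
    raw = zlib.decompress(bytes(idat))  # stage 1: INFLATE
    return width, height, unfilter(raw, width, height, bpp)  # stage 2
```

Both stages carry data dependencies: the filter step needs the pixel to the left and the row above, and the DEFLATE stream is decoded as one sequential bitstream. That is likely why the resources above treat GPU decompression as a research topic rather than a drop-in speed-up.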

Hope this helps!
