Minimize code running time


Problem description

Hi all,

I am using a G.729 decoder in Visual C++. The decoder is slow; I simplified the code by applying object-oriented design, but the processing time for large files did not decrease. Please suggest any ideas for reducing file decoding time. For example, a 5 GB input file takes 5 hours to process.

The decoder takes input in buffer form: 80 bits in, 10 bytes out, written to a file.

Regards

Answers

Quote:

i simplify the code by applying object oriented but large file processing time not decrease

Simplified code is a prerequisite for code readability and reliability, not for speed.

In order to speed up the code you should probably go in the opposite direction, that is, go more low-level (e.g. consider the interaction with the underlying OS in more detail, prefer raw access to resources instead of powerful abstractions, and so on) and focus your C++ code on speed optimization.

I cannot help you more without seeing the actual code.


Did you write your own decoder?

If so, I suggest using an existing one instead, such as the one from the ffmpeg library.

Otherwise, tune your compiler's optimisation options for the decoder source files. Some tips:

  • Avoid memory allocations inside loops. Allocate outside loops and use stack variables for small buffers with known (max.) size.
  • Avoid using global variables inside loops. Use a local copy instead.
  • Tell the compiler the minimum processor generation that must be supported.
  • When using floating-point operations, tell the compiler to use vector instructions (SSE, AVX) if possible (only when running on modern CPUs).
  • Learn about loop unrolling and check if it can be used.
  • If the final file size is known in advance, create the file with that size and rewind.
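A minimal sketch of the first two tips, with a placeholder decode_frame standing in for the real decoder (the function and the exact frame sizes are assumptions, not taken from the original code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder standing in for the real per-frame decode step.
static void decode_frame(const std::uint8_t* in, std::int16_t* out,
                         std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = static_cast<std::int16_t>(in[i % 10]);  // dummy transform
}

// Fast pattern: one fixed-size stack buffer reused for every frame,
// instead of a fresh heap allocation on each loop iteration.
int process(const std::vector<std::uint8_t>& input) {
    constexpr std::size_t kFrameIn  = 10;  // one encoded frame (80 bits)
    constexpr std::size_t kFrameOut = 80;  // decoded samples per frame
    std::int16_t pcm[kFrameOut];           // stack buffer, allocated once
    int frames = 0;
    for (std::size_t off = 0; off + kFrameIn <= input.size(); off += kFrameIn) {
        decode_frame(&input[off], pcm, kFrameOut);
        ++frames;
    }
    return frames;
}
```

The buffer lives on the stack for the whole run, so the hot loop does no allocation at all.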


I'd do the following things, in order, until it's fast enough:

- Switch the optimiser on: an unoptimised build can be a factor of 10 slower in some cases. Have a quick fiddle with the compiler settings to make sure it's being aggressive enough.

- Check the algorithm the decoder's using first to see if there's a better one you can use. If it's only processing 10 bytes at a time there may be an opportunity to process more than one lump at a time during each pass of the algorithm [1]. Also look at using someone else's library to do the job - I usually assume that other people know better than I about how to implement stuff so why not borrow or buy from the best? [2]

- Make sure you're using the right structures for the job and not doing something daft like regenerating intermediate structures during each pass of the algorithm. e.g. don't create a new G729Decompressor object each time around the loop.
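A sketch of that last point: construct the decoder object once, outside the loop, so the per-iteration construction cost disappears (the class below is hypothetical, with a counter just to make the cost visible):

```cpp
#include <cassert>

// Hypothetical decoder whose construction is expensive; the static
// counter records how many times it was actually constructed.
struct G729Decompressor {
    static int total_constructed;
    G729Decompressor() { ++total_constructed; }
    int decode(int frame) { return frame + 1; }  // placeholder decode
};
int G729Decompressor::total_constructed = 0;

// Construct the decoder once and reuse it for every frame.
int decode_all(int frames) {
    G729Decompressor dec;          // NOT inside the loop below
    int out = 0;
    for (int f = 0; f < frames; ++f)
        out += dec.decode(f);
    return out;
}
```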

- Make sure you're using buffered file I/O and aren't doing anything daft like closing the output file after every read or write. Use the largest amount of memory you can get hold of to buffer the data in and out. If this speeds up the processing a lot it's probably I/O bound so a faster storage device might also help.
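One way to sketch the chunked-read idea (the helper name and the 4 MiB chunk size are assumptions; tune the size to available memory):

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Read the input file in large chunks instead of one 10-byte frame at
// a time; returns the total byte count read.
std::size_t count_bytes_chunked(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<char> chunk(4u << 20);   // one big reusable 4 MiB buffer
    std::size_t total = 0;
    while (in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()))
           || in.gcount() > 0) {
        total += static_cast<std::size_t>(in.gcount());
        // ...decode the whole frames contained in `chunk` here...
    }
    return total;
}
```

The `gcount() > 0` check picks up the final partial chunk, which `read` reports as a failure even though it delivered bytes.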

- Once you've got a massive buffer try multi-threading - one thread per processor and give each thread a subset of your input buffer to process in one fell swoop.
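A sketch of that split using standard C++ threads; `std::accumulate` stands in for the real per-frame decode, and the 10-byte frame size is an assumption. Each thread writes only its own output slots, so no locking is needed:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Decode `in` (whole 10-byte frames) across `threads` worker threads.
// Thread t handles frames t, t + threads, t + 2*threads, ... so the
// per-thread frame sets are disjoint.
std::vector<int> decode_parallel(const std::vector<std::uint8_t>& in,
                                 unsigned threads) {
    constexpr std::size_t kFrame = 10;
    const std::size_t frames = in.size() / kFrame;
    std::vector<int> out(frames);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t)
        pool.emplace_back([&, t] {
            for (std::size_t f = t; f < frames; f += threads)
                out[f] = std::accumulate(in.begin() + f * kFrame,
                                         in.begin() + (f + 1) * kFrame, 0);
        });
    for (auto& th : pool) th.join();
    return out;
}
```

In a real decoder the frames must still come out in order, so writing each result to a fixed slot (rather than appending) also preserves ordering for free.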

- Make sure you're not doing anything the processor caches don't like. Minimise the working set for the algorithm and try to make sure that all the data being accessed at one time is close together in memory. Getting it on the same memory page is a good way of making sure you don't get cache collisions. Using automatic variables is a good way of getting all your data on one page. Dynamic memory allocation is a good way to screw it up. Calls through tables of function pointers can also mess up caching as they're often not near the data used by an algorithm, so (and I hate to say this) minimise the use of virtual functions unless avoiding them really warps your algorithm.
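A small illustration of the locality point: the two functions below compute the same sum, but the contiguous `std::vector` walk stays within a few cache lines while the `std::list` walk chases nodes scattered across the heap:

```cpp
#include <cassert>
#include <list>
#include <numeric>
#include <vector>

// Contiguous layout: sequential access, prefetch-friendly.
long sum_vector(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0L);
}

// Node-based layout: same result, but every step is a pointer chase
// and a potential cache miss.
long sum_list(const std::list<int>& l) {
    return std::accumulate(l.begin(), l.end(), 0L);
}
```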

If you try that lot and you haven't got any faster, come back and ask again, but that should take you a while to work through.

[1] There probably isn't one, as G.729 is meant to operate on 10-byte frames, but this is good advice generally. There was also no real incentive for it to be fast to decode, as it was meant for real-time use: one frame every 10 ms isn't a lot to keep up with on a modern processor (5 GB is a metric f***-ton of G.729 data, years of conversation), so the implementation you're using might be rather appalling.

[2] Buying 3rd party code isn't an option when you're working for a cheapskate or are a student.

