The speed of fscanf and sscanf

Question

For a C assignment, I am supposed to break a large text file up into words and process them one by one. Basically, a word is any contiguous sequence of alphabetic characters. Since this will be the bottleneck of my program, I want to make this process as fast as possible.

My idea is to scan words from the file into a string buffer using the scan functions' scanset format specifier (%[a-zA-Z]). If the buffer fills up, I check whether there are more letters in the file (based on where the file pointer is located). If there are, I increase the buffer size and continue copying letters into the buffer until I hit a non-letter.
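
The buffer-growing idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `read_word` is a hypothetical helper name, the 64-byte chunk size is arbitrary, and the `a-z` range syntax inside a scanset is implementation-defined in standard C (it behaves as expected on common implementations such as glibc).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the original post): reads the next word,
 * i.e. the next run of ASCII letters, from fp into *buf, growing *buf with
 * realloc as needed. *cap tracks the allocated size of *buf.
 * Returns the word length, or 0 at end of input or on allocation failure. */
size_t read_word(FILE *fp, char **buf, size_t *cap)
{
    char chunk[64];
    size_t len = 0;

    /* Skip any run of non-letters; '*' suppresses assignment.
     * EOF here means there is no further input at all. */
    if (fscanf(fp, "%*[^a-zA-Z]") == EOF)
        return 0;

    /* Read the word in pieces of at most 63 letters each. */
    while (fscanf(fp, "%63[a-zA-Z]", chunk) == 1) {
        size_t n = strlen(chunk);

        if (len + n + 1 > *cap) {
            size_t new_cap = (len + n + 1) * 2;
            char *p = realloc(*buf, new_cap);
            if (p == NULL)
                return 0;
            *buf = p;
            *cap = new_cap;
        }
        memcpy(*buf + len, chunk, n + 1); /* copies the terminating '\0' too */
        len += n;

        if (n < sizeof chunk - 1)
            break; /* chunk was not full, so the word has ended */
    }
    return len;
}
```

A caller would start with `char *buf = NULL; size_t cap = 0;`, call `read_word(fp, &buf, &cap)` in a loop until it returns 0, and `free(buf)` at the end.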

The question is whether I should use fscanf or sscanf (after copying the whole file into a string). Is one faster than the other, or is there a better alternative to my idea?

Recommended answer

Your question is almost off topic because it calls for opinion-based answers.

The only way to know how fast one method will be compared to another is to try both and measure the performance of the resulting executables on real data.
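
As a rough sketch of what such a measurement might look like, the following times a word-counting routine over several runs and keeps the fastest one. Everything here is an assumption made for illustration: the `word_counter` signature, the names `best_time` and `count_words_fscanf`, and the choice of C11 `timespec_get` for wall-clock time. The fscanf-based counter is just one candidate workload; a second implementation with the same signature could be timed the same way.

```c
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical signature for any word-counting routine under test. */
typedef size_t (*word_counter)(const char *path);

/* Example workload: counts runs of ASCII letters using fscanf scansets.
 * Both directives suppress assignment, so only EOF is checked. */
size_t count_words_fscanf(const char *path)
{
    FILE *fp = fopen(path, "r");
    size_t count = 0;

    if (fp == NULL)
        return 0;
    for (;;) {
        if (fscanf(fp, "%*[^a-zA-Z]") == EOF) /* skip non-letters */
            break;
        if (fscanf(fp, "%*[a-zA-Z]") == EOF)  /* consume one word */
            break;
        count++;
    }
    fclose(fp);
    return count;
}

/* Wall-clock time in seconds, via C11 timespec_get. */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Calls fn(path) `runs` times and returns the fastest wall-clock time in
 * seconds (or -1.0 if runs <= 0). The last word count goes to *words. */
double best_time(word_counter fn, const char *path, int runs, size_t *words)
{
    double best = -1.0;

    for (int i = 0; i < runs; i++) {
        double start = now_seconds();
        size_t n = fn(path);
        double elapsed = now_seconds() - start;

        if (words != NULL)
            *words = n;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    return best;
}
```

Comparing the word counts returned by the two implementations is a cheap sanity check that they agree before their timings are compared.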

With the computing power available in today's regular PCs, it will take a very large file to measure actual performance differences.

So go ahead and implement your ideas. You seem to have a good understanding of potential performance bottlenecks, so turn these ideas into actual C code. Providing two different but correct programs for this problem, along with a performance analysis, should get you an A+. As an employer, I value such an approach in a test.

PS: IMHO most of the time will be spent getting the data from the file system. If the file is larger than available memory, that should be your bottleneck. If the file fits in the operating system's file system cache, subsequent benchmark runs should give you much better performance than the first.

If you are allowed to write system-specific code, try using mmap and simple for loops with explicit tests via lookup tables over the mmapped char array.
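
A minimal sketch of that suggestion, assuming a POSIX system and an ASCII-compatible character set, might look like this. The function names `count_words_buf` and `count_words_mmap` are hypothetical, and the code only counts words; real processing would go where the counter is incremented.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Counts runs of ASCII letters in a byte range, classifying each byte
 * with a 256-entry lookup table instead of a chain of comparisons. */
size_t count_words_buf(const unsigned char *data, size_t len)
{
    unsigned char is_letter[256] = {0};
    size_t count = 0;
    int in_word = 0;

    /* Assumes 'a'..'z' and 'A'..'Z' are contiguous, as in ASCII. */
    for (int c = 'a'; c <= 'z'; c++)
        is_letter[c] = 1;
    for (int c = 'A'; c <= 'Z'; c++)
        is_letter[c] = 1;

    for (size_t i = 0; i < len; i++) {
        int letter = is_letter[data[i]];
        if (letter && !in_word)
            count++; /* a new word starts at this byte */
        in_word = letter;
    }
    return count;
}

/* Maps the whole file read-only and scans it in place.
 * Returns 0 if the file is empty or cannot be opened or mapped. */
size_t count_words_mmap(const char *path)
{
    struct stat st;
    size_t count = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return 0;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        size_t len = (size_t)st.st_size;
        void *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);

        if (map != MAP_FAILED) {
            count = count_words_buf(map, len);
            munmap(map, len);
        }
    }
    close(fd);
    return count;
}
```

Keeping the scanning loop separate from the mapping code means the same loop can also be run over a buffer filled by fread, which makes for a like-for-like comparison.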
