Reading from Text Files in C

Question

A small question, really. What would be the best way to read a text file containing X words and add each word, one by one, to a linked list? For example: The Frog Is Old.

Thus The, Frog, Is and Old would each be put into a ListNode, all read from a file.

I'm really wondering what the best function is to use in conjunction with fscanf, if fscanf is even the best option. All advice is welcome!

Cheers.

My query is really this: if I wanted to parse a large text file, would it be best to fscanf each word into an array one by one, add it to the list, free the array, and repeat? Or is there a more efficient way?

Recommended answer

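The "%s" conversion specifier skips leading whitespace and then matches a run of non-whitespace characters, so each successful fscanf call yields one word. Give it a field width (here BUFSIZE, pasted into the format string by the STR macro) so that a long word cannot overflow the buffer.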

#define QUOTE(s) #s
#define STR(s) QUOTE(s)

#ifndef BUFSIZE
#  define BUFSIZE 255
#endif

char buf[BUFSIZE+1];
/* Compare against 1: at end of file fscanf returns EOF, which is nonzero. */
while (fscanf(fin, "%" STR(BUFSIZE) "s", buf) == 1) {
    /* buf holds next word. Todo:
       + allocate space for word
       + copy word to newly allocated space
       + add to linked list
     */
}
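
The three "Todo" steps above could be filled in roughly as follows. This is a minimal sketch, not a definitive implementation: the ListNode layout, the names read_words and free_words, and the WORD_MAX limit of 255 are assumptions made for illustration (the question only mentions a "ListNode"). Words longer than WORD_MAX are split across several nodes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WORD_MAX 255  /* assumed cap on word length; must match "%255s" below */

/* Hypothetical node type; the question does not show its definition. */
typedef struct ListNode {
    char *word;
    struct ListNode *next;
} ListNode;

/* Read whitespace-separated words from fin and append each to a list.
   Returns the head of the list (NULL if no word was read). On allocation
   failure it stops early and returns what was read so far. */
ListNode *read_words(FILE *fin)
{
    char buf[WORD_MAX + 1];
    ListNode *head = NULL, **tail = &head;

    while (fscanf(fin, "%255s", buf) == 1) {
        ListNode *node = malloc(sizeof *node);
        if (node == NULL)
            break;
        node->word = malloc(strlen(buf) + 1);   /* allocate space for word */
        if (node->word == NULL) {
            free(node);
            break;
        }
        strcpy(node->word, buf);                /* copy word to new space */
        node->next = NULL;
        *tail = node;                           /* append, keeping file order */
        tail = &node->next;
    }
    return head;
}

/* Release every node and the word it owns. */
void free_words(ListNode *head)
{
    while (head != NULL) {
        ListNode *next = head->next;
        free(head->word);
        free(head);
        head = next;
    }
}
```

Called on a file containing "The Frog Is Old", read_words returns a four-node list in file order. Note that buf is an ordinary array that is reused for every word, so there is nothing to free and reallocate between words; only the per-node copies are heap-allocated.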

Alternatively, strtok can be used to tokenize (break up) a string into substrings, using a set of separator characters (as a character array) that you specify. Your system may also have strsep, which is intended to replace strtok. Both strtok and strsep modify the array you pass in, so take care that this won't cause issues with other parts of the code that access the data. strtok is not thread-safe, because it keeps its position in hidden static state between calls; if you have multiple threads parsing strings, use strsep or strtok_r.

#ifndef BUFSIZE
#  define BUFSIZE 256
#endif

const char separators[] = "\t\n\v\r\f !\"#$%&'()*+,-./:;<=>?@[\\]^`{|}~";
char buf[BUFSIZE], *line, *word, *rest;

/* Pass the real size of buf; BUFSIZE+1 would let fgets write past the end. */
while (fgets(buf, sizeof buf, fin)) {
    rest = line = buf;
    while ((word = strtok_r(line, separators, &rest))) {
        /* Todo:
           + allocate space for word
           + copy word to newly allocated space
           + add to linked list
        */
        line = NULL;  /* NULL makes strtok_r continue from the position saved in rest */
    }
}

Since the second example reads a line at a time from the file for strtok_r to work on, it will split a word in two if any line of the file is more than BUFSIZE-1 characters long and the (BUFSIZE-1)th and BUFSIZEth characters of that line are both letters. One solution is to create a buffered string stream: when the end of the buffer is reached, anything remaining in the buffer is shifted to the front and the rest of the buffer is filled with more data from the file. Just be careful about words longer than the buffer; in production code, they are a potential security vulnerability that could lead to denial-of-service attacks.
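
The shift-and-refill idea can be sketched as below. Everything named here (each_word, word_fn, WordStats, tally, CHUNK) is hypothetical and chosen for illustration; the separator set is copied from the second example. Words are reported as a pointer plus a length and are not null-terminated. A word longer than the whole buffer is still split, which is the case the warning above is about.

```c
#include <stdio.h>
#include <string.h>

#define CHUNK 256  /* assumed buffer size */

static const char SEPARATORS[] =
    "\t\n\v\r\f !\"#$%&'()*+,-./:;<=>?@[\\]^`{|}~";

/* Callback type: receives each word (NOT null-terminated) and its length. */
typedef void (*word_fn)(const char *word, size_t len, void *ctx);

/* Example callback state: counts words and tracks the longest one. */
typedef struct { size_t count; size_t longest; } WordStats;

static void tally(const char *word, size_t len, void *ctx)
{
    WordStats *s = ctx;
    (void)word;
    s->count++;
    if (len > s->longest)
        s->longest = len;
}

static int is_sep(char c)
{
    /* strchr also matches the terminator, so '\0' counts as a separator. */
    return strchr(SEPARATORS, c) != NULL;
}

/* Read fin in chunks and call emit once per word. A word cut off by the
   end of the buffer is shifted to the front and completed by the next read. */
void each_word(FILE *fin, word_fn emit, void *ctx)
{
    char buf[CHUNK];
    size_t have = 0;  /* length of an unfinished word held at the front of buf */
    size_t got;

    while ((got = fread(buf + have, 1, sizeof buf - have, fin)) > 0) {
        size_t end = have + got, start = 0, i;

        for (i = have; i < end; i++) {
            if (is_sep(buf[i])) {
                if (i > start)
                    emit(buf + start, i - start, ctx);
                start = i + 1;
            }
        }
        have = end - start;            /* bytes of a possibly unfinished word */
        if (have == sizeof buf) {      /* word fills the buffer: give up, split */
            emit(buf, have, ctx);
            have = 0;
        } else if (have > 0) {
            memmove(buf, buf + start, have);  /* shift leftover to the front */
        }
    }
    if (have > 0)
        emit(buf, have, ctx);          /* final word with no trailing separator */
}
```

A call such as each_word(fin, tally, &stats) with a zero-initialized WordStats counts the words in a file; a callback that allocates a list node would copy len bytes and add its own terminator.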

An issue with all of the above functions is that they don't handle null characters in the input, since C strings use a null character as their terminator. If you wish to parse data that may contain null characters, you'll need to use a non-standard function, which may mean writing your own.
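
One way to write your own is to carry an explicit length instead of relying on a terminator. The function below is a hypothetical sketch (next_token and its parameters are invented for illustration) that splits a byte buffer on whitespace only and treats null bytes as ordinary data; the buffer itself would come from something like fread, which reports how many bytes it read.

```c
#include <stddef.h>
#include <ctype.h>

/* Find the next whitespace-delimited token in data[*pos .. len).
   Returns a pointer to the token and stores its length in *tok_len,
   or returns NULL when no tokens remain. *pos is advanced past the token.
   Null bytes are not whitespace, so they stay inside tokens. */
const unsigned char *next_token(const unsigned char *data, size_t len,
                                size_t *pos, size_t *tok_len)
{
    size_t i = *pos, start;

    while (i < len && isspace(data[i]))    /* skip leading whitespace */
        i++;
    if (i == len) {
        *pos = len;
        return NULL;
    }
    start = i;
    while (i < len && !isspace(data[i]))   /* scan to the end of the token */
        i++;
    *tok_len = i - start;
    *pos = i;
    return data + start;
}
```

Because the token is described by a pointer and a length, the caller should copy it with memcpy rather than strcpy, and should store the length alongside the copy.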

As for efficiency, any algorithm you use is going to need to read from the file (which is O(n) in complexity and requires I/O, slowing down the program) and allocate memory to store the words. Whether you use fscanf, strtok or some other method, the time and space complexity isn't likely to vary much; about the only thing that might vary is how many intermediate buffers get allocated. Your best bet for finding the most efficient implementation is to try a couple and profile them.
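
For a first rough comparison before reaching for a real profiler, a small timing harness built on the standard clock function may be enough. The names time_reader and count_words_fscanf are made up for this sketch, and count_words_fscanf only stands in for whichever candidate implementation you want to measure. Keep in mind that clock reports processor time, so time spent waiting on the disk is not fully reflected.

```c
#include <stdio.h>
#include <time.h>

/* Stand-in candidate: count whitespace-separated words using fscanf. */
size_t count_words_fscanf(FILE *fin)
{
    char buf[256];
    size_t n = 0;

    while (fscanf(fin, "%255s", buf) == 1)
        n++;
    return n;
}

/* Rewind fin, run fn on it, and return the CPU seconds it used.
   The value computed by fn is stored in *result. */
double time_reader(size_t (*fn)(FILE *), FILE *fin, size_t *result)
{
    clock_t t0, t1;

    rewind(fin);
    t0 = clock();
    *result = fn(fin);
    t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Running each candidate several times on the same large file and comparing the totals gives a more stable picture than a single run, since the first pass also pays for loading the file into the operating system's cache.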
