When using fscanf to parse words, how can I check when I skipped a line?

Question

I'm working on a program that reads text from a file, parses the text into words, and manipulates them. I'm parsing with fscanf like this:

while (fscanf (fp, " %32[^ ,.\t\n]%*c", word) == 1)    
{
    /*manipulate the text word by word */
    …
}

Next to each word, I want to write the line on which I found it.

Is there a way to check when I have moved down a line while using fscanf()?

Recommended answer

The soundest advice is to use fgets(), or perhaps POSIX getline(), to read lines, and then consider using sscanf() to parse each line. You will probably need to think about how to use sscanf() in a loop. There are also numerous other options for parsing the line instead of sscanf(): strtok_r(), the less desirable strtok(), or strtok_s() on Windows; strspn(), strcspn() and strpbrk(); and other functions that are not as standardized.
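The line-based approach could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name list_words(), the 4096-byte line buffer and the "line: word" output format are made up for the example and do not come from the question. It assumes no input line exceeds the buffer (a longer line would be counted as more than one line), and a word longer than 32 characters is reported in pieces. The %n conversion records how many characters sscanf() consumed, which is what lets it run in a loop over one line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original post): writes "line: word"
 * for every word read from fp and returns the number of words found.
 * Assumes no input line is longer than the buffer. */
static int list_words(FILE *fp, FILE *out)
{
    char line[4096];
    char word[33];              /* 32 characters plus the terminating null */
    int lineno = 0;
    int count = 0;

    while (fgets(line, sizeof(line), fp) != NULL)
    {
        const char *p = line;
        int used = 0;

        lineno++;               /* one fgets() call == one line (see assumption) */
        for (;;)
        {
            /* Skip any run of separators before the next word. */
            p += strspn(p, " ,.\t\n");
            /* %n stores the number of characters consumed so far. */
            if (sscanf(p, "%32[^ ,.\t\n]%n", word, &used) != 1)
                break;          /* end of line reached: no more words */
            assert(used > 0);
            fprintf(out, "%d: %s\n", lineno, word);
            count++;
            p += used;
        }
    }
    return count;
}
```

Calling list_words(fp, stdout) on an open file would print each word prefixed by its line number. Because the line number changes only when fgets() returns, no newline can be skipped unnoticed.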

If you feel you must use fscanf(), then you probably need to capture the trailing context. A simple version of that would be:

char c;
while (fscanf(fp, " %32[^ ,.\t\n]%c", word, &c) == 2)
    …

This captures the character after the word, assuming there is one. If your file doesn't end with a newline, the last word may be lost, because the loop requires both conversions to succeed. It is also rather too easy to miss a newline. For example, if the line ends with a full stop (period) before the newline, then c will hold the '.' and the newline will be skipped by the leading blank in the format on the next iteration of the loop. You could overcome that with:

char s[33];
while (fscanf(fp, " %32[^ ,.\t\n]%32[ ,.\t\n]", word, s) == 2)
    …

Note that the length in the format string must be one less than the length in the variable declaration, to leave room for the terminating null byte!

After a successful call to fscanf(), the string s could contain multiple newlines, blanks and so on. The scanf() family of functions mostly doesn't care about newlines, and the scan set for s will read multiple newlines in a row if that's what is in the data file.
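Counting the newlines captured in the separator string is enough to keep a running line number. The sketch below is one way to do that, not the answer's exact code. The name list_words_fscanf() and the output format are invented for the example. It also deliberately drops the leading blank from the format string, so that a separator in front of a word makes fscanf() return 0 instead of being skipped silently; the code then consumes that one character itself.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the original post): writes "line: word"
 * for every word read from fp and returns the number of words found. */
static int list_words_fscanf(FILE *fp, FILE *out)
{
    char word[33];              /* 32 characters plus the terminating null */
    char sep[33];
    int lineno = 1;
    int count = 0;
    int rc;

    /* No leading blank in the format: nothing is skipped behind our back. */
    while ((rc = fscanf(fp, "%32[^ ,.\t\n]%32[ ,.\t\n]", word, sep)) != EOF)
    {
        if (rc == 0)
        {
            /* The next character is a separator; consume it by hand. */
            if (getc(fp) == '\n')
                lineno++;
            continue;
        }
        assert(rc == 1 || rc == 2);
        fprintf(out, "%d: %s\n", lineno, word);
        count++;
        if (rc == 2)
        {
            /* Every newline in the trailing separators starts a new line. */
            for (const char *p = sep; *p != '\0'; p++)
                if (*p == '\n')
                    lineno++;
        }
        /* rc == 1: overlong word (the rest is read as the next word)
         * or end of file directly after the word. */
    }
    return count;
}
```

A separator run longer than 32 characters is handled by the same rc == 0 branch: the remainder is consumed one character at a time on the following iterations.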

If you explicitly capture the return value from fscanf(), you can be more sensitive to files that end without a newline (or a punctuation character), or that cause other problems:

char s[33];
int rc;
while ((rc = fscanf(fp, " %32[^ ,.\t\n]%32[ ,.\t\n]", word, s)) != EOF)
{
    switch (rc)
    {
    case 2:
        /* … proceed as normal, checking s for newlines. */
        break;
    case 1:
        /* … probably an overlong word or EOF without a newline. */
        break;
    case 0:
        /* … probably means the next character is a comma or a dot. */
        /* Spaces, tabs and newlines will be skipped without detection */
        /* by the leading space in the format string. */
        /* Consume at least one character here (for example with getc()), */
        /* otherwise the loop never advances. */
        break;
    default:
        assert(0);
        break;
    }
}

If you start to care about !, ?, ;, :, ' or " characters, not to mention ( and ), life gets more complex still. In fact, at that point, the alternatives to sscanf() start looking much better.
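As a rough illustration of one such alternative, the sketch below splits a single line with strspn() and strcspn(), so adding a punctuation character only means editing one string. The names SEPARATORS and split_line() and the output format are assumptions made for the example. Note that treating the apostrophe as a separator splits a word such as "it's" in two, which may or may not be what you want.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical separator set: extend or trim it as needed. */
#define SEPARATORS " ,.\t\n!?;:'\"()"

/* Hypothetical helper (not from the original post): writes "lineno: word"
 * for every word in one line of text and returns the number of words. */
static int split_line(const char *line, int lineno, FILE *out)
{
    int count = 0;
    const char *p;

    assert(line != NULL && out != NULL);
    p = line + strspn(line, SEPARATORS);        /* skip leading separators */
    while (*p != '\0')
    {
        size_t len = strcspn(p, SEPARATORS);    /* length of the next word */
        fprintf(out, "%d: %.*s\n", lineno, (int)len, p);
        count++;
        p += len;
        p += strspn(p, SEPARATORS);             /* skip trailing separators */
    }
    return count;
}
```

The caller would read each line with fgets() or getline(), increment its own line counter, and pass both to split_line(). Unlike the scan-set formats above, there is no 32-character limit on a word here, because nothing is copied into a fixed-size buffer.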

It is very hard to use the scanf() family of functions correctly. They're anything but tools for the novice, at least once you need to do anything complex. You could look at A beginner's guide to not using scanf(), which contains much valuable information. I'm not wholly convinced by the last couple of examples, which are supposed to be bomb-proof uses of scanf(). (It is a little easier to use sscanf() correctly, but you still need to understand in detail what you're doing.)
