Parsing a comma separated file using C using fscanf()


Problem description

I have a file with data something like this -

Name, Age, Occupation
John, 14, Student
George, 14, Student
William, 23, Programmer

Now, I want to read the data such that each value (e.g. Name, Age, etc.) is read as a string.
This is my code snippet -

....
if (!(ferror(input_fp) || ferror(output_fp))) {
    while(fscanf(input_fp, "%30[^ ,\n\t]%30[^ ,\n\t]%30[^ ,\n\t]", 
                name, age_array, occupation) != EOF){
        fprintf(stdout, "%-30s%-30s%-30s\n", name, age_array, occupation);
    }
    fclose(input_fp);
    fclose(output_fp);
}
....

However, this goes into an infinite loop and prints some random output.
This is how I understand my input conversion specifiers:
%30[^ ,\n\t] -> read a string that is at most 30 characters long and that
DOES NOT include a space, a comma, a newline or a tab character.
And I am reading 3 such strings.
Where am I going wrong?

Recommended answer

OP's

fscanf(input_fp, "%30[^ ,\n\t]%30[^ ,\n\t]%30[^ ,\n\t]", ...

does not consume the ',' nor the '\n' in the text file. Subsequent fscanf() attempts therefore also fail and return 0, which, not being EOF, causes an infinite loop.
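
The stall can be reproduced without a file. The sketch below is only an illustration: count_fields_op_format() is a hypothetical helper, not part of OP's program, and it uses sscanf() on a string in place of fscanf() on a stream. It applies the question's format once and returns the conversion count.

```c
#include <stdio.h>

/* Hypothetical helper: applies the question's format to `line` and returns
   what sscanf() returns, i.e. the number of fields filled, or EOF. */
int count_fields_op_format(const char *line)
{
    /* A width of 30 needs 31 bytes: 30 characters plus the terminating '\0'. */
    char name[31], age[31], occupation[31];

    return sscanf(line, "%30[^ ,\n\t]%30[^ ,\n\t]%30[^ ,\n\t]",
                  name, age, occupation);
}
```

For "John, 14, Student" the helper returns 1, because the second scanset stops at the ','. With fscanf() that ',' stays in the stream, so the next call behaves like scanning ", 14, Student" and returns 0, as does every call after it. Neither value is EOF, so a loop that only tests != EOF never ends.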

Although OP requested a fscanf() solution, a fgets()/sscanf() approach better handles potential IO and parsing errors.

FILE *input_fp;   // assumed to be opened for reading elsewhere
FILE *output_fp;  // assumed to be opened for writing elsewhere
char buf[100];
while (fgets(buf, sizeof buf, input_fp) != NULL) {
  char name[30];  // Ensure this size is 1 more than the width in the scanf format.
  char age_array[30];
  char occupation[30];
  #define VFMT " %29[^ ,\n\t]"
  int n;  // Use to check for trailing junk

  if (3 == sscanf(buf, VFMT "," VFMT "," VFMT " %n", name, age_array,
      occupation, &n) && buf[n] == '\0') {
    // Suspect OP really wants this width to be 1 more
    if (fprintf(output_fp, "%-30s%-30s%-30s\n", name, age_array, occupation) < 0)
      break;  // output error
  } else
    break;  // format error
}
fclose(input_fp);
fclose(output_fp);

Rather than calling ferror(), check the return values of fgets() and fprintf().

I suspect OP's undeclared field buffers were [30], so I adjusted the scanf() widths accordingly.
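
For experimenting, the same loop can be wrapped in a function. This is a minimal sketch under assumptions: copy_records(), FIELD_SIZE and the return convention (line count, or -1 on failure) are hypothetical names chosen for illustration, and the 30-byte buffers follow the guess about OP's declarations.

```c
#include <stdio.h>

#define FIELD_SIZE 30            /* assumed buffer size for each field */
#define VFMT " %29[^ ,\n\t]"     /* width must be FIELD_SIZE - 1 */

/* Hypothetical helper: copies well-formed "a, b, c" lines from `in` to `out`
   as fixed-width columns. Returns the number of lines written, or -1 on a
   malformed line or a write error. */
long copy_records(FILE *in, FILE *out)
{
    char buf[100];
    long written = 0;

    while (fgets(buf, sizeof buf, in) != NULL) {
        char name[FIELD_SIZE], age[FIELD_SIZE], occupation[FIELD_SIZE];
        int n = 0;   /* offset where scanning stopped, set by %n */

        /* Exactly three fields must convert and nothing may follow them. */
        if (sscanf(buf, VFMT "," VFMT "," VFMT " %n",
                   name, age, occupation, &n) != 3 || buf[n] != '\0')
            return -1;
        if (fprintf(out, "%-30s%-30s%-30s\n", name, age, occupation) < 0)
            return -1;
        written++;
    }
    return written;
}
```

A caller that has opened both streams could write long count = copy_records(input_fp, output_fp); and treat a negative result as a format or output error.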

Details about if (3 == sscanf(buf, VFMT, ...

The if (3 == sscanf(...) && buf[n] == '\0') { becomes true when:
1) exactly 3 fields are assigned, that is, each of the 3 " %29[^ ,\n\t]" format specifiers scans in at least 1 char.
2) buf[n] is the end of the string. n is set via the "%n" specifier. The preceding ' ' in " %n" causes any white-space following the last "%29[^ ,\n\t]" to be consumed. sscanf() then sees "%n", which directs it to store the current offset from the beginning of scanning in the int pointed to by &n.
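
The two conditions can be checked in isolation. The following is a sketch only; is_complete_record() is a hypothetical name, and the format string is the expanded form of the one used in the loop.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 when `line` holds exactly three
   comma-separated fields followed only by white-space, otherwise 0. */
int is_complete_record(const char *line)
{
    char a[30], b[30], c[30];
    int n = 0;   /* set by %n only when scanning gets that far */

    /* The && short-circuits, so line[n] is read only after %n has run. */
    return sscanf(line, " %29[^ ,\n\t], %29[^ ,\n\t], %29[^ ,\n\t] %n",
                  a, b, c, &n) == 3
           && line[n] == '\0';
}
```

A line such as "John, 14, Student, extra\n" still yields 3 conversions, but n then points at the third ',' rather than at the terminating '\0', so the record is rejected as having trailing junk.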

VFMT "," VFMT "," VFMT " %n" is concatenated by the compiler to
" %29[^ ,\n\t], %29[^ ,\n\t], %29[^ ,\n\t] %n".
I find the former easier to maintain than the latter.
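
Adjacent string literals are joined during translation, so the equivalence can be confirmed with strcmp(). This is a small sketch; record_fmt and formats_match() are hypothetical names used only here.

```c
#include <string.h>

#define VFMT " %29[^ ,\n\t]"

/* The macro form: three field specifiers joined by "," and ending in " %n". */
static const char record_fmt[] = VFMT "," VFMT "," VFMT " %n";

/* Hypothetical check: returns 1 when the macro form and the hand-written
   form are the same string. */
int formats_match(void)
{
    return strcmp(record_fmt,
                  " %29[^ ,\n\t], %29[^ ,\n\t], %29[^ ,\n\t] %n") == 0;
}
```

With the macro, changing the field width or the excluded characters is a one-line edit instead of three.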

The first space in " %29[^ ,\n\t]" directs sscanf() to scan over (consume and not save) 0 or more white-spaces (' ', '\t', '\n', etc.). The rest directs sscanf() to consume and save any 1 to 29 char except ' ', ',', '\n', '\t', then append a '\0'.
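
Both halves of the specifier can be observed on a single field. The helper below is a hypothetical sketch (first_field() is not from the answer) that applies " %29[^ ,\n\t]" once.

```c
#include <stdio.h>

/* Hypothetical helper: skips leading white-space, then stores up to 29
   characters that are not ' ', ',', '\n' or '\t' into `out` (30 bytes).
   Returns sscanf()'s result: 1 on success, 0 if no field character was
   found, EOF if the input ran out first. */
int first_field(const char *s, char out[30])
{
    return sscanf(s, " %29[^ ,\n\t]", out);
}
```

Given "  \t\nJohn, 14" it stores "John". Given a 40-character run with no separators it stores only the first 29 characters, and given ",John" it returns 0 because a scanset must match at least one character.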
