Using strtok to read a CSV file


Problem description

I am trying to read a CSV file in C using strtok and store the contents in an array of struct Game. My code is shown below:

  FILE *fp;
  int i = 0;
  if ((fp = fopen("Games.csv", "r")) == NULL) {
    printf("Can't open file.\n");
    exit(1);
  }
  rewind(fp);
  char buff[1024];
  fgets(buff, 1024, fp);   /* read and discard the first line (presumably a header) */
  char *delimiter = ",";

  while (fgets(buff, 1024, fp) != NULL && i < 5) {
    Game[i].ProductID = strtok(buff, ",");
    Game[i].ProductName = strtok(NULL, delimiter);
    Game[i].Publisher = strtok(NULL, delimiter);
    Game[i].Genre = strtok(NULL, delimiter);
    Game[i].Taxable = atoi(strtok(NULL, delimiter));
    Game[i].price = strtok(NULL, delimiter);
    Game[i].Quantity = atoi(strtok(NULL, delimiter));

    printf("%s\n", Game[i].ProductID);

    i++;
  }

  i = 0;
  for (i = 0; i < 5; i++) {
    printf("%s", Game[i].ProductID);
  }

The output is shown below:

DS_25ROGVOIRY
DS_25MMD4N2BL
DS_258KADVNLH
DS_25UR7M375D
DS_25FP45CJFZ
DS_25AN1EA3PV,Blitz I: The League,Midway Games,Sports,0,$103.03 ,2
DS_25AN1EA3PV,Blitz I: The League,Midway Games,Sports,0,$103.03 ,2
DS_25AN1EA3PV,Blitz I: The League,Midway Games,Sports,0,$103.03 ,2
DS_25AN1EA3PV,Blitz I: The League,Midway Games,Sports,0,$103.03 ,2
DS_25AN1EA3PV,Blitz I: The League,Midway Games,Sports,0,$103.03 ,2
 

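The first five lines (printed inside the while loop) are correct. However, the last five lines (printed after the while loop) are wrong: each one prints the content of a whole line.

I am confused by this. When does the array get changed, and how can I still print the correct values after the while loop?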

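Solution

First, a primer on how strtok() works. The function gives you back a pointer to somewhere in the original string, that string having been modified to make it look as though it holds only a single token (a).

For example, the first strtok call on "A,B,C" turns it into "A\0B,C" and gives you back the address of the A character. Using it at that point gives you "A".

Similarly, the second call turns it into "A\0B\0C" and gives you back the address of the B character.

The fact that it gives you pointers into the original string is paramount here, because that original string is located in buff.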
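To make the in-place modification visible, here is a minimal sketch that tokenises a buffer and prints its raw bytes after each call. The helper names print_raw and trace_strtok are made up for this illustration and are not part of any library.

```c
#include <stdio.h>
#include <string.h>

/* Write the first len bytes of buf to out, spelling each NUL byte as the
 * two characters "\0" so that strtok's edits become visible. */
void print_raw(FILE *out, const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\0')
            fputs("\\0", out);
        else
            fputc(buf[i], out);
    }
    fputc('\n', out);
}

/* Tokenise buf in place, printing each token, its offset inside buf, and
 * the raw buffer after every strtok call. Returns the number of tokens.
 * buf must be writable (an array, not a string literal). */
int trace_strtok(char *buf, const char *delims)
{
    size_t len = strlen(buf);   /* measure before strtok inserts NULs */
    int count = 0;

    for (char *tok = strtok(buf, delims); tok != NULL;
         tok = strtok(NULL, delims)) {
        printf("token \"%s\" at buf+%td, buffer is now: ", tok, tok - buf);
        print_raw(stdout, buf, len);
        count++;
    }
    return count;
}
```

Called on a writable array holding "A,B,C" with "," as the delimiter, this reports tokens at offsets 0, 2 and 4 and shows the buffer changing from A\0B,C to A\0B\0C. Every token is an address inside the same array, not a copy.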

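And you are overwriting buff every time you read a line from the file. So, for all five lines, Game[i].ProductID is simply the address of the first character of buff. After you have processed the fifth line, the statement:

while (fgets(buff, 1024, fp) != NULL && i < 5)

first reads in the sixth line before exiting the loop, because fgets is evaluated before the i < 5 test.

This is why the final lines you see are not the same as any of the first five. You are printing every ProductID C string from the (identical) address of buff, so you see only the sixth line, and you see the full line because you never tokenised it after reading it in.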
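The effect can be reproduced without a file. The sketch below is an illustration only: alias_ids is a hypothetical function, and snprintf stands in for fgets by copying each "line" over the previous one in a shared buffer.

```c
#include <stdio.h>
#include <string.h>

/* Mimic the question's loop: each line overwrites the shared buffer, and
 * only the pointer returned by strtok is stored. As in the question, the
 * next line is loaded before the row limit is checked. Returns the number
 * of pointers stored in ids. */
size_t alias_ids(char *buff, size_t size,
                 const char *const lines[], size_t nlines,
                 char *ids[], size_t max_ids)
{
    size_t stored = 0;

    for (size_t n = 0; n < nlines; n++) {
        snprintf(buff, size, "%s", lines[n]);   /* stand-in for fgets */
        if (stored == max_ids)
            break;                              /* extra line stays in buff */
        /* Points into buff (at buff[0] when the line does not start with
         * a comma), so every stored pointer is the same address. */
        ids[stored++] = strtok(buff, ",");
    }
    return stored;
}
```

With three input lines and room for two IDs, both stored pointers equal buff, and printing either one shows the whole of the third, untokenised line.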

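What you need to do is make a copy of each token before the line is overwritten. That can be done with something like the following (it's a little complex, but it correctly handles the case where strtok returns NULL):

if ((Game[i].ProductID = strtok(buff, ",")) != NULL)
    Game[i].ProductID = strdup(Game[i].ProductID);

Remember that you should free those allocations at some point.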
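Applied to a whole row, the same idea might look like the sketch below. The question does not show its struct definition, so struct GameRow is a hypothetical two-field stand-in, and next_token_copy, parse_row and free_row are names invented here. Using ",\n" as the delimiter set is also an assumption, made so that the newline left by fgets is not kept in the last field.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes strdup in strict ISO C mode */
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of the question's record. */
struct GameRow {
    char *ProductID;
    char *ProductName;
};

/* Return a heap copy of the next token, or NULL if there is no token or
 * strdup fails. Pass the line on the first call and NULL afterwards,
 * exactly as with strtok itself. */
char *next_token_copy(char *line, const char *delims)
{
    char *tok = strtok(line, delims);
    return tok != NULL ? strdup(tok) : NULL;
}

/* Fill one row from a writable line. Returns 0 on success, or -1 if a
 * field is missing or a copy failed; on failure nothing stays allocated. */
int parse_row(struct GameRow *row, char *line)
{
    row->ProductID = next_token_copy(line, ",\n");
    row->ProductName = row->ProductID != NULL
                           ? next_token_copy(NULL, ",\n")
                           : NULL;
    if (row->ProductName == NULL) {
        free(row->ProductID);       /* free(NULL) is a no-op */
        row->ProductID = NULL;
        return -1;
    }
    return 0;
}

/* Release the copies made by parse_row. */
void free_row(struct GameRow *row)
{
    free(row->ProductID);
    free(row->ProductName);
    row->ProductID = row->ProductName = NULL;
}
```

Because each field now owns its own storage, reading the next line into buff no longer changes rows that were parsed earlier. The cost is one free_row call per row once the data is no longer needed.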

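In the incredibly unlikely event that your environment doesn't have strdup (it's POSIX rather than ISO C), you will need to supply an equivalent yourself; the original answer linked to one.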
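A replacement needs only malloc and memcpy. The following is a sketch rather than the implementation the original answer linked to; the name dup_string is an assumption, chosen to avoid identifiers beginning with "str", which the C standard reserves.

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of src, or NULL if malloc fails.
 * The caller owns the result and must free() it. */
char *dup_string(const char *src)
{
    size_t size = strlen(src) + 1;      /* include the terminating NUL */
    char *copy = malloc(size);

    if (copy != NULL)
        memcpy(copy, src, size);
    return copy;
}
```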


As an aside, most CSV implementations allow embedded commas, either by enclosing the field in quotes or by escaping the comma (the latter is rare, but I have seen it):

name,"diablo, pax",awesome
name,diablo\, pax,awesome

Both of those may be expected to yield three fields: "name", "diablo, pax" and "awesome".

Simplified processing with strtok will not allow for such complexities but, assuming your fields do not contain embedded commas, it may be okay. If your input is more complex, you may be better off using a third-party CSV library (with a suitable licence of course).
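For input that uses the quoted form, a small hand-written splitter is one alternative to strtok. The sketch below is illustrative, not a full CSV parser: split_csv_line is a hypothetical function, it assumes that a doubled quote inside a quoted field stands for one literal quote, and it does not handle the backslash-escaped form. Like strtok, it edits the line in place and returns pointers into it, so the fields still need copying before the buffer is reused.

```c
#include <stddef.h>

/* Split line in place into at most max_fields fields and return the count.
 * Inside double quotes a comma is literal and "" means one quote.
 * Unlike strtok, empty fields are kept (an empty line gives one empty
 * field). Parsing stops at the first '\0', '\n' or '\r'. */
size_t split_csv_line(char *line, char *fields[], size_t max_fields)
{
    size_t count = 0;
    char *src = line;               /* next byte to read */

    while (count < max_fields) {
        char *dst = src;            /* next byte to write; never ahead of src */
        int in_quotes = 0;
        int end_of_line = 0;

        fields[count++] = dst;
        for (;;) {
            char c = *src;
            if (c == '\0' || c == '\n' || c == '\r') {
                end_of_line = 1;
                break;
            }
            src++;
            if (in_quotes) {
                if (c != '"') {
                    *dst++ = c;             /* literal byte, commas included */
                } else if (*src == '"') {
                    *dst++ = '"';           /* "" inside quotes is one quote */
                    src++;
                } else {
                    in_quotes = 0;          /* closing quote */
                }
            } else if (c == '"') {
                in_quotes = 1;              /* opening quote */
            } else if (c == ',') {
                break;                      /* field separator */
            } else {
                *dst++ = c;
            }
        }
        *dst = '\0';                        /* terminate the current field */
        if (end_of_line)
            break;
    }
    return count;
}
```

Given the quoted example line above, this returns 3 and the second field is the text diablo, pax with the comma preserved.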


(a) For the language lawyers among us, this is covered in the ISO C standard, C11 7.24.5.8 The strtok function, paragraphs 3 and 4:

3/ The first call in the sequence searches the string pointed to by s1 for the first character that is not contained in the current separator string pointed to by s2. If no such character is found, then there are no tokens in the string pointed to by s1 and the strtok function returns a null pointer. If such a character is found, it is the start of the first token.

4/ The strtok function then searches from there for a character that is contained in the current separator string. If no such character is found, the current token extends to the end of the string pointed to by s1, and subsequent searches for a token will return a null pointer. If such a character is found, it is overwritten by a null character, which terminates the current token. The strtok function saves a pointer to the following character, from which the next search for a token will start.
