Using strtok to read a CSV file
Problem description
I am trying to read a CSV file in C using strtok and store the contents in an array of struct Game. My code is shown below:
FILE *fp;
int i = 0;
if((fp=fopen("Games.csv","r"))==NULL)
{
printf("Can't open file.\n");
exit(1);
}
rewind(fp);
char buff[1024];
fgets(buff,1024,fp);
char* delimiter = ",";
while(fgets(buff, 1024, (FILE*)fp)!=NULL && i<5){
Game[i].ProductID= strtok(buff, ",");
Game[i].ProductName = strtok(NULL, delimiter);
Game[i].Publisher = strtok(NULL, delimiter);
Game[i].Genre = strtok(NULL, delimiter);
Game[i].Taxable = atoi(strtok(NULL, delimiter));
Game[i].price = strtok(NULL, delimiter);
Game[i].Quantity = atoi(strtok(NULL, delimiter));
printf("%s\n", Game[i].ProductID);
i++;
}
i = 0;
for(i = 0; i<5; i++){
printf("%s", Game[i].ProductID);
}
The output is shown below:
DS_25ROGVOIRY
DS_25MMD4N2BL
DS_258KADVNLH
DS_25UR7M375D
DS_25FP45CJFZ
DS_25AN1EA3PV,Blitz I: The League,Midway Games,Sports,0,$103.03 ,2
DS_25AN1EA3PV,Blitz I: The League,Midway Games,Sports,0,$103.03 ,2
DS_25AN1EA3PV,Blitz I: The League,Midway Games,Sports,0,$103.03 ,2
DS_25AN1EA3PV,Blitz I: The League,Midway Games,Sports,0,$103.03 ,2
DS_25AN1EA3PV,Blitz I: The League,Midway Games,Sports,0,$103.03 ,2
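The first five lines (printed inside the while loop) are correct. However, the last five lines (printed outside the while loop) are wrong: each one prints the whole line.
I am confused by this. When does the array change, and how can I still print the correct values after the while loop?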
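First, a primer on how strtok() works. The function gives you back a pointer to somewhere in the original string, and that string has been modified so that it looks as if it holds only a single token (a).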
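For example, the first strtok of "A,B,C" would turn it into "A\0B,C" and give you back the address of the A character. Using that pointer as a string at that point would give you "A".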
Similarly, the second call would turn it into "A\0B\0C" and give you back the address of the B character.
The fact that it gives you pointers into the original string is paramount here, because that original string is located in buff.
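And you are overwriting buff every time you read a line from the file. So, for all five lines, Game[i].ProductID is simply the address of the first character of buff. After you have processed the fifth line, the line:
while (fgets(buff, 1024, fp) != NULL && i < 5)
will first read in the sixth line before exiting the loop.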
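This is why the final lines you see are not the same as any of the first five. You are printing the C string for every ProductID, and they all sit at the same address (the start of buff), so you only see the sixth line, and you see the full line because you never tokenised it after reading it in.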
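What you need to do is make a copy of each token before the line is overwritten. That can be done with something like the following (it's a little complex, but it correctly handles the case where strtok returns NULL):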
if ((Game[i].ProductID = strtok(buff, ",")) != NULL)
Game[i].ProductID = strdup(Game[i].ProductID);
Remember that you should free those memory allocations at some point.
In the incredibly unlikely event that your environment doesn't have strdup (it's POSIX rather than ISO C), an equivalent is easy to write yourself with malloc and memcpy.
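And, just as an aside, most CSV implementations allow for embedded commas, for example by enclosing the field in quotes or by escaping the comma (the latter is rare, but I have seen it):
name,"diablo, pax",awesome
name,diablo\, pax,awesome
Both of those may be expected to yield three fields: "name", "diablo, pax" and "awesome".
Simplified processing with strtok will not allow for such complexities but, assuming your fields do not contain embedded commas, it may be okay. If your input is more complex, you may be better off using a third-party CSV library (with a suitable licence, of course).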
(a) For the language lawyers among us, this is covered in the ISO C standard, C11 7.24.5.8 The strtok function, /3 and /4:
3/ The first call in the sequence searches the string pointed to by s1 for the first character that is not contained in the current separator string pointed to by s2. If no such character is found, then there are no tokens in the string pointed to by s1 and the strtok function returns a null pointer. If such a character is found, it is the start of the first token.
4/ The strtok function then searches from there for a character that is contained in the current separator string. If no such character is found, the current token extends to the end of the string pointed to by s1, and subsequent searches for a token will return a null pointer. If such a character is found, it is overwritten by a null character, which terminates the current token. The strtok function saves a pointer to the following character, from which the next search for a token will start.