Parse string into array based on spaces or "double quotes strings"
Problem description
I'm trying to take a user input string and parse it into an array declared as char *entire_line[100]; where each word is put at a different index of the array, but if part of the string is enclosed in double quotes, that part should be put in a single index. So if I have
char buffer[1024]={0,};
fgets(buffer, 1024, stdin);
Example input: word filename.txt "this is a string that should take up one index in an output array"
tokenizer=strtok(buffer," ");//break up by spaces
do{
    if(strchr(tokenizer,'"')){//check is a word starts with a "
        is_string=YES;
        entire_line[i]=tokenizer;// if so, put that word into current index
        tokenizer=strtok(NULL,"\""); //should get rest of string until end "
        strcat(entire_line[i],tokenizer); //append the two together, ill take care of the missing space once i figure out this issue
    }
    entire_line[i]=tokenizer;
    i++;
}while((tokenizer=strtok(NULL," \n"))!=NULL);
This clearly isn't working. It only gets close if the double-quoted string is at the end of the input string, but I could have input like this:
word "this is text that will be user entered" filename.txt
I've been trying to figure this out for a while and always get stuck somewhere. Thanks.
Recommended answer
The strtok function is a terrible way to tokenize in C, except for one (admittedly common) case: simple whitespace-separated words. (Even then it's still not great, because it is not reentrant and cannot be used recursively, which is why we invented strsep for BSD way back when.)
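To make the reentrancy point concrete, here is a minimal sketch of whitespace splitting with strsep. The helper name split_on_spaces and its parameters are hypothetical, chosen only for this illustration. It assumes a BSD or glibc system where strsep is declared in <string.h>; strsep is not part of ISO C, so other platforms may not provide it.

```c
#define _DEFAULT_SOURCE /* asks glibc to declare strsep; assumed harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split `line` in place on spaces, tabs and newlines.
 * Stores up to max_words pointers into `line` and returns how many were stored. */
size_t split_on_spaces(char *line, char *words[], size_t max_words)
{
    size_t count = 0;
    char *cursor = line; /* all scanning state lives here, not in a hidden static */
    char *word;

    assert(line != NULL && words != NULL);

    while (count < max_words && (word = strsep(&cursor, " \t\n")) != NULL) {
        /* strsep returns an empty field between adjacent delimiters; skip those */
        if (*word != '\0')
            words[count++] = word;
    }
    return count;
}
```

Because the position is kept in the caller's cursor variable, two such loops can run at the same time (for example, one nested inside the other) without interfering, which strtok cannot do. Like strtok, strsep still knows nothing about quotes.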
Your best bet in this case is to build your own simple state-machine:
/* needs #include <ctype.h> for isspace() */
char *p;
char *start_of_word = NULL; /* first character of the token being scanned */
int c;
enum states { DULL, IN_WORD, IN_STRING } state = DULL;

for (p = buffer; *p != '\0'; p++) {
    c = (unsigned char) *p; /* convert to unsigned char for is* functions */
    switch (state) {
    case DULL: /* not in a word, not in a double quoted string */
        if (isspace(c)) {
            /* still not in a word, so ignore this char */
            continue;
        }
        /* not a space -- if it's a double quote we go to IN_STRING, else to IN_WORD */
        if (c == '"') {
            state = IN_STRING;
            start_of_word = p + 1; /* word starts at *next* char, not this one */
            continue;
        }
        state = IN_WORD;
        start_of_word = p; /* word starts here */
        continue;

    case IN_STRING:
        /* we're in a double quoted string, so keep going until we hit a close " */
        if (c == '"') {
            /* word goes from start_of_word to p-1 */
            /* ... do something with the word ... */
            state = DULL; /* back to "not in word, not in string" state */
        }
        continue; /* either still IN_STRING or we handled the end above */

    case IN_WORD:
        /* we're in a word, so keep going until we get to a space */
        if (isspace(c)) {
            /* word goes from start_of_word to p-1 */
            /* ... do something with the word ... */
            state = DULL; /* back to "not in word, not in string" state */
        }
        continue; /* either still IN_WORD or we handled the end above */
    }
}
Note that this does not account for the possibility of a double quote inside a word, e.g.:
"some text in quotes" plus four simple words p"lus something strange"
Work through the state machine above and you will see that "some text in quotes" turns into a single token (that ignores the double quotes), but p"lus is also a single token (that includes the quote), something is a single token, and strange" is a token. Whether you want this, or how you want to handle it, is up to you. For more complex but thorough lexical tokenization, you may want to use a code-building tool like flex.
Also, when the for loop exits, if state is not DULL, you need to handle the final word (I left this out of the code above) and decide what to do if state is IN_STRING (meaning there was no closing double quote).
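As one possible way to fill in the pieces left out above, here is a sketch that wraps the same state machine in a function, stores each token in an array, and handles the final word after the loop. The name split_line and the parameters max_tokens and unterminated are hypothetical and not part of the original answer. The sketch assumes it is acceptable to modify the buffer in place by writing '\0' over each terminating space or closing quote, and it chooses to keep the text after an unclosed quote as a token while reporting it through a flag.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: split `line` in place into words and double-quoted strings.
 * Stores up to max_tokens pointers into `line` and returns how many were stored.
 * If `unterminated` is not NULL, it is set to 1 when a quote was never closed. */
size_t split_line(char *line, char *tokens[], size_t max_tokens, int *unterminated)
{
    enum { DULL, IN_WORD, IN_STRING } state = DULL;
    char *start_of_word = NULL;
    size_t count = 0;
    char *p;

    assert(line != NULL && tokens != NULL);
    if (unterminated != NULL)
        *unterminated = 0;

    for (p = line; *p != '\0'; p++) {
        int c = (unsigned char) *p; /* unsigned char value for isspace() */

        switch (state) {
        case DULL:
            if (isspace(c))
                continue;              /* skip whitespace between tokens */
            if (c == '"') {
                state = IN_STRING;
                start_of_word = p + 1; /* token starts after the quote */
            } else {
                state = IN_WORD;
                start_of_word = p;
            }
            continue;

        case IN_STRING:
        case IN_WORD:
            /* a string ends at a closing quote, a word ends at whitespace */
            if (state == IN_STRING ? (c == '"') : isspace(c)) {
                *p = '\0';             /* terminate the token in place */
                if (count < max_tokens)
                    tokens[count++] = start_of_word;
                state = DULL;
            }
            continue;
        }
    }

    /* the loop ended inside a token: keep it, and flag an unclosed quote */
    if (state != DULL) {
        if (count < max_tokens)
            tokens[count++] = start_of_word;
        if (state == IN_STRING && unterminated != NULL)
            *unterminated = 1;
    }
    return count;
}
```

With the buffer and array from the question, a call would look like split_line(buffer, entire_line, 100, &flag), where flag is an int. Tokens beyond max_tokens are silently dropped in this sketch, and an unclosed quoted string read with fgets keeps its trailing newline; both choices are assumptions you may want to change.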