Is strtok broken? Or just tricky?
Problem description
Is strtok hopelessly broken?
On many StackOverflow questions about text-parsing in C, someone will suggest using strtok, and one common reply is that strtok should never be used, that it is hopelessly broken.
Some posters have claimed that strtok's problems are limited to multi-threading issues, and that it is safe in a single-threaded environment.
What is the right answer?
Does it work?
Is it hopelessly broken?
Can you back up your answer with examples?
Recommended answer
Yes, strtok is hopelessly broken, even in a simple single-threaded program, and I will demonstrate this failure with some sample code:
Let us begin with a simple text-analyzer function to gather statistics about sentences of text, using strtok. This code will lead to undefined behavior.
In this example, a sentence is a set of words separated by spaces, commas, semi-colons, and periods.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

// Example:
//     int words, longest;
//     GetSentenceStats("There were a king with a large jaw and a queen with a plain face, on the throne of England.", &words, &longest);
// will report there are 20 words, and the longest word has 7 characters ("England").
void GetSentenceStats(const char* sentence, int* pWordCount, int* pMaxWordLen)
{
    const char* delims = " ,;.";     // In a sentence, words are separated by spaces, commas, semi-colons or periods.
    char* input = strdup(sentence);  // Make a local copy of the sentence, to be modified without affecting the caller.
    *pWordCount = 0;                 // Initialize the outputs to zero
    *pMaxWordLen = 0;
    char* word = strtok(input, delims);
    while (word)
    {
        (*pWordCount)++;
        *pMaxWordLen = MAX(*pMaxWordLen, (int)strlen(word));
        word = strtok(NULL, delims);
    }
    free(input);
}
This simple function works. There are no bugs so far.
Now let us augment our library to add a function that gathers stats on paragraphs of text.
A paragraph is a set of sentences separated by exclamation marks, question marks, and periods.
It will return the number of sentences in the paragraph, and the number of words in the longest sentence.
And perhaps most importantly, it will use the earlier function GetSentenceStats to help.
void GetParagraphStats(const char* paragraph, int* pSentenceCount, int* pMaxWords)
{
    const char* delims = ".!?";       // Sentences in a paragraph are separated by period, question mark, and exclamation mark.
    char* input = strdup(paragraph);  // Make a local copy of the paragraph, to be modified without affecting the caller.
    *pSentenceCount = 0;
    *pMaxWords = 0;
    char* sentence = strtok(input, delims);
    while (sentence)
    {
        (*pSentenceCount)++;
        int wordCount;
        int longestWord;
        GetSentenceStats(sentence, &wordCount, &longestWord);
        *pMaxWords = MAX(*pMaxWords, wordCount);
        sentence = strtok(NULL, delims);  // This line returns garbage data.
    }
    free(input);
}
This function also looks very simple and straightforward.
But it does not work, as demonstrated by this sample program.
int main(void)
{
    int cnt;
    int len;
    // First demonstrate that the SentenceStats function works properly:
    const char *sentence = "There were a king with a large jaw and a queen with a plain face, on the throne of England.";
    GetSentenceStats(sentence, &cnt, &len);
    printf("Word Count: %d\nLongest Word: %d\n", cnt, len);
    // Correct Answer:
    // Word Count: 20
    // Longest Word: 7 ("England")
    printf("\n\nAt this point, expected output is 20/7.\nEverything is working fine\n\n");
    char paragraph[] = "It was the best of times!"  // Literary purists will note I have changed Dickens's original text to make a better example
                       "It was the worst of times?"
                       "It was the age of wisdom."
                       "It was the age of foolishness."
                       "We were all going direct to Heaven!";
    int sentenceCount;
    int maxWords;
    GetParagraphStats(paragraph, &sentenceCount, &maxWords);
    printf("Sentence Count: %d\nLongest Sentence: %d\n", sentenceCount, maxWords);
    // Correct Answer:
    // Sentence Count: 5
    // Longest Sentence: 7 ("We were all going direct to Heaven")
    printf("\n\nAt the end, expected output is 5/7.\nBut Actual Output is Undefined Behavior! Strtok is hopelessly broken\n");
    return 0;
}
All calls to strtok are entirely correct, and are on separate data.
But the result is Undefined Behavior!
Why does this happen?
When GetParagraphStats is called, it begins a strtok-loop to get sentences.
On the first sentence it will call GetSentenceStats. GetSentenceStats will also begin a strtok-loop, losing all state established by GetParagraphStats.
When GetSentenceStats returns, the caller (GetParagraphStats) will call strtok(NULL) again to get the next sentence.
strtok will think this is a call to continue the previous operation, and will continue tokenizing memory that has now been freed!
The result is the dreaded Undefined Behavior.
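To make the lost state visible without actually triggering undefined behavior, here is a minimal sketch. It assumes a simplified stand-in called toy_strtok (a hypothetical name, not the real library implementation) that keeps its position in a single hidden static pointer, which is the same kind of shared state strtok relies on. The helper names toy_count_words and toy_count_sentences are also made up for illustration. Unlike the real strtok, the toy resets its pointer to NULL when a string is exhausted, so the outer loop stops early in a predictable way instead of reading freed memory.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strtok: one hidden cursor shared by every caller. */
static char *toy_saved = NULL;

char *toy_strtok(char *str, const char *delims)
{
    if (str != NULL)
        toy_saved = str;                      /* a new string overwrites the shared cursor */
    if (toy_saved == NULL)
        return NULL;

    toy_saved += strspn(toy_saved, delims);   /* skip leading delimiters */
    if (*toy_saved == '\0') {
        toy_saved = NULL;
        return NULL;
    }

    char *token = toy_saved;
    toy_saved += strcspn(toy_saved, delims);  /* advance to the end of the token */
    if (*toy_saved != '\0')
        *toy_saved++ = '\0';                  /* terminate the token, step past the delimiter */
    else
        toy_saved = NULL;                     /* string exhausted */
    return token;
}

/* Inner loop: counts words in a private copy, as GetSentenceStats does. */
int toy_count_words(const char *sentence)
{
    char buf[128];
    strncpy(buf, sentence, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    int n = 0;
    for (char *w = toy_strtok(buf, " ,;."); w != NULL; w = toy_strtok(NULL, " ,;."))
        n++;
    return n;
}

/* Outer loop: counts sentences, calling the inner loop for each one. */
int toy_count_sentences(char *paragraph)
{
    int n = 0;
    for (char *s = toy_strtok(paragraph, ".!?"); s != NULL; s = toy_strtok(NULL, ".!?")) {
        n++;
        toy_count_words(s);   /* overwrites toy_saved: the outer position is lost */
    }
    return n;
}
```

With a three-sentence input such as "One two! Three four? Five six.", toy_count_sentences reports only one sentence, because the inner loop has already consumed the shared cursor by the time the outer loop asks for its next token.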
When is it safe to use strtok?
Even in a single-threaded environment, strtok can only be used safely when the programmer/architect is sure of two conditions:
The function using strtok must never call any function that may also use strtok. If it calls a subroutine that also uses strtok, its own use of strtok may be interrupted.
The function using strtok must never be called by any function that may also use strtok. If this function is ever called by another routine using strtok, then this function will interrupt the caller's use of strtok.
In a multi-threaded environment, use of strtok is even more problematic, because the programmer needs to be sure that there is only one use of strtok on the current thread, and also that no other threads are using strtok either.
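Both conditions stop mattering once each tokenizing loop owns its own position instead of sharing a hidden one. POSIX offers strtok_r for this purpose. The sketch below does not depend on it and instead uses a hypothetical helper, next_token, written in plain ISO C with strspn and strcspn. The names next_token, count_words, and count_sentences are assumptions made for illustration, not part of the original code, and the functions modify the buffer they are given, so callers would pass a writable copy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: behaves like strtok, but the cursor lives in *cursor,
   which belongs to the caller, so nested loops cannot disturb each other. */
char *next_token(char **cursor, const char *delims)
{
    char *p = *cursor;
    if (p == NULL)
        return NULL;

    p += strspn(p, delims);          /* skip leading delimiters */
    if (*p == '\0') {
        *cursor = NULL;
        return NULL;
    }

    char *token = p;
    p += strcspn(p, delims);         /* advance to the end of the token */
    if (*p != '\0') {
        *p++ = '\0';                 /* terminate the token, step past the delimiter */
        *cursor = p;
    } else {
        *cursor = NULL;              /* string exhausted */
    }
    return token;
}

/* Counts words in a sentence; modifies the sentence in place. */
int count_words(char *sentence)
{
    char *cur = sentence;
    int n = 0;
    while (next_token(&cur, " ,;.") != NULL)
        n++;
    return n;
}

/* Returns the number of sentences and stores the word count of the
   longest sentence in *max_words; modifies the paragraph in place. */
int count_sentences(char *paragraph, int *max_words)
{
    char *cur = paragraph;
    int n = 0;
    *max_words = 0;
    for (char *s = next_token(&cur, ".!?"); s != NULL; s = next_token(&cur, ".!?")) {
        int words = count_words(s);  /* uses its own cursor; cur is untouched */
        if (words > *max_words)
            *max_words = words;
        n++;
    }
    return n;
}
```

Run on the five-sentence paragraph from the sample program, count_sentences returns 5 and reports 7 words for the longest sentence, which is the result the strtok version was expected to produce.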