Is strtok broken? Or just tricky?


Question

Is strtok hopelessly broken?

On many StackOverflow questions about text parsing in C, someone will suggest using strtok, and a common reply is that strtok should never be used because it is hopelessly broken.

Some posters have claimed that strtok's problems are limited to multi-threading issues, and that it is safe in a single-threaded environment.

What is the right answer?
Does it work?
Is it hopelessly broken?
Can you back up your answer with examples?

Recommended answer

Yes, strtok is hopelessly broken, even in a simple single-threaded program, and I will demonstrate this failure with some sample code.

Let us begin with a simple text-analyzer function that gathers statistics about sentences of text, using strtok. Once it is combined with a second function further down, this code will lead to undefined behavior.

In this example, a sentence is a set of words separated by spaces, commas, semicolons, and periods.

#include <stdio.h>    // printf
#include <stdlib.h>   // free
#include <string.h>   // strtok, strlen, strdup (strdup is POSIX; standard C only since C23)

#define MAX(a, b) ((a) > (b) ? (a) : (b))

// Example:
//     int words, longest;
//     GetSentenceStats("There were a king with a large jaw and a queen with a plain face, on the throne of England.", &words, &longest);
// will report there are 20 words, and the longest word has 7 characters ("England").
void GetSentenceStats(const char* sentence, int* pWordCount, int* pMaxWordLen)
{
    const char* delims = " ,;.";     // In a sentence, words are separated by spaces, commas, semicolons or periods.
    char* input = strdup(sentence);  // Make a local copy of the sentence, to be modified without affecting the caller.

    *pWordCount = 0;                 // Initialize the outputs to zero
    *pMaxWordLen = 0;
    if (input == NULL)               // strdup failed: report zero words
        return;

    char* word = strtok(input, delims);
    while(word)
    {
        (*pWordCount)++;
        *pMaxWordLen = MAX(*pMaxWordLen, (int)strlen(word));
        word = strtok(NULL, delims);
    }
    free(input);                     // strtok's hidden state still points into this buffer
}

This simple function works. There are no bugs so far.

Now let us augment our library with a function that gathers statistics on paragraphs of text.
A paragraph is a set of sentences separated by exclamation marks, question marks, and periods.

It will return the number of sentences in the paragraph and the number of words in the longest sentence.
Perhaps most importantly, it will use the earlier function GetSentenceStats to help.

void GetParagraphStats(const char* paragraph, int* pSentenceCount, int* pMaxWords)
{
    const char* delims = ".!?";       // Sentences in a paragraph are separated by periods, question marks, and exclamation marks.
    char* input = strdup(paragraph);  // Make a local copy of the paragraph, to be modified without affecting the caller.

    *pSentenceCount = 0;
    *pMaxWords = 0;
    if (input == NULL)                // strdup failed: report zero sentences
        return;

    char* sentence = strtok(input, delims);
    while(sentence)
    {
        (*pSentenceCount)++;

        int wordCount;
        int longestWord;
        GetSentenceStats(sentence, &wordCount, &longestWord);  // Runs its own strtok loop, replacing our saved state
        *pMaxWords = MAX(*pMaxWords, wordCount);
        sentence = strtok(NULL, delims);    // BUG (intentional): resumes in the buffer GetSentenceStats already freed
    }
    free(input);
}

This function also looks very simple and straightforward.
But it does not work, as demonstrated by this sample program.

int main(void)
{
    int cnt;
    int len;

    // First demonstrate that the SentenceStats function works properly:
    const char *sentence = "There were a king with a large jaw and a queen with a plain face, on the throne of England.";
    GetSentenceStats(sentence, &cnt, &len);
    printf("Word Count: %d\nLongest Word: %d\n", cnt, len);
    // Correct Answer:
    // Word Count: 20
    // Longest Word: 7   ("England")


    printf("\n\nAt this point, expected output is 20/7.\nEverything is working fine\n\n");

    char paragraph[] =  "It was the best of times!"   // Literary purists will note I have changed Dickens's original text to make a better example
                        "It was the worst of times?"
                        "It was the age of wisdom."
                        "It was the age of foolishness."
                        "We were all going direct to Heaven!";
    int sentenceCount;
    int maxWords;
    GetParagraphStats(paragraph, &sentenceCount, &maxWords);
    printf("Sentence Count: %d\nLongest Sentence: %d\n", sentenceCount, maxWords);
    // Correct Answer:
    // Sentence Count: 5
    // Longest Sentence: 7  ("We were all going direct to Heaven")

    printf("\n\nAt the end, expected output is 5/7.\nBut Actual Output is Undefined Behavior! Strtok is hopelessly broken\n");
    getchar();   // Wait for Enter before exiting (portable stand-in for the non-standard _getch)
    return 0;
}

All calls to strtok are entirely correct, and each operates on separate data.
But the result is undefined behavior!

Why does this happen?
When GetParagraphStats is called, it begins a strtok loop to get sentences. On the first sentence it calls GetSentenceStats. GetSentenceStats also begins a strtok loop, which overwrites all state established by GetParagraphStats. When GetSentenceStats returns, the caller (GetParagraphStats) calls strtok(NULL) again to get the next sentence.

But strtok will think this is a call to continue the previous operation, the one started by GetSentenceStats, and will continue tokenizing memory that has now been freed! The result is the dreaded undefined behavior.
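The failure is easier to see with a model of the hidden state. The sketch below is a hypothetical toy_strtok, not the real library implementation (which differs between platforms). It assumes only what the explanation above describes: a single saved position shared by every caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of strtok's hidden state: one cursor for the whole program. */
static char *toy_saved = NULL;

/* Simplified stand-in for strtok, for illustration only. */
char *toy_strtok(char *str, const char *delims)
{
    assert(delims != NULL);

    if (str != NULL)
        toy_saved = str;          /* a new string silently replaces the old cursor */
    if (toy_saved == NULL)
        return NULL;              /* nothing has been tokenized yet */

    toy_saved += strspn(toy_saved, delims);    /* skip leading delimiters */
    if (*toy_saved == '\0')
        return NULL;              /* exhausted, but the cursor still points into the last buffer */

    char *token = toy_saved;
    toy_saved += strcspn(toy_saved, delims);   /* advance to the end of the token */
    if (*toy_saved != '\0')
        *toy_saved++ = '\0';      /* terminate the token and step past the delimiter */
    return token;
}
```

If one caller starts tokenizing buffer A and a callee then tokenizes buffer B to completion, the caller's next toy_strtok(NULL, ...) resumes in B, not in A. If B has been freed by then, as in GetSentenceStats, that call reads freed memory.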

When is it safe to use strtok?
Even in a single-threaded environment, strtok can only be used safely when the programmer/architect is sure of two conditions:


  • The function using strtok must never call any function that may also use strtok.
    If it calls a subroutine that also uses strtok, its own use of strtok may be interrupted.

  • The function using strtok must never be called by any function that may also use strtok.
    If this function is ever called by another routine that is using strtok, it will interrupt the caller's use of strtok.

In a multi-threaded environment, safe use of strtok is even harder to guarantee, because the programmer needs to be sure that there is only one use of strtok on the current thread and that no other thread is using strtok either.
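Both conditions, and the threading restriction, come from the hidden shared state. As a contrast, here is a minimal sketch of the same two-level analysis with the position kept in a variable owned by each caller. The names next_token, count_words, and count_sentences are hypothetical and not part of any library; POSIX strtok_r follows the same idea by taking an extra save-pointer argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: tokenizes like strtok, but the position lives in
   *cursor, which belongs to the caller instead of to the library. */
char *next_token(char **cursor, const char *delims)
{
    assert(cursor != NULL && delims != NULL);
    if (*cursor == NULL)
        return NULL;

    char *p = *cursor + strspn(*cursor, delims);  /* skip leading delimiters */
    if (*p == '\0') {
        *cursor = NULL;                           /* no tokens left */
        return NULL;
    }

    char *token = p;
    p += strcspn(p, delims);                      /* advance to the end of the token */
    if (*p != '\0')
        *p++ = '\0';                              /* terminate the token, step past the delimiter */
    *cursor = p;
    return token;
}

/* Counts the words in a sentence. Modifies the buffer in place. */
int count_words(char *sentence)
{
    int words = 0;
    char *cursor = sentence;                      /* this loop's private state */
    while (next_token(&cursor, " ,;.") != NULL)
        words++;
    return words;
}

/* Returns the number of sentences and stores the largest word count in *max_words.
   Modifies the buffer in place. */
int count_sentences(char *paragraph, int *max_words)
{
    int sentences = 0;
    char *cursor = paragraph;                     /* untouched by the nested loop in count_words */
    char *sentence;

    *max_words = 0;
    while ((sentence = next_token(&cursor, ".!?")) != NULL) {
        int words = count_words(sentence);
        if (words > *max_words)
            *max_words = words;
        sentences++;
    }
    return sentences;
}
```

Because each loop holds its own cursor, count_sentences can call count_words in the middle of its loop without losing its place, and two threads working on different buffers do not share any tokenizer state.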
