Split char string with multi-character delimiter in C


Question

I want to split a char *string on a multi-character delimiter. I know that strtok() is used to split a string, but it only works with single-character delimiters.

I want to split the char *string on a substring such as "abc" or any other substring. How can that be achieved?

Recommended answer

Finding the point at which the desired sequence occurs is pretty easy; strstr supports that:

char str[] = "this is abc a big abc input string abc to split up";
char *pos = strstr(str, "abc");

So, at that point, pos points to the first occurrence of abc in the larger string. Here's where things get a little ugly. strtok has a nasty design where it 1) modifies the original string, and 2) stores a pointer to the "current" location in the string internally.
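To make those properties of strtok concrete, here is a minimal sketch. The helper name count_strtok_tokens is made up for illustration; only strtok itself comes from the standard library. It also shows why strtok cannot be used directly here: the delimiter argument is treated as a set of characters, not as one sequence.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the original answer): counts the tokens
 * strtok() yields for `text` when given the delimiter set `delims`.
 * strtok() overwrites each delimiter it finds in `text` with '\0'. */
size_t count_strtok_tokens(char *text, const char *delims) {
    size_t count = 0;
    for (char *tok = strtok(text, delims); tok != NULL; tok = strtok(NULL, delims))
        count++;
    return count;
}
```

Given a writable buffer holding "xaybzcw" and the delimiter argument "abc", this should report four tokens even though the sequence abc never occurs in the input, and afterwards the buffer reads as just "x" because the first delimiter has been overwritten with a terminator.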

If we didn't mind doing roughly the same, we could do something like this:

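#include <string.h>

/* Like strtok(), but splits on the whole delimiter string rather than on
   any one of its characters. Modifies input and keeps static state.
   The delimiter must not be empty. */
char *multi_tok(char *input, const char *delimiter) {
    static char *string;    /* where the next token starts */
    if (input != NULL)
        string = input;

    if (string == NULL)
        return string;

    char *end = strstr(string, delimiter);
    if (end == NULL) {
        /* no more delimiters: the rest of the string is the last token */
        char *temp = string;
        string = NULL;
        return temp;
    }

    char *temp = string;

    *end = '\0';                        /* terminate the current token */
    string = end + strlen(delimiter);   /* resume just past the delimiter */
    return temp;
}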

This does work. For example:

#include <stdio.h>

int main(void) {
    char input [] = "this is abc a big abc input string abc to split up";

    char *token = multi_tok(input, "abc");

    while (token != NULL) {
        printf("%s\n", token);
        token = multi_tok(NULL, "abc");
    }
}

This produces roughly the expected output:

this is
 a big
 input string
 to split up
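One difference from strtok is worth seeing in code: because multi_tok does not skip runs of delimiters, a leading, trailing, or doubled delimiter yields an empty token. The sketch below repeats the function so that it compiles on its own; count_tokens is a hypothetical helper added only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Same logic as the multi_tok() shown above, repeated so that this
 * sketch is self-contained. */
static char *multi_tok(char *input, const char *delimiter) {
    static char *string;
    if (input != NULL)
        string = input;
    if (string == NULL)
        return NULL;

    char *token = string;
    char *end = strstr(string, delimiter);
    if (end == NULL) {
        string = NULL;                      /* last token */
    } else {
        *end = '\0';                        /* terminate this token */
        string = end + strlen(delimiter);   /* skip the delimiter */
    }
    return token;
}

/* Hypothetical helper: splits `text` in place and reports how many tokens
 * came back in total and how many of them were empty strings. */
void count_tokens(char *text, const char *delimiter, size_t *total, size_t *empty) {
    *total = 0;
    *empty = 0;
    for (char *tok = multi_tok(text, delimiter); tok != NULL;
         tok = multi_tok(NULL, delimiter)) {
        (*total)++;
        if (*tok == '\0')
            (*empty)++;
    }
}
```

For the input "abcxabcabcyabc" with delimiter "abc", this should count five tokens, three of them empty (before x, between the doubled delimiter, and after the trailing one). A caller who wants strtok-like behaviour would have to skip the empty tokens itself.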

Nonetheless, it's clumsy, difficult to make thread-safe (you'd have to make its internal string variable thread-local), and generally just a crappy design. Using, for example, an interface something like strtok_r, we can fix at least the thread-safety issue:

#include <stdio.h>
#include <string.h>

typedef char *multi_tok_t;

char *multi_tok(char *input, multi_tok_t *string, const char *delimiter) {
    if (input != NULL)
        *string = input;

    if (*string == NULL)
        return *string;

    char *end = strstr(*string, delimiter);
    if (end == NULL) {
        char *temp = *string;
        *string = NULL;
        return temp;
    }

    char *temp = *string;

    *end = '\0';
    *string = end + strlen(delimiter);
    return temp;
}

multi_tok_t init() { return NULL; }

int main(void) {
    multi_tok_t s=init();

    char input [] = "this is abc a big abc input string abc to split up";

    char *token = multi_tok(input, &s, "abc");

    while (token != NULL) {
        printf("%s\n", token);
        token = multi_tok(NULL, &s, "abc");
    }
}
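Keeping the state in the caller also means two tokenizations can be interleaved, which the static version cannot do because the second call with a new input would overwrite the saved position of the first. A minimal sketch, assuming the three-argument multi_tok above (repeated here so that the block compiles on its own); count_matching_tokens is a hypothetical helper, not part of the original answer.

```c
#include <stddef.h>
#include <string.h>

typedef char *multi_tok_t;

/* Same logic as the three-argument multi_tok() above. */
static char *multi_tok(char *input, multi_tok_t *string, const char *delimiter) {
    if (input != NULL)
        *string = input;
    if (*string == NULL)
        return NULL;

    char *token = *string;
    char *end = strstr(*string, delimiter);
    if (end == NULL) {
        *string = NULL;                     /* last token */
    } else {
        *end = '\0';                        /* terminate this token */
        *string = end + strlen(delimiter);  /* skip the delimiter */
    }
    return token;
}

/* Hypothetical helper: walks two strings in lock-step, each with its own
 * state variable, and counts the positions at which the tokens are equal.
 * Both inputs are modified in place. */
size_t count_matching_tokens(char *left, char *right, const char *delimiter) {
    multi_tok_t ls = NULL, rs = NULL;
    char *a = multi_tok(left, &ls, delimiter);
    char *b = multi_tok(right, &rs, delimiter);
    size_t matches = 0;

    while (a != NULL && b != NULL) {
        if (strcmp(a, b) == 0)
            matches++;
        a = multi_tok(NULL, &ls, delimiter);
        b = multi_tok(NULL, &rs, delimiter);
    }
    return matches;
}
```

With writable buffers holding "one::two::three" and "one::2::three" and the delimiter "::", this should return 2, since the first and third tokens match.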

I guess I'll leave it at that for now, though. To get a really clean interface, we'd really want to reinvent something like coroutines, and that's probably a bit much to post here.
