How to parse data between tags from a text file in C


Problem description

I want to print the data between tags from a text file using C.

Input statement: (PERSON) Mark Zuckerberg (/PERSON) is a entrepreneur from (LOCATION) USA (/LOCATION). He is also the CEO of (ORGANIZATION) Facebook (/ORGANIZATION).

Output: Mark Zuckerberg USA Facebook.

My program code is:

    const char* getfield(char* line, int num)
    {
        const char* tok;
        for (tok = strtok(line, "/>");
                tok && *tok;
                tok = strtok(NULL, "<\n"))
        {
            if (!--num)
                return tok;
        }
        return NULL;
    }

    int main()
    {
        char line[500000];
        while (fgets(line, 500000, stdin))
        {
            char* tmp = strdup(line);
            printf(" %s\n", getfield(tmp, 2));
            free(tmp);
        }
    }

It only prints Mark Zuckerberg. The other data between tags is not shown. Can someone please help me see where I went wrong? I have just started learning file processing in C, so guidance is highly appreciated. Thanks.

EDIT: Please replace "(" by "<" and ")" by "/>".

Solution

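I guess your getfield does not do what you want. On the example string (after replacing the parentheses), your for loop starts with a strtok call that cuts at the first ">" (strtok treats every character of the delimiter string as a delimiter), so right after the first "PERSON". After that you only cut on "<" or "\n", so at the start of each following tag. Since you call it with num equal to 2, you get the second token, which is why only Mark Zuckerberg is printed. With a big enough num, the successive tokens inside the loop would be: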

<PERSON
 Mark Zuckerberg 
/PERSON> is a entrepreneur from 
LOCATION> USA 
/LOCATION>. He is also the CEO of 
ORGANIZATION> Facebook 
/ORGANIZATION>
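
The token sequence above can be reproduced with a small sketch. The helper name `collect_tokens` and its `out`/`max` parameters are made up for illustration; the loop itself uses the same two delimiter sets as the question's `getfield`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not from the original post): stores, in order, every
 * token that the question's for loop would visit. Returns the token count.
 * Note that strtok modifies line by writing '\0' over the delimiters. */
int collect_tokens(char *line, const char *out[], int max)
{
    assert(line != NULL && out != NULL);

    int n = 0;
    const char *tok;
    for (tok = strtok(line, "/>");       /* first call: cut on '/' or '>' */
         tok && *tok && n < max;
         tok = strtok(NULL, "<\n"))      /* later calls: cut on '<' or '\n' */
    {
        out[n++] = tok;
    }
    return n;
}
```

Calling it on a writable copy of the example line and printing `out[0]`, `out[1]`, and so on shows that the tag content only ever appears as the second token; every later token starts with the remainder of a tag.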

You should alternate the searches: search for the end of the opening tag (">"), then search for the start of the next tag ("<"). What lies in between is the content of the first tag. Then skip the closing tag and start again in the same way until the end of the string. Something like:

char *gf(char *line, int num) {
  char *n1, *n2;
  // comments describe the 1st iteration
  // find the end of the 1st tag (opening)
  n1 = strtok(line, ">\n");
  while (n1) {
    // find the start of the 2nd tag (the corresponding closing tag)
    n2 = strtok(NULL, "<");
    if (n2 == NULL) {
      // nothing left after the last tag
      break;
    }
    // this token is tag content: is it the one requested?
    if (num-- == 0) {
      return n2;
    }
    printf("Found: %s\n", n2);
    // find the end of the 2nd tag (it has to be skipped)
    n1 = strtok(NULL, ">\n");
    // find the end of the 3rd tag (opening), then loop (same situation)
    n1 = strtok(NULL, ">\n");
  }
  return NULL;
}

Note that this code is not very nice. If you have ">" or "<" inside regular text it will go wrong (as does your own code, by the way). It may also misbehave at the end of the input, for example if the string does not end with a \n.

Second note: in any case, if you need a robust approach you will have to actually read the tags. I mean find a tag (the stuff between "<" and ">"), then find the corresponding closing tag (the same, but with a "/" in front of the same name), and only then take the text inside, or report an error.
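
The tag-reading approach described in this note could look roughly like the sketch below. It is only an illustration under stated assumptions: tags are not nested, tag names are shorter than 60 characters, and the function name `next_tagged` and its return codes are invented here, not part of the original answer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: finds the next <NAME>...</NAME> pair at or after *pos,
 * copies the text between the tags (without surrounding spaces) into out,
 * and advances *pos past the closing tag.
 * Returns 1 on success, 0 when there are no more tags, and -1 on a malformed
 * or unmatched tag, or when out is too small. */
int next_tagged(const char **pos, char *out, size_t outsz)
{
    assert(pos && *pos && out && outsz > 0);

    const char *open = strchr(*pos, '<');
    if (!open) return 0;                          /* no more tags */
    const char *open_end = strchr(open, '>');
    if (!open_end || open[1] == '/') return -1;   /* unterminated or stray closing tag */

    /* Build the matching closing tag "</NAME>" from the opening tag's name. */
    char closing[64];
    size_t name_len = (size_t)(open_end - open - 1);
    if (name_len == 0 || name_len + 4 > sizeof closing) return -1;
    snprintf(closing, sizeof closing, "</%.*s>", (int)name_len, open + 1);

    const char *body = open_end + 1;
    const char *close = strstr(body, closing);
    if (!close) return -1;                        /* no matching closing tag */

    /* Drop the spaces around the content. */
    while (body < close && *body == ' ') body++;
    const char *end = close;
    while (end > body && end[-1] == ' ') end--;

    size_t len = (size_t)(end - body);
    if (len >= outsz) return -1;                  /* output buffer too small */
    memcpy(out, body, len);
    out[len] = '\0';

    *pos = close + strlen(closing);
    return 1;
}
```

Because it never writes into the input, this version can be called in a loop on the same string until it returns 0, and a mismatched pair such as `<A> x </B>` is reported as an error instead of being silently tokenized.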

EDIT: I changed the function so that it returns the num-th element. You will now need a main() function that calls this function several times with increasing values of num, storing (or printing) each result until it gets a NULL answer. As homework, you will have to find out how to manage the main string (line) in main so that successive calls are possible (otherwise you will really only get the first tag) :)
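
One possible way to handle that homework is sketched below. Since strtok overwrites delimiters in the string it scans, the sketch hands a fresh copy of the line to every call. The names `gf_nth` and `print_fields` are hypothetical; `gf_nth` is a compact adaptation of the answer's function that counts from 0, not the author's exact code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Adapted from the answer's gf(): returns the num-th (0-based) tag content,
 * or NULL if there is none. Modifies line, because strtok writes '\0' into it. */
static char *gf_nth(char *line, int num)
{
    char *n1 = strtok(line, ">\n");          /* end of an opening tag */
    while (n1) {
        char *n2 = strtok(NULL, "<");        /* text up to the closing tag */
        if (!n2) break;                      /* nothing after the last tag */
        if (num-- == 0) return n2;
        n1 = strtok(NULL, ">\n");            /* skip the closing tag */
        n1 = strtok(NULL, ">\n");            /* end of the next opening tag */
    }
    return NULL;
}

/* Hypothetical driver: writes every tag content of line to out, one per line.
 * Returns the number of fields found, or -1 if memory allocation fails. */
int print_fields(const char *line, FILE *out)
{
    assert(line != NULL && out != NULL);

    size_t size = strlen(line) + 1;
    char *copy = malloc(size);
    if (!copy) return -1;

    int count = 0;
    for (;;) {
        memcpy(copy, line, size);            /* fresh, unmodified copy per call */
        char *field = gf_nth(copy, count);
        if (!field) break;
        fprintf(out, "%s\n", field);
        count++;
    }
    free(copy);
    return count;
}
```

A main() could then read each line with fgets and pass it to `print_fields(line, stdout)`. Rescanning from the start on every call is wasteful for long lines; it is kept here only because it matches the calling pattern the answer describes.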
